Digital Curation

The last piece of the e-publishing puzzle came into focus today– or at least the last piece of this course. Because readers don’t just read text online– they “read” art, data and multimedia too.

Joyce Ray gave us an overview of digital curation today, and of how difficult it is to preserve digital artifacts so that they can be found and interpreted.

The main elements:

  • TRAC checklist
  • data lifecycle
  • digital identifiers
  • metadata– including intentions of artist or creator
  • data management plan- now compulsory for funding

Aside from great metadata and digital identifiers, archivists also need to think about server space and costs. I guess physical libraries and archives also have to worry about losing funding or losing space, but for a digital archive, all it takes is one power failure, one flick of the switch to lose an entire collection.

We talked a lot about how to archive data– but how do we make it readable? Infographics and manipulable data sets (like the ones put out by Sage) seem like the most viable options right now. It’s funny– the book has been the chosen vehicle of text for a thousand years (give or take) but the jury is still out on the best medium to deliver data.

Very simplistic infographic. Image credit: George Takei.

What will become the easiest and most compelling way for people to “read” and actually understand raw data sets, born-digital documents, and similar non-narrative texts?

And how will we access them? We talked about digital repositories and then digital access subscriptions. Once the idea of an all-inclusive subscription model was put on the table I started to think about cable, and the power that cable and internet providers have. I think that digital archives and repositories could end up with a greater status as gatekeepers than we currently realize. Library cards are free, but download speeds aren’t.

Publishers like Oxford are already acting as gatekeepers to some extent, by working with libraries to make sure that their databases are in front of every British citizen.

Is this just a continuation of the model always used by libraries and publishers, or is this new territory?

Back to Open Access

Or, why doesn’t Oxford University Press play nice with Oxford’s Bodleian Library?

I want to expand a bit on open access today, after taking in presentations from several representatives of Oxford University and Oxford University Press. So far we’ve been defining open access as who can read a published e-document. But Oxford University Press has taken it a bit further, making sure that OUP databases are not only free, but available in every public library in the UK, at considerable cost to themselves.

So it’s fairly disappointing to realize that across the street, the Bodleian Library is hanging on to an amazing ephemera collection it has yet to digitize. I know the ProQuest/John Johnson project is coming along, but it was pretty sad to learn that the Dickens exhibit we came to see had little more than an online announcement to accompany it. I almost feel as though a library as privileged as the Bodleian should have an obligation to digitize its collection for open access.

Well, the next day, Rhodri Jackson offered a few counterpoints to my righteous indignation. Even with the growing push for open access, there are still some valid drawbacks and misconceptions worth pointing out.

In England, the new Finch report is calling for all tax-payer funded research to be made publicly available to tax-payers by 2014. But Jackson claims there’s a tax-payer fallacy at work. We pay twice for services all the time. In New York, our taxes fund the subway, but we still need to purchase our MTA cards to ride it. (Now I think the MTA fares are incredibly overpriced, and a regressive tax on the poor and working classes, but that’s another story.)

Right or wrong, there’s still a cost associated with making content available and somebody’s got to pay for it. In the green model, for example, academics must deposit their articles in a repository. Who maintains that repository and makes sure it’s open to the public? Libraries with their tax-payer funded, ever-shrinking budgets?

The SOAP report also offered two reasons not to publish in open access, both of which should matter to academics– funding and journal equality. With the gold model, we run the risk of preventing academics who aren’t at well-funded institutions from publishing their work. And with the pay-to-publish model, it would be all too easy for rich researchers to publish vanity articles.

Now these drawbacks to open access publishing of academic articles don’t directly relate to the difficulties that academic institutions seem to have when it comes to digitizing their primary source materials. But they are related in one way. Both cost money, and it’s not clear who should be footing the bill– authors, publishers, readers, or some combination of the three.

Databases for the People

Between the Berg Fashion Library and Oxford Reference Online, we’ve had some great demos over the last two days of all that digital publishing can be.

Berg did a great job of funneling their money into creating a good product– thorough cataloging, effective search, compelling discovery layer– their fashion database is useful to students and fun to use. They sure put that Harry Potter money to good use!

New edition of Harry Potter and the Philosopher’s Stone, which we picked up on our visit to Bloomsbury Press

And Oxford cut an interesting deal with the public libraries of Great Britain to make sure that every citizen would have access to their main reference works: the dictionary, the biography and the music database.

So Oxford says this isn’t a money-making venture for them, and that makes sense given the discount they offered to the libraries.

But if you build a compelling enough, user-friendly database, can you sell enough subscriptions that researchers get paid to publish instead of paying to publish?

Inquiring minds wish to know.

One idea that surfaced a lot last week is that e-documents are never finished.

Data can be revisited and reworked; articles can be updated. An e-published document is subject to change at any point in time.

Now how does that affect what it’s worth? And how does an archivist keep track of all the changes?
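I don’t know what tools archivists actually use for this, but here is a minimal sketch of one possible approach: store a fingerprint (a SHA-256 hash) of the text each time the document is checked, and only log a new version when the fingerprint changes. The `VersionLog` class, the dates and the sample text are all made up for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class VersionLog:
    """Hypothetical append-only log of the versions of one e-document."""
    versions: list = field(default_factory=list)  # (label, sha256 hex digest)

    def record(self, label: str, text: str) -> bool:
        """Fingerprint `text`; return True only if it differs from the last version."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.versions and self.versions[-1][1] == digest:
            return False  # identical to the last recorded version, nothing stored
        self.versions.append((label, digest))
        return True


log = VersionLog()
log.record("2012-07-01", "Original article text.")
log.record("2012-07-08", "Original article text.")  # unchanged, so not stored
log.record("2012-07-15", "Updated article text.")   # changed, so stored
```

This only tells you that something changed and when you noticed. Keeping the old text itself, and deciding which version counts as the one of record, is a separate problem.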

Today we move from the academic to the casual reader, as we move from articles to full e-books.

Ruth Jones from Ingram Publishing described publishing as a way to meet the demand for information. And in that sense, creating an e-book that can be updated after the fact is anticipating that demand before the reader even knows it exists.

But dynamic updates seem to mean a lot more for the academic or the student, who might need access to the latest data. For the casual reader, changing the end of a novel would be more of a disruption than a benefit.

Alison Jones, from Palgrave Macmillan, pointed out a few more features that push e-publishing beyond simple digital text. Interestingly, the dynamic update makes an e-book safe(r) from piracy. There’s no point in putting a pirated reference work or journal online if the copy is going to be outdated in a week or a month.

But for e-books that are designed to mimic paper, piracy becomes more of an issue. So trade e-books can add digital watermarks that will identify the pirate by their original transaction ID. And textbooks can come with a digital forum in which each user has to comment in order to prove they purchased their textbook.

I wish publishers would put as much energy into creating rental models, resale models and social sharing (digital book clubs, anyone?) as they do into anti-piracy features, but I do appreciate the respect that e-book publishers show to readers– they clearly value getting their content into the hands and devices of their readers.

Privacy in e-Publishing

With my background in online marketing, I was pretty well aware of the privacy concerns connected to e-readers, and I was thinking about the latest Wall Street Journal report on my way to Day 2 of the Bloomsbury Conference.

So I was a bit surprised to learn there was a whole other set of privacy concerns in e-publishing, not coming from people who want to make money, but from people who just want knowledge.

With open data, Dr. Wissenburg explained, researchers can pull together anonymous data from multiple studies, and with the right skills, identify personal data about research participants. So now, open access isn’t just about the rights of the readers, it’s about the rights of the researched. I suspect it’s going to take a pretty technical solution to make sure that individuals are protected as we move forward with data sharing and collaboration.
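To make that concrete for myself, here is a toy sketch of how linking might work. Everything in it is invented: the two data sets, the field names, and the choice of zip code, birth year and sex as the fields to match on. It is not the method Dr. Wissenburg described, just the simplest version of the idea that I could write down.

```python
# Two made-up "anonymous" data releases. Neither contains a name.
health_study = [
    {"zip": "10001", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "10002", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
survey = [
    {"zip": "10001", "birth_year": 1980, "sex": "F", "employer": "City Library"},
    {"zip": "10003", "birth_year": 1990, "sex": "M", "employer": "Bookshop"},
]

# Assumed fields that are harmless alone but identifying in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(a, b, keys=QUASI_IDENTIFIERS):
    """Merge rows of `a` with rows of `b` that share a unique combination of `keys`."""
    index = {}
    for row in b:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    linked = []
    for row in a:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # exactly one match: likely the same person
            linked.append({**row, **matches[0]})
    return linked


linked = link_records(health_study, survey)
```

Here one participant ends up with both a diagnosis and an employer attached to the same record, even though neither data set revealed that pairing on its own.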

I’m realizing this conference is targeted much more towards the experienced scientist than towards me, a librarian interested in the social sciences and humanities. But I’m still finding value (along with lots to look up afterwards).

So it’s great to hear from Anne Welsh, who charmingly described the role of the subject librarian in today’s brave new world. She introduced the Future Libraries Project to help academic librarians prepare, and described how librarians can help academics today, by not just “capturing the end-role of scholarship,” but being involved throughout, and helping scholars get their ideas to the right audiences. She mentioned a number of reports, which I plan to read up on soon. You can find them here.

The Future Article

It’s the first day of the Bloomsbury Conference and I’m going to take this post to research a few of the concepts that came up during the presentations.

I’m quickly realizing two important ideas:

1) The focus is on the Article as a key vehicle for scholarly communication.
2) Article Impact is code for ‘who cares about what you have to say?’ But it doesn’t influence the publishing market… yet.

A few days ago, we talked about the Article of the Future, which adds links and metrics to the basic article, and Brian Hole showed us a more equitable publishing model for open access.

Today we added on to the idea of what an open access article could and should be.

Lee Ann Coleman from the British Library mentioned Who Needs Access, a site with a great case for why the public should be able to access tax-funded research. She also talked about the role of libraries, not just as educators, not just as providers, but as advocates for access.

David De Roure gave us a good picture of the future scientific article– one that is data-driven, public and powered by computers that enable us to think through our new digital world and collaborate in new ways. He introduced the idea of the social machine. I’m not sure I fully understand, but I think we’re talking about tools that allow all of us to participate in scientific research.

The example du jour is Galaxy Zoo, and I can understand why. I especially like that the research done from these images is easy to find and publicly available. The pictures are nice-looking too!

Published with Creative Commons permission from Galaxy Zoo

Mike Taylor brought up new measures for impact, like altmetrics that look at weblinks, mass media, tweets and usage counts. But do academic publishers look at this kind of thing to determine who gets published and how the public can access them?

There seems to be a lag between the value of research and the value assigned by publishers.

Hard Truths

Yesterday was all sweetness, light and lovely old books. Today, reality sets in.

We began talking about the major issues affecting e-publishing in academia. How do you provide quality access to quality content for everyone who wants it while still making money?

I’ve been thinking about this fundamental problem a lot this week, after reading this Open Letter to Emily, the NPR intern who recently confessed to essentially stealing many GBs of music. David Lowery makes an incredibly sane argument against file sharing, in a world where it’s seemingly quite easy to justify. But unlike stealing pennies from the pockets of my favorite musicians, it’s a lot harder to quantify who’s getting cheated in the world of academic publishing.

This is a world where academics are willing to pay to publish if it means open access. I can’t imagine any indie band paying for people to download their songs. Radiohead, maybe, but not someone who just cut their first album.

We talked about a few different ways to solve the access problem.
1) A combination of subscription and open access to share costs
2) A more open publishing platform in which the best articles get ranked and rise to the top, but everyone pays to publish

Both of these seem kind of haphazard– like they’ll satisfy publishers and academics without really fixing the problem of how to provide quality digital content and quality access. Both solutions really exclude a significant person in this equation. Me!

As a grad student, I pay tuition and I want to research. But both of these systems assume zero student demand. Which is strange, considering that I’ve already paid for the rights to read these articles; I’m just counting on my school’s library to buy access to the right journals. But if these journals can figure out how to leverage demand (maybe with some exciting discovery layers!), by actually promoting their articles among a community outside of high-level experts, it might make this dilemma a bit easier to solve.

Hopefully, we’ll hear some thoughts on this at tomorrow’s Bloomsbury Conference.

