I ran over the comment limits on David Rosenthal’s blog when I tried to reply to his reply to my comment on his blog. I’ve included my reply below instead.
The problem I see is that we fundamentally disagree on the framing of the digital preservation challenge. I meant to reply to your last “refutation” of Jeff Rothenberg’s presentation at Future Perfect 2012 but hadn’t gotten around to it yet. Perhaps now is a good time. I was the one that organised Jeff’s visit and presentation and I talked with him about his views both before and after so I have a pretty good idea of what he was trying to say. I won’t try to put words into his mouth though and will instead give my (similar) views below.
The digital preservation challenge, as I see it, is to preserve digitally stored or accessed content over time. I think we can both agree that if we aren’t leaving something unchanged then we aren’t preserving anything. So, to me, the digital preservation challenge requires that we ensure that the content is unchanged over time.
Now I’m not sure if you would agree that that is what we are trying to do. If you do, then it seems we disagree on what the content is that we are trying to preserve. If you disagree that that is what we are trying to do then at least we might be able to make some progress on figuring out what the disagreement stems from.
So if you can at least understand my perspective, I’d also like to address your comments about format obsolescence. I’m not a proponent of the idea of format obsolescence. The idea makes little sense to me. However, I am a proponent of a weak form of the idea of software obsolescence and, more importantly, the associated idea of content loss due to software obsolescence.
The weaker form of the idea of software obsolescence that I’m a proponent of is that because of hardware changes, software loss and loss of understanding about how to use software, software becomes unusable using current technology without active intervention.
The associated idea of content loss that I am a proponent of is the idea that to successfully preserve many types of content you need to preserve software that that content relies upon in order to be presented to users and interacted with. A stronger way of putting that is to say that in many cases, the thing to be preserved is so inextricably connected to the software that the software is part of that thing.
If you take that leap to accepting (whether fully or in order to simplify the explanation) that the software is part of the thing to be preserved, then it becomes obvious that practitioners who are only doing migration are in many cases not doing real preservation as they are not preserving the entirety of the objects. Hence Jeff’s presentation in which he reprimanded the community for not really making progress since the early 2000s. Almost nobody is preserving the software functionality.
As it is relevant to your post and comments, I’ll use a web page as an example to illustrate what I mean. The content that a traditional web page presents to users for interaction is produced from a number of digital files, including the server-hosted files (e.g. the web server and applications, the HTML/XHTML pages, scripts, images and audio) and the locally hosted files (such as the browser, fonts, browser skins, extensions etc.). The combination of these files, usually mediated by at least two computers (the server and the client), presents content that the user can interact with. Changing any one of the files involved in this process may change the content presented to the user. To preserve such a page it is my view that we need to start by deciding what content makes up the page, both so that we can begin to preserve it and so that we can confirm at a point in the future that the content is still there in an unchanged form. In most cases it’s likely that all that needs to be preserved is the basic text and images in the page and their general layout. If this is all, then migration techniques may well be appropriate if browsers ever become unable to render the text and images (though I agree with you that that doesn’t seem necessary yet or likely to be necessary in a hurry). However there are two difficulties with this scenario:
- (A) There will be many cases where the content includes interactive components and/or things that include software dependencies.
- (B) When you don’t know, or can’t affordably identify, the content to be preserved, preserving as much as possible, cheaply, is your best option.
(A) means that you will require some solution that involves preserving the software’s functionality, and I believe that (B) means you should use an emulation-based technique to preserve the content.
Emulation-based techniques are highly scalable (across many pieces of digital content) and so benefit from economies of scale. I believe that emulation strategies and tools, once fully realised, will provide a cheaper option when you factor in the cost of confirming the preservation of the content.
It’s a bit like the global warming problem. Most products and services do not include the carbon cost in them. If they did they would likely be much more expensive. Well I believe digital preservation solutions are similar: if you factor in the costs of confirming/verifying the preservation of the content you are trying to preserve, then many solutions are likely to be prohibitively expensive as they will require manual intervention at the individual object level. Emulation solutions, on the other hand, can be verified at the environment level and applied across many objects, greatly reducing costs.
So as I see it, it is not about format obsolescence, it is about (a weak form of) software obsolescence and preservation of content that can’t be separated from software.
In your post you seemed to be suggesting something similar, that content needed to be preserved that was heavily reliant upon browsers and server-based applications. You also discussed a number of approaches, including some that involved creating and maintaining virtual machines, and followed that with the statement that: “the most important thing going forward will be to deploy a variety of approaches”. I took that to mean you had softened a little in your attitude towards using emulation to preserve content over time.
Sorry, I seem to have misunderstood.