It cannot be repeated often enough: backup and archiving are not the same thing! It may seem absurd to compare the legally binding archiving that a company must carry out on its accounting documents with the backup of family photos and videos. However, my experience, professional in the first area and personal in the second, has taught me some lessons on certain points of convergence, which I would like to share in a series of articles.
Archive important documents… with fewer resources!
In the same way that a company has documents with legal value, whose loss can have economic consequences, each individual has documents that they consider important:
- As in a company, for administrative purposes (tax notices, diplomas, pay slips)
- But also emotionally: photos and videos. Patiently collecting family photos over the years to pass them on to one’s descendants, then suddenly seeing them disappear in a hard drive crash, would be a disaster for many.
It is also interesting to note that, within a family, one person in the household often takes care not only of their own documents but also of those of their relatives: there is therefore a delegation of responsibility, much like a company with its archiving service!
These reasons should lead everyone to take a moment to set up a coherent backup “strategy”, like an archiving strategy, and think about the maximum budget to commit to it. 5 € / month? 20 € / month? Depending on this budget, the possible choices will not be the same and concessions will have to be made.
Before archiving, you have to sort!
The first step is the choice of documents to safeguard:
- First of all, their nature. We think primarily of our photos and videos, administrative documents and office files, but we sometimes forget items that we will be happy to find later: emails, SMS, browser favorites, voice messages from loved ones, paper memories. A piece of advice: write down in black and white the list of all the categories you want to keep. This list will then be used to track their backup status.
- Next, the actual selection of the documents concerned. Keeping “too many” documents has many disadvantages, which we do not necessarily think of. The most obvious is the large volume which will affect the storage price and the backup time, not to mention the ecological cost! But we can also notice that too large a mass of preserved documents will “drown out” the important documents (those that you would like your descendants to notice, for example) among the others.
But what criteria can guide us to sort out documents? We can cite:
- The document is not personal and has a public copy online (user manual, cooking recipe, video…). For photos, for example, do you really think that this photo of the Eiffel Tower or a hill in the Ardèche will interest your grandchildren? Be careful though, an online document may not stay online forever. If you believe the value of the document is critical to you, saving may be warranted.
- The document is a draft or a working copy of another. Keep only the final version!
- The document is of poor quality (slightly blurred photo, low resolution video). It is an elimination criterion. This article https://digital-photography-school.com/taking-out-the-garbage-7-tips-for-choosing-your-best-photos-fast/ is very informative on photo selection.
Storage choice, local or cloud
Another point of convergence is the choice of the number and nature of data copies, which is one of the major questions when developing a storage policy.
The best-known rule is the “3-2-1” rule formulated by the photographer Peter Krogh: 3 copies on 2 different media, including 1 “off-site”. The 3 copies include the original version, such as photos taken on a smartphone.
Offsite is crucial. Indeed, we know that a copy kept on the same physical site as the original data is not very safe: a fire, a flood… and your backup will be of little use. Keeping a copy on a hard disk or a USB key that is regularly taken to a relative’s home, for example, can be a good solution (provided that this person stores it in good conditions!).
Another solution that seems ideal to us is to combine cloud and physical storage. A major failure at your cloud provider? Your disk copy remains intact. Conversely, your laptop is stolen? No problem, the cloud takes over.
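As an illustration, the 3-2-1 rule can be written as a small check. This is only a minimal sketch: the `Copy` class, the `satisfies_3_2_1` function and the example inventory are hypothetical names and values chosen for this article, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical structure for this sketch)."""
    label: str
    medium: str    # e.g. "phone", "hard disk", "cloud"
    offsite: bool  # True if stored away from the original


def satisfies_3_2_1(copies):
    """Check 3 copies, on at least 2 different media, at least 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Example inventory: the original counts as one of the 3 copies.
copies = [
    Copy("original on smartphone", "phone", offsite=False),
    Copy("external disk at home", "hard disk", offsite=False),
    Copy("cloud storage", "cloud", offsite=True),
]
print(satisfies_3_2_1(copies))
```

Removing the cloud copy from this list makes the check fail twice over: only two copies remain, and none of them is off-site.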
If you wish to picture the benefit of multiple copies more precisely, a simple calculation is enough to reassure you (see for example “Probability and Statistics for Computer Scientists”, Michael Baron, https://tinyurl.com/ye23tt6b). If a hard disk has a 1% risk of data loss, and it has two copies each with a 2% risk of loss, then, assuming the failures are independent of one another, the probability of the data not being lost is 1 - 0.01 × 0.02 × 0.02, i.e. 99.9996%, versus “only” 99% with a single hard disk.
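The same calculation can be reproduced in a few lines. This is a sketch under the assumption that the copies fail independently of one another (a fire destroying two disks in the same room would break that assumption); the function name `survival_probability` is ours.

```python
from math import prod


def survival_probability(loss_risks):
    """Probability that at least one copy survives.

    Assumes each copy is lost independently of the others, so the data
    is lost only if every copy is lost at the same time.
    """
    return 1 - prod(loss_risks)


single = survival_probability([0.01])            # one hard disk at 1% risk
three = survival_probability([0.01, 0.02, 0.02])  # plus two copies at 2% each

print(f"{single:.4%}")
print(f"{three:.4%}")
```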
Pay attention to integrity!
At the heart of archiving is the theme of the integrity of its data, that is to say their conservation in their exact characteristics of origin, to the nearest byte.
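For copies that you manage yourself, one common way to check integrity “to the nearest byte” is to record a checksum for each file and compare it later. The sketch below uses SHA-256 from Python’s standard library; the choice of SHA-256 and the function names (`sha256_of`, `build_manifest`, `changed_files`) are our assumptions, not a method prescribed by any provider.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (relative path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """Files from the old manifest that are now missing or differ."""
    return sorted(
        name
        for name, checksum in old_manifest.items()
        if new_manifest.get(name) != checksum
    )
```

A typical use would be to build a manifest when the copy is made, keep it alongside the list of categories to back up, and rebuild it periodically: any file reported by `changed_files` has been altered or lost since the first pass.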
To preserve integrity, choosing a digital safe to send your documents to is a good idea. There, no concern for integrity: the supplier guarantees the integrity of the files. On the other hand, let’s be realistic: in 2023, unless you are very wealthy, using a digital safe to store hundreds of gigabytes of photos and videos is hardly a realistic solution for an individual.
Cloud storage providers sometimes offer such comfortable sorting and manipulation tools that you may be tempted to use them as a primary repository for your data. If we take the example of photos, Google Photos is thus becoming an increasingly powerful tool for manipulating them, sorting them, classifying them in albums… Other providers are more “basic” and simply provide inexpensive cloud storage (AWS…).
But these cloud spaces are not safes: they provide no guarantee against the degradation of files. For photos, it is fairly well known that, by default, your Android phone will send “degraded” photos to your Google Photos space, so as not to consume too much space. Google cryptically advises on its page: “This option is recommended for photos larger than 16MP and videos larger than 1080p.” Which smartphone in 2023 produces photos of less than 16 Mpx?
To come back to the subject of cost, it is clear that, apart from offering prices that vary with volume, suppliers do not particularly help individuals to project the price of their subscription according to their activity and equipment (a change of smartphone can bring a much higher photo resolution and therefore an explosion of the cost!).
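In the absence of such help from suppliers, a rough projection can be made by hand. The sketch below is deliberately simplistic, and every figure in it (photos per month, size per photo, price per gigabyte) is a made-up placeholder to be replaced with your own numbers and your supplier’s actual price list.

```python
def projected_monthly_cost(photos_per_month, megabytes_per_photo,
                           months, price_per_gb_month):
    """Estimate the monthly bill after `months` of accumulation.

    Assumes a flat price per gigabyte per month and nothing ever
    deleted; real price lists are usually tiered.
    """
    total_gb = photos_per_month * megabytes_per_photo * months / 1000
    return total_gb * price_per_gb_month


# Hypothetical figures: same habits, but a new phone producing
# photos three times larger.
before = projected_monthly_cost(200, 4, 36, 0.02)
after = projected_monthly_cost(200, 12, 36, 0.02)
print(f"old phone: {before:.2f} per month, new phone: {after:.2f} per month")
```

With a flat price, the cost grows in direct proportion to the size of each photo, which is why a change of smartphone is worth anticipating in the budget.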
Reversibility: not imprisoning yourself with a supplier
Whether you use a digital safe or simple cloud storage, the crucial question of reversibility arises, especially if you want to apply the principle given earlier of a physical backup on a hard disk: can you recover your cloud data? With a single click? It may seem obvious, but providers seem to be deliberately putting a spoke in the wheel here (no doubt to “lock” their customers into their system). Before choosing a supplier, you should always inspect the reversibility measures it implements.
Thus, to continue with the example of Google, its “bulk recovery” tool Google Takeout is notoriously (https://tinyurl.com/3b624txn) very impractical to use, and you will quickly resign yourself to making a full backup; a very high-speed connection is highly recommended! Nevertheless, it is worth emphasizing that the recovery format is simple and usable (folders with metadata files and “raw” data).
To be continued…
In the next article, we will look at other essential aspects that must also be taken into account: storage migration, retention period, the importance of metadata and the durability of formats.
Do not hesitate in the meantime to give your opinion on the suggestions presented in this article!
Mikaël Mechoulam