Myth: “A good scanning job must have zero missed pages”.
Fact: WYGIWYP (What You Get Is What You Paid For). Perfection requires a negotiated 100% QC and more. If the scanning is performed with automatic feeder mechanisms, some pages may be missed no matter how sophisticated the equipment is. Options to reduce this risk include counting pages, visually matching originals against captured images, video recording the prepping phase, and numbering or endorsing scanned pages, among others.
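The page-counting option can be sketched as a reconciliation between the counts taken during prep and the number of images actually captured. This is a minimal illustration only; the batch IDs and the `reconcile_counts` helper are hypothetical, not part of any particular scanning product.

```python
def reconcile_counts(prepped, captured):
    """Return the batches whose captured image count differs from the prep count.

    Both arguments map a batch ID to a page count. A batch that is missing
    from `captured` is treated as having zero images.
    """
    mismatches = {}
    for batch_id, expected in prepped.items():
        actual = captured.get(batch_id, 0)
        if actual != expected:
            mismatches[batch_id] = {"expected": expected, "captured": actual}
    return mismatches


# Hypothetical counts: batch B2 lost one page in the feeder.
prep_counts = {"B1": 100, "B2": 50}
scan_counts = {"B1": 100, "B2": 49}
print(reconcile_counts(prep_counts, scan_counts))
```

A mismatch only says that a batch needs a second look; finding the missing page still takes a visual comparison against the originals.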
Myth: “It is fine to name Microsoft Windows folders and files after metadata”.
Fact: Not at all. However simple and popular, this approach has many side effects: metadata frequently needs changes and corrections, which are risky to apply to folder and file names; Microsoft Windows rejects some special characters and symbols that appear in metadata, and characters that are legal today may not be tomorrow; duplicate names are not allowed, which forces meaningless suffixes onto names; IT security may not even allow folder or file names to be changed; and controlling exposure of sensitive data is cumbersome or impossible when that data sits in Windows folder and file names. The list goes on and on.
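To make the character problem concrete, here is a minimal sketch that checks whether a metadata value would survive as a Windows file name, and shows the alternative of storing files under an opaque ID while the metadata lives in a separate index. The helper names and the in-memory `index` dictionary are hypothetical; a real system would use a database or an ECM repository.

```python
import re
import uuid

# Characters that Windows rejects in file and folder names.
ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Device names that Windows reserves, with or without an extension.
RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def is_safe_windows_name(name):
    """Return True if `name` can be used as-is for a Windows file or folder."""
    if not name or ILLEGAL_CHARS.search(name):
        return False
    if name.endswith((" ", ".")):  # trailing spaces and dots cause trouble
        return False
    stem = name.split(".")[0].upper()
    return stem not in RESERVED


def store_record(index, metadata):
    """Register a document under an opaque ID and return its file name.

    The metadata goes into `index`, where it can be corrected freely
    without ever renaming the file.
    """
    doc_id = uuid.uuid4().hex
    index[doc_id] = dict(metadata)
    return doc_id + ".pdf"


print(is_safe_windows_name("Invoice: ACME/2021?"))  # contains ':', '/' and '?'
index = {}
print(store_record(index, {"title": "Invoice: ACME/2021?"}))
```

With this design, duplicate titles, later corrections and access control all become index concerns rather than file system concerns.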
Myth: “After a thermonuclear war, when no electricity is available, we can still rebuild using records on microfilm or microfiche…all we need is a bit of sunlight and a broken glass to view it”.
Fact: Good luck rebuilding without electricity. Good luck finding the roll or fiche. Good luck sharing the fiche or film with all involved. Good luck viewing the scratched or melted frames.
Myth: “We should fund and complete our ECM first, then we can discuss budgets for the backfile conversion”.
Fact: Your users are being asked to let go of paper and start depending on computers and software. Still, you start with an empty database. You need a critical mass of clean, well indexed documents in the system when you launch your new ECM to manage the human resistance to change.
Myth: “Before we start scanning, we need to remove all duplicates and junk”.
Fact: It is much safer and often cheaper to do that digitally, i.e., scan everything no matter what and then use smart technology to deduplicate and remove junk. Errors and omissions in selective capture are irreversible, while a digital “soft delete” is not.
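The reversibility of a digital soft delete can be sketched as follows. This is an illustration under simplifying assumptions: it flags only byte-identical files by SHA-256 hash, whereas two scans of duplicate paper are rarely byte-identical, so a production tool would need content-based or perceptual matching. The record layout (`name`, `data`, `deleted`) is hypothetical.

```python
import hashlib


def mark_duplicates(images):
    """Flag exact duplicates as soft-deleted; nothing is physically removed.

    `images` is a list of dicts with a 'name' and the file content as
    bytes under 'data'. The first occurrence of each content is kept.
    """
    seen = {}
    for img in images:
        digest = hashlib.sha256(img["data"]).hexdigest()
        if digest in seen:
            img["deleted"] = True
            img["duplicate_of"] = seen[digest]
        else:
            seen[digest] = img["name"]
            img["deleted"] = False
    return images


def restore(img):
    """Undo a soft delete, which is impossible for paper that was never scanned."""
    img["deleted"] = False
    img.pop("duplicate_of", None)


batch = mark_duplicates([
    {"name": "a.tif", "data": b"page-one"},
    {"name": "b.tif", "data": b"page-one"},
    {"name": "c.tif", "data": b"page-two"},
])
print([(img["name"], img["deleted"]) for img in batch])
```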
Myth: “OCR can compensate for poor or inadequate indexing”.
Fact: OCR overpromises and underdelivers. It can still be extremely useful, but never as a substitute for good structured indexing or classification.
Myth: “Records Management solutions belong to the IT domain”.
Fact: Not quite. RIM (Records and Information Management) requires knowledge and expertise that often collides with IT management culture, priorities and interests.
Myth: Blank back pages from duplex paper scanning should not be billable.
Fact: Quite the contrary. Duplex scanning produces back pages that require extra effort and human oversight to remove safely. If the job involves near-perfect paper, this extra charge may be waived. However, hole punches, bleed-through, smudges, background colors, dust, broken edges, etc. make blank page detection far from trivial: detect too aggressively and valuable data is lost; detect too cautiously and you are left with lots of unnecessary semi-blank images.
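The trade-off can be illustrated with a minimal sketch of threshold-based blank detection. It assumes a page is given as rows of grayscale pixel values (0 = black, 255 = white), and every threshold below is a hypothetical value that would need tuning per collection. The point is the middle "review" band, which is where the human oversight goes.

```python
def dark_ratio(pixels, dark_below=128):
    """Fraction of pixels darker than `dark_below` in a grayscale page."""
    total = sum(len(row) for row in pixels)
    if total == 0:
        return 0.0
    dark = sum(1 for row in pixels for p in row if p < dark_below)
    return dark / total


def classify_back_page(pixels, blank_max=0.001, content_min=0.01):
    """Classify a back page as 'blank', 'content' or 'review'.

    Pages between the two thresholds (hole punches, bleed-through, dust)
    are not deleted automatically; they are routed to a person.
    """
    ratio = dark_ratio(pixels)
    if ratio <= blank_max:
        return "blank"
    if ratio >= content_min:
        return "content"
    return "review"


# A 100 x 100 white page with a small smudge of 50 dark pixels.
page = [[255] * 100 for _ in range(100)]
for x in range(50):
    page[0][x] = 0
print(classify_back_page(page))
```

Narrowing the review band lowers labor cost but raises the risk of discarding a faint note or stamp, which is why this step is legitimately billable.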
Myth: Blank frames from duplex microfilm scanning should not be billable.
Fact: True only if 100% of the back frames are known to be blank, which is not unusual. Otherwise, hole punches, bleed-through, smudges, background colors, dust, broken edges, etc. make blank frame detection far from trivial: detect too aggressively and valuable data is lost; detect too cautiously and you are left with lots of unnecessary semi-blank images.
Myth: The indexing component makes digitizing large collections of microfilm/microfiche an expensive proposition.
Fact: Not necessarily. Most of the overwhelming number of advantages of digitizing film and/or fiche are achieved before a dime is spent on indexing. Misinformation and greed often contribute to wasteful spending on profusely indexing the digitized collection. If users control their greed and settle for a digital imitation of the unwieldy process used to search the collection when it was still on film, the indexing cost can be negligible. This approach can also be used wisely as a stepping stone towards a more ambitious data capture effort.
Myth: Indexing a collection of documents must include all possible fields that could eventually be used for searching.
Fact: Blatantly against best practices. Granted, searches may require a vast diversity of data fields, but these should not necessarily be captured during the indexing phase of a collection being digitized. A common mistake is to populate indexing fields with data coming from one or more lookup datasets. This proliferation of redundant data contradicts best practices such as database normalization. A best-practice implementation captures from the collection only the index data that uniquely identifies each record and links it to the corporate datasets. Data mining is not indexing, and it should be funded by separate budgets, possibly in other cost centers.
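A minimal sketch of the normalized approach, using an in-memory SQLite database: the index captured from the collection holds only a unique key per image, and searches by any other field go through a join to the corporate dataset. The table names, columns and sample rows are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a corporate lookup table and a lean document index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (              -- existing corporate dataset
            employee_id TEXT PRIMARY KEY,
            name        TEXT,
            department  TEXT
        );
        CREATE TABLE document_index (         -- captured during indexing
            image_file  TEXT PRIMARY KEY,
            employee_id TEXT REFERENCES employees(employee_id)
        );
        INSERT INTO employees VALUES
            ('E100', 'A. Rivera', 'Finance'),
            ('E200', 'B. Chen',   'Legal');
        INSERT INTO document_index VALUES
            ('0001.pdf', 'E100'),
            ('0002.pdf', 'E200'),
            ('0003.pdf', 'E100');
    """)
    return conn


def find_documents(conn, department):
    """Search by a field that was never keyed in during indexing."""
    rows = conn.execute(
        """
        SELECT d.image_file
        FROM document_index AS d
        JOIN employees AS e ON e.employee_id = d.employee_id
        WHERE e.department = ?
        ORDER BY d.image_file
        """,
        (department,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(find_documents(conn, "Finance"))
```

If an employee changes department, one row in `employees` is updated and every linked document follows; nothing in the index has to be re-keyed.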
Myth: Bitonal (black and white) images are smaller, faster and cheaper.
Fact: No longer accurate. Today’s scanning and storage technology makes color and grayscale images safer, more affordable, better and more pleasant to work with.
Myth: The micrographic industry is in demise.
Fact: Not quite, although it is not a growth market. It is a narrow niche in a slowly shrinking market. The concept of microfilming relates to both the media and the capture process. Micrographic media appeals to people involved in historical preservation and to those concerned about technological change. Cost-effectiveness, longevity and quality are no longer exclusive to microfilm, as its digital counterpart has very good practical answers to all such challenges. Micrographic capture, once praised for the benefits of its planetary nature, has now been replaced by overhead and transport-based digital scanners. Digital capture and digital repositories have prevailed, in part thanks to PDF/A, born-digital ingestion, instant QC, instant gratification, automation, risk and cost reductions and more.
Myth: Estimating volumes of pages for budgetary purposes is simple.
Fact: Not quite. If done by the service provider, overestimations may cost them the job, while underestimations may create trust problems and project disruptions. If done by the client and proven wrong, it may also create trust issues such as “you (the provider) are supposed to be the expert, why didn’t you warn me?”. The following table shows useful metrics. However, to take this effort seriously, someone must decide on sample diversity and sizes, count carefully and extrapolate. Double-sided pages are not trivial to account for, because the samples used must be representative of the entire collection.
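The count-and-extrapolate step can be sketched as below. The sample figures are hypothetical, and the low/high band is a deliberately crude plus or minus one sample standard deviation per box, not a formal confidence interval. It is only meant to show how quickly the spread between samples widens a budget.

```python
from statistics import mean, stdev


def estimate_images(samples, total_boxes):
    """Extrapolate a total image count from a few hand-counted boxes.

    `samples` is a list of (sheets, double_sided_sheets) tuples, one per
    sampled box. Each sheet yields one image, plus one more if it is
    double sided.
    """
    images_per_box = [sheets + double_sided for sheets, double_sided in samples]
    avg = mean(images_per_box)
    spread = stdev(images_per_box) if len(images_per_box) > 1 else 0.0
    return {
        "estimate": round(avg * total_boxes),
        "low": round((avg - spread) * total_boxes),
        "high": round((avg + spread) * total_boxes),
    }


# Three hand-counted boxes out of a hypothetical collection of 100.
counted = [(2000, 400), (2300, 100), (1800, 900)]
print(estimate_images(counted, total_boxes=100))
```

If the sampled boxes all come from one department or one decade, the band will look reassuringly narrow and still be wrong, which is why sample diversity matters as much as sample size.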