Please sail through the topics below and send us your comments on one or more of them!
MYTH: “To improve searching of documents we must use many index fields”.
FACT: Not quite. Use as few index fields as possible, but choose them smartly. Focus on true index fields that, individually or collectively, represent the identity of a document. Avoid: a) fields that can be derived from true index fields via lookups or screen scraping; b) fields needed only to narrow down the many pages of a large retrieved document; c) fields that can be handled by brute-force visual browsing; d) fields whose values Full Text Search can reliably find.
MYTH: “A good scanning job must yield zero missed pages”.
FACT: WYGIWYP (What You Get Is What You Paid). 100% perfection requires a negotiated 100% QC and more. If the scanning was performed with automatic document feeders, some pages may be missed no matter how sophisticated the equipment is. Options to reduce this risk include counting pages, visually matching originals against captured images, video-recording the prepping phase, numbering/endorsing scanned pages, and more.
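Counting pages only helps if the counts are reconciled against what the scanner actually produced. The sketch below assumes a hypothetical per-folder record of sheets counted during prep and images captured, and flags every batch where the two disagree. The batch IDs, field names and the duplex rule (two images per sheet, before blank backs are removed) are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    batch_id: str            # hypothetical box/folder identifier
    sheets_counted: int      # sheets counted by hand during prep
    images_captured: int     # images the scanner actually produced
    duplex: bool = False     # both sides captured, blank backs not yet removed


def reconcile(batches):
    """Return (batch_id, expected, captured) for every batch whose counts disagree."""
    mismatches = []
    for b in batches:
        expected = b.sheets_counted * (2 if b.duplex else 1)
        if b.images_captured != expected:
            mismatches.append((b.batch_id, expected, b.images_captured))
    return mismatches


batches = [
    Batch("BOX-001/F01", 120, 120),
    Batch("BOX-001/F02", 87, 86),               # a sheet was likely missed
    Batch("BOX-002/F01", 40, 80, duplex=True),
]
for batch_id, expected, captured in reconcile(batches):
    print(f"{batch_id}: expected {expected} images, captured {captured}")
```

A match is not proof of perfection (a missed sheet and a double-captured one cancel each other out), which is why counting is usually combined with visual matching or endorsing.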
MYTH: “It is fine to name Microsoft Windows folders and files after metadata”.
FACT: Not at all. However simple and popular, this approach has many side effects, including: metadata frequently needs changes and corrections that are risky to apply to folder and file names; some special characters and symbols used in metadata are considered illegal by Microsoft Windows, and any that are not today may be tomorrow; duplicate names are not allowed, forcing users to bastardize names with meaningless suffixes; IT security may not even allow folder or file names to be changed; establishing security to control exposure of sensitive data is cumbersome or impossible with Windows folder and file names; and the list goes on and on…
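To see how quickly ordinary metadata collides with Windows naming rules, the sketch below checks a value against the rules Microsoft documents for file and folder names: the characters < > : " / \ | ? * and control characters are not allowed, device names such as CON or LPT1 are reserved (even with an extension), and names may not end in a space or a period. It is not exhaustive (path-length limits, for example, are not checked), and the sample metadata values are made up.

```python
# Characters Windows does not accept in file or folder names
ILLEGAL_CHARS = set('<>:"/\\|?*')
# Reserved device names; per Microsoft's docs "CON.pdf" or "nul.tar.gz" are reserved too
RESERVED_NAMES = ({"CON", "PRN", "AUX", "NUL"}
                  | {f"COM{i}" for i in range(1, 10)}
                  | {f"LPT{i}" for i in range(1, 10)})


def windows_name_problems(name: str) -> list:
    """Return the reasons `name` cannot be used as a Windows file or folder name."""
    if not name:
        return ["empty name"]
    problems = []
    bad = sorted({c for c in name if c in ILLEGAL_CHARS or ord(c) < 32})
    if bad:
        problems.append("illegal characters " + repr("".join(bad)))
    # Everything before the first period counts as the base name
    if name.split(".", 1)[0].upper() in RESERVED_NAMES:
        problems.append("reserved device name")
    if name[-1] in " .":
        problems.append("ends with a space or a period")
    return problems


# Hypothetical metadata values someone might try to use as names
for value in ['Smith/Jones Partnership', 'Invoice 123: "Final"', 'Con',
              'Acme Inc.', 'Loan 2021-044']:
    print(f"{value!r}: {windows_name_problems(value) or 'OK'}")
```

Three of these five perfectly ordinary values (a partnership, a company name, a customer called Con) could not be stored as-is, which is exactly where the "meaningless suffixes" and silent character substitutions begin.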
MYTH: “After a thermonuclear war, when no electricity is available, we can still rebuild using records on microfilm or microfiche…all we need is a bit of sunlight and a broken glass to view it”.
FACT: Good luck rebuilding without electricity. Good luck finding the roll or fiche. Good luck sharing the fiche or film with all involved. Good luck viewing the scratched or melted frames.
MYTH: “We should fund and complete our ECM first, then we can discuss budgets for a backfile conversion”.
FACT: Your users are being asked to let go of paper and start depending on computers and software, yet the new system starts with an empty database. To manage the human resistance to change, you need a critical mass of clean, well-indexed documents in the system when you launch your new ECM.
MYTH: “Before we start scanning, we need to remove all duplicates and junk”.
FACT: It is much safer, and often cheaper, to do that digitally, i.e. scan everything no matter what and then use smart technology to deduplicate and remove junk. Errors and omissions in selective capture are irreversible, while a digital “soft delete” is not.
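A “soft delete” simply flags an image instead of destroying it, so a wrong call can be undone. The sketch below assumes OCR text is available for each page and uses Python's `difflib.SequenceMatcher` as a simple stand-in for “smart technology”; the 0.95 similarity threshold and the page data are arbitrary illustrations, and real deduplication tools may rely on other signals as well.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Page:
    page_id: str
    ocr_text: str
    deleted: bool = False               # soft-delete flag; the image is kept
    duplicate_of: Optional[str] = None  # page this one appears to duplicate


def normalize(text: str) -> str:
    """Ignore case and whitespace differences introduced by OCR."""
    return " ".join(text.lower().split())


def soft_delete_duplicates(pages, threshold=0.95):
    """Flag pages whose OCR text closely matches an earlier kept page."""
    kept = []
    for page in pages:
        text = normalize(page.ocr_text)
        for original in kept:
            ratio = SequenceMatcher(None, normalize(original.ocr_text), text).ratio()
            if ratio >= threshold:
                page.deleted = True
                page.duplicate_of = original.page_id
                break
        else:
            # No close match among kept pages: keep this one
            kept.append(page)
    return pages


def undelete(page: Page) -> None:
    """Reverse a wrong call: nothing was physically destroyed."""
    page.deleted = False
    page.duplicate_of = None


pages = [
    Page("p1", "Invoice 1001  ACME Corp  Total $500.00"),
    Page("p2", "INVOICE 1001 Acme Corp total $500.00"),   # photocopy of p1
    Page("p3", "Purchase Order 88  Globex Ltd"),
]
soft_delete_duplicates(pages)
for p in pages:
    print(p.page_id, "deleted" if p.deleted else "kept", p.duplicate_of or "")
```

Had the same decision been made on paper before scanning, a wrongly discarded sheet would simply be gone; here `undelete` restores it.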
MYTH: “OCR can compensate for poor or inadequate indexing”.
FACT: OCR overpromises and underdelivers. It can still be extremely useful, but never as a substitute for good structured indexing or classification.
MYTH: “Records Management solutions belong to the IT domain”
FACT: Not quite. RIM (Records and Information Management) requires knowledge and expertise that often clash with IT management culture, priorities and interests.
MYTH: Blank back pages from duplex paper scanning should not be billable.
FACT: Quite the contrary. Duplex scanning produces back pages that require extra effort and human oversight to remove safely. If the job involves near-perfect paper, this extra charge may be waived. However, hole punches, bleed-through, smudges, background colors, dust, broken edges, etc. can make blank page detection far from trivial, risking the loss of valuable data and/or leaving lots of unnecessary semi-blank images.
MYTH: Blank frames from duplex microfilm scanning should not be billable.
FACT: True only if 100% of the back frames are known to be blank, which is not unusual. Otherwise, hole punches, bleed-through, smudges, background colors, dust, broken edges, etc. can make blank page detection far from trivial, risking the loss of valuable data and/or leaving lots of unnecessary semi-blank images.
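To illustrate why blank page detection needs tuning and oversight, here is a minimal heuristic, not how any particular scanner or product does it: ignore a margin band (where hole punches and broken edges usually sit), count only pixels darker than a threshold (so faint bleed-through and background tints are ignored), and send borderline pages to a human instead of deleting them. The image format (a 2-D list of 0 to 255 gray values) and every threshold are assumptions.

```python
def dark_ratio(gray, dark_threshold=128, margin=0.08):
    """Share of dark pixels inside the page, ignoring a margin band on every side.

    gray: 2-D list of grayscale values, 0 = black, 255 = white.
    """
    h, w = len(gray), len(gray[0])
    dy, dx = int(h * margin), int(w * margin)
    inner = [row[dx:w - dx] for row in gray[dy:h - dy]]
    total = sum(len(row) for row in inner)
    if total == 0:
        raise ValueError("margin leaves no pixels to inspect")
    dark = sum(1 for row in inner for value in row if value < dark_threshold)
    return dark / total


def classify_back_page(gray, blank_max=0.001, review_max=0.01):
    """'blank' may be dropped, 'review' goes to a human, 'content' is kept."""
    ratio = dark_ratio(gray)
    if ratio <= blank_max:
        return "blank"
    if ratio <= review_max:
        return "review"
    return "content"


def white_page(size=100):
    return [[255] * size for _ in range(size)]


page = white_page()
for y in range(3, 7):          # punch hole inside the margin band: ignored
    for x in range(3, 7):
        page[y][x] = 0
for y in range(30, 70):        # faint bleed-through: lighter than the threshold
    for x in range(20, 80):
        page[y][x] = 210
print(classify_back_page(page))   # blank
```

The trade-off the article describes is visible in the parameters: widen the margin or raise the thresholds and a short handwritten note on a back page may be thrown away; tighten them and many semi-blank images survive and must be reviewed.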
MYTH: The indexing component makes digitizing large collections of microfilm/microfiche an expensive proposition.
FACT: Not necessarily. Most of the overwhelming advantages of digitizing film and/or fiche are achieved before a dime is spent on indexing. Misinformation and greed often lead to wasteful spending on profusely indexing the digitized collection. If the user controls that greed and settles for a digital imitation of the unwieldy search process used when the collection was still on film, the indexing cost can be negligible. This approach can also serve as a wise stepping stone toward a more ambitious data capture effort.
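One possible reading of that “digital imitation”, offered only as an assumption: on film, users found the right roll from a box label or reel index and then wound through the frames. Digitally, that becomes a tiny roll-level index (one entry per roll, not per document) plus an image viewer. The keys, roll IDs and the idea of recording the first record key on each roll are hypothetical.

```python
from bisect import bisect_right

# Hypothetical roll-level index: (first record key on the roll, roll id),
# sorted by key. Keys are zero-padded so string order matches numeric order.
ROLL_INDEX = [
    ("000001", "ROLL-0001"),
    ("001850", "ROLL-0002"),
    ("003712", "ROLL-0003"),
]
FIRST_KEYS = [first for first, _ in ROLL_INDEX]


def roll_for(key: str) -> str:
    """Return the roll whose key range should contain `key`."""
    i = bisect_right(FIRST_KEYS, key) - 1
    if i < 0:
        raise KeyError(f"{key} precedes the first roll")
    return ROLL_INDEX[i][1]


# The user then browses that roll's frames in an image viewer, much as
# they would have wound through the film on a reader.
print(roll_for("002000"))  # ROLL-0002
```

Indexing cost here grows with the number of rolls rather than the number of documents, and the same index can later be enriched if a fuller data capture effort is funded.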
MYTH: Indexing a collection of documents must include all possible fields that could eventually be used for searching.
FACT: Blatantly against best practices. Granted, searches may require a vast diversity of data fields, but these should not necessarily be captured during the indexing phase of a collection being digitized. A common mistake is to populate indexing fields with data coming from one or more lookup datasets. This proliferation of redundant data contradicts best practices such as database normalization, among others. A best-practice implementation captures from the collection only the index data that uniquely identifies each record and links it to the corporate datasets. Data mining, which is not indexing, should be funded by separate budgets, possibly in other cost centers.
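The normalized approach can be sketched with Python's built-in `sqlite3` module: the scanning project records only the identifying key per document, and any other search field comes from a join with the corporate dataset. The table names, columns and sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Corporate dataset, maintained by the business, not by the scanning project
    CREATE TABLE accounts (
        account_no TEXT PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- Captured during indexing: only the key that identifies each document
    CREATE TABLE documents (
        doc_id     TEXT PRIMARY KEY,
        account_no TEXT NOT NULL REFERENCES accounts(account_no)
    );
""")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [("100234", "Jane Doe"), ("100871", "ACME Corp")])
con.executemany("INSERT INTO documents VALUES (?, ?)",
                [("IMG-0001", "100234"), ("IMG-0002", "100871"),
                 ("IMG-0003", "100871")])


def documents_for_name(con, name):
    """Search by a field that was never indexed, via a join on the captured key."""
    rows = con.execute(
        """SELECT d.doc_id
             FROM documents AS d
             JOIN accounts  AS a ON a.account_no = d.account_no
            WHERE a.name = ?
            ORDER BY d.doc_id""",
        (name,),
    )
    return [doc_id for (doc_id,) in rows]


print(documents_for_name(con, "ACME Corp"))   # ['IMG-0002', 'IMG-0003']

# A correction in the corporate dataset is reflected immediately,
# with no re-indexing of the scanned documents.
con.execute("UPDATE accounts SET name = ? WHERE account_no = ?",
            ("ACME Corporation", "100871"))
print(documents_for_name(con, "ACME Corporation"))  # ['IMG-0002', 'IMG-0003']
```

Had the customer name been keyed into every document's index, the same correction would mean touching every affected record.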
MYTH: Bitonal (black and white) images are smaller, faster and cheaper.
FACT: No longer accurate. Today’s scanning and storage technology makes color and grayscale images affordable, and they are much safer, better and more pleasant to work with.
MYTH: The micrographic industry is dying.
FACT: Not quite, although it is not a growth market. It is a narrow niche with a slowly shrinking market share. The concept of microfilming relates to both the _media_ and the _capture_ process. Micrographic _media_ appeals to people devoted to historical preservation and to those concerned about technological evolution. Cost-effectiveness, longevity and quality are no longer exclusive to microfilm, as its digital counterpart offers very good practical answers to all such challenges. Micrographic _capture_, praised for the benefits of its planetary nature, has now been replaced by overhead and transport-based digital scanners. Digital capture and digital repositories have prevailed, thanks in part to PDF/A, born-digital ingestion, instant QC, instant gratification, automation, risk and cost reductions, and more.
MYTH: Estimating volumes of pages for budgetary purposes is simple.
FACT: Not quite. If done by the service provider, overestimation may cost them the job, while underestimation may create trust problems and project disruptions. If done by the client and proven wrong, it may also create trust issues such as “you (provider) are supposed to be the expert, why didn’t you warn me?”. The following table shows useful metrics. However, to take this effort seriously, someone must decide on sample diversity and size, count carefully and extrapolate. Double-sided pages are not trivial to account for, because the samples must be representative of the entire collection.

| Item | Typical volume |
|---|---|
| Unbound paper | 150 to 175 pages per inch |
| Standard file box | 2,000 to 2,500 pages per box |
| Banker style box | 4,500 to 5,000 pages per box |
| Vertical file cabinet | 3,500 to 4,000 pages per drawer |
| Slides, photos | 100 to 120 photos per inch |
| Open shelving | 1,500 pages per horizontal foot |
| Lateral file cabinet | 5,000 to 6,000 pages per drawer |
| Drawings (flat, unfolded) | 125 to 150 sheets per inch |
| 16 mm microfilm roll | 2,000 to 3,000 frames per roll |
| 35 mm microfilm roll | 200 to 400 frames per roll |
| Microfiche, naked | 150 fiche per inch |
| Microfiche, in envelopes | 80 fiche per inch |
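The arithmetic behind such an estimate is simple; the hard part, as noted above, is choosing representative samples. The sketch below uses a subset of the table's ranges for a first low/high estimate, extrapolates from a carefully counted sample, and then adjusts for double-sided sheets. The inventory, the sample counts and the 25% duplex share are made-up inputs, and treating counted pages as physical sheets is an assumption.

```python
# (low, high) pages per unit, copied from the table above (paper items only)
PAGES_PER_UNIT = {
    "unbound paper (inch)": (150, 175),
    "standard file box": (2_000, 2_500),
    "banker style box": (4_500, 5_000),
    "vertical file drawer": (3_500, 4_000),
    "open shelving (horizontal foot)": (1_500, 1_500),
    "lateral file drawer": (5_000, 6_000),
}


def table_estimate(inventory):
    """Return (low, high) page estimates for an inventory {unit: quantity}."""
    low = sum(qty * PAGES_PER_UNIT[unit][0] for unit, qty in inventory.items())
    high = sum(qty * PAGES_PER_UNIT[unit][1] for unit, qty in inventory.items())
    return low, high


def sample_estimate(pages_counted, units_sampled, units_total):
    """Extrapolate a careful count of a representative sample to the whole."""
    return round(pages_counted * units_total / units_sampled)


def expected_images(sheets, duplex_share):
    """Images to capture if `duplex_share` of the sheets have content on both
    sides (the share itself must come from a representative sample)."""
    return round(sheets * (1 + duplex_share))


inventory = {"banker style box": 120, "lateral file drawer": 30}
print(table_estimate(inventory))              # (690000, 780000)
print(sample_estimate(14_210, 3, 120))        # 568400
print(expected_images(568_400, 0.25))         # 710500
```

Comparing the table range with a sample-based figure is a cheap sanity check: if the two disagree badly, the sample, the inventory or both deserve another look before a budget is quoted.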