Please sail through the topics below and pick one or more to send us your comments!
MYTH: “A good scanning job must have zero missed pages”.
FACT: WYGIWYP (What You Get Is What You Paid For). 100% perfection
requires a negotiated 100% QC, and more. If the scanning was performed
with automatic feeders, some pages may be missed no matter how
sophisticated the equipment is. Options to reduce this risk include
counting pages, visually matching originals against captured images,
video recording the prepping phase, numbering/endorsing scanned pages
and more.
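The page counting option can be sketched in a few lines. This is a minimal illustration, assuming the prep team records a page count per batch and the scanning software reports how many images it captured; the batch names and counts are hypothetical.

```python
def find_count_mismatches(expected, scanned):
    """Return {batch_id: (expected_pages, scanned_pages)} for batches that differ."""
    mismatches = {}
    for batch_id, expected_pages in expected.items():
        # A batch missing from the scan report counts as zero captured pages.
        scanned_pages = scanned.get(batch_id, 0)
        if scanned_pages != expected_pages:
            mismatches[batch_id] = (expected_pages, scanned_pages)
    return mismatches


# Hypothetical counts: one from the prepping phase, one from the scanner log.
prep_counts = {"box01-folder03": 212, "box01-folder04": 87}
scan_counts = {"box01-folder03": 212, "box01-folder04": 85}

print(find_count_mismatches(prep_counts, scan_counts))
```

A mismatch only says that something is off in a batch; finding the missed pages still takes a visual check against the originals.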
MYTH: “It is fine to name Microsoft Windows folders and files after
FACT: Not at all. However simple and popular, this approach has many
side effects: metadata frequently needs changes and corrections that
are risky to apply to folder and file names; special characters and
symbols used in metadata may be illegal in Microsoft Windows names, if
not today then perhaps tomorrow; duplicate names are not allowed,
forcing you to bastardize names with meaningless suffixes; IT security
may not even allow folder or file names to be changed; controlling
exposure of sensitive data is cumbersome or impossible when that data
sits in Windows folder and file names; and the list goes on and on…
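To make the character problem concrete, here is a small sketch. The reserved character set is the one Microsoft documents for Windows file names; everything else (the function names, the in-memory index, the sample title) is hypothetical and only illustrates the alternative of keeping metadata out of the name.

```python
import uuid

# Characters Microsoft documents as reserved in Windows file names.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')


def illegal_chars(value):
    """Return the reserved characters that appear in a metadata value."""
    return sorted(set(value) & WINDOWS_RESERVED_CHARS)


def register_document(index, metadata):
    """Store metadata in an index and return a neutral, metadata-free file name.

    The index is a plain dict here; in practice it would be a database.
    """
    doc_id = uuid.uuid4().hex
    index[doc_id] = dict(metadata)
    return doc_id + ".pdf"


title = 'Contract: A/B "Final"?'
print(illegal_chars(title))  # this title cannot be used as a file name as-is

index = {}
file_name = register_document(index, {"title": title, "year": 2014})
print(file_name, index)
```

With this arrangement a correction to the title touches only the index, and two documents with the same title no longer collide.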
MYTH: “After a thermonuclear war, when no electricity is available, we
can still rebuild using records on microfilm or microfiche…all we need
is a bit of sunlight and a broken glass to view it”.
FACT: Good luck rebuilding without electricity. Good luck finding the
roll or fiche. Good luck sharing the fiche or film with all involved.
Good luck viewing the scratched or melted frames.
MYTH: “We should fund and complete our ECM first, then we can discuss
budgets for the backfile conversion”.
FACT: Your users are being asked to let go of paper and start depending
on computers and software. Still, you start with an empty database. You
need a critical mass of clean, well indexed documents in the system when
you launch your new ECM to manage the human resistance to change.
MYTH: “Before we start scanning, we need to remove all duplicates and
FACT: It is much safer and often cheaper to do that digitally, i.e. scan
everything and then use smart technology to deduplicate and remove
junk. Errors and omissions in selective capture are irreversible, while
a digital “soft delete” is not.
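A minimal sketch of the “soft delete” idea follows. It assumes each captured file is available as bytes and flags exact duplicates by SHA-256 digest; the record layout and field names are hypothetical. Two scans of the same sheet are rarely byte-identical, so real deduplication needs smarter matching than this, but the reversible flag works the same way.

```python
import hashlib


def soft_delete_duplicates(records):
    """Flag exact duplicates as deleted instead of removing them."""
    first_seen = {}  # content digest -> id of the first record with that content
    for record in records:
        digest = hashlib.sha256(record["content"]).hexdigest()
        if digest in first_seen:
            record["deleted"] = True
            record["duplicate_of"] = first_seen[digest]
        else:
            first_seen[digest] = record["id"]
            record["deleted"] = False
    return records


def restore(record):
    """Undo a soft delete; nothing was ever physically removed."""
    record["deleted"] = False
    record.pop("duplicate_of", None)


# Hypothetical records; "content" stands in for the bytes of an image file.
records = [
    {"id": "img-001", "content": b"invoice 4711"},
    {"id": "img-002", "content": b"invoice 4711"},
    {"id": "img-003", "content": b"cover letter"},
]
soft_delete_duplicates(records)
print([(r["id"], r["deleted"]) for r in records])

restore(records[1])  # a wrong call is reversed with one flag change
```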
MYTH: “OCR can compensate for poor or inadequate indexing”.
FACT: OCR overpromises and underdelivers. Still, it can be extremely
useful, but never as a substitute for good structured indexing or
MYTH: “Records Management solutions belong to the IT domain”
FACT: Not quite. RIM (Records and Information Management) requires
knowledge and expertise that often collides with IT management culture,
priorities and interests.
MYTH: Blank back pages from duplex paper scanning should not be
FACT: Quite the contrary. Duplex scanning produces back pages that
require extra effort and human oversight to remove safely. If the job
involves near perfect paper, then this extra charge may be waived.
However, hole punches, bleed-through, smudges, background colors,
dust, broken edges, etc. can make blank page detection far from trivial,
with the risk of losing valuable data and/or coping with lots of
unnecessary semi-blank images.
MYTH: Blank frames from duplex microfilm scanning should not be
FACT: True only if 100% of the back frames are known to be blank,
which is not unusual. Otherwise hole punches, bleed-through, smudges,
background colors, dust, broken edges, etc. can make blank page
detection far from trivial, with the risk of losing valuable data
and/or coping with lots of unnecessary semi-blank images.
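Why blank detection is not trivial can be seen in a toy sketch. It assumes a page is available as rows of grayscale values (0 = black, 255 = white) and classifies it by the share of dark pixels. Both thresholds are made-up numbers for illustration; the point is the middle band, where hole punches, bleed-through or dust land and a person has to decide.

```python
def dark_ratio(pixels, dark_below=128):
    """Share of pixels darker than the cutoff; pixels is a list of rows."""
    total = sum(len(row) for row in pixels)
    dark = sum(1 for row in pixels for value in row if value < dark_below)
    return dark / total if total else 0.0


def classify_back_page(pixels, blank_max=0.001, content_min=0.01):
    """Return 'blank', 'keep' or 'review' (thresholds are illustrative only)."""
    ratio = dark_ratio(pixels)
    if ratio <= blank_max:
        return "blank"
    if ratio >= content_min:
        return "keep"
    return "review"  # ambiguous: send to human oversight


# Three synthetic 100 x 100 pages: clean, lightly marked, and with content.
clean = [[255] * 100 for _ in range(100)]
smudged = [row[:] for row in clean]
smudged[0][:50] = [0] * 50            # 50 dark pixels, e.g. a hole punch shadow
written = [row[:] for row in clean]
written[0] = [0] * 100                # 200 dark pixels, e.g. a line of text
written[1] = [0] * 100

print(classify_back_page(clean), classify_back_page(smudged), classify_back_page(written))
```

Tightening the thresholds shrinks the review pile but raises the odds of discarding a faint note; loosening them keeps more semi-blank images.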
MYTH: The indexing component makes digitizing large collections of
microfilm/microfiche an expensive proposition.
FACT: Not necessarily. Most of the many advantages of digitizing film
and/or fiche are achieved before a dime is spent on indexing.
Misinformation and greed often lead to wasteful spending on profusely
indexing the digitized collection. If users control their greed and
settle for a digital imitation of the unwieldy search process used when
the collection was still on film, the indexing cost could be
negligible. This approach can also serve as a stepping stone towards a
more ambitious data capture effort.
MYTH: Indexing a collection of documents must include all possible
fields that could eventually be used for searching.
FACT: Blatantly against best practices. Granted, searches may require
a wide variety of data fields, but these need not all be captured
during the indexing phase of a collection being digitized. A common
mistake is to populate indexing fields with data copied from one or
more lookup datasets. This proliferation of redundant data contradicts
best practices such as database normalization. The better approach is
to capture from the collection only the index data that uniquely
identifies each record and link it to the corporate datasets. Data
mining, not indexing, should be funded by separate budgets, possibly in
other cost centers.
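The linking approach can be illustrated with Python's built-in sqlite3 module. The table and column names, and the sample employee, are hypothetical: the index captures only the key that identifies the record, and every other search field comes from the corporate dataset through a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Corporate dataset: maintained elsewhere, not re-keyed during indexing.
    CREATE TABLE employees (
        employee_id TEXT PRIMARY KEY,
        name        TEXT,
        department  TEXT
    );
    -- Index captured from the collection: only the identifying key.
    CREATE TABLE scanned_docs (
        image_file  TEXT PRIMARY KEY,
        employee_id TEXT REFERENCES employees(employee_id)
    );
    INSERT INTO employees VALUES ('E1001', 'Ana Perez', 'Finance');
    INSERT INTO scanned_docs VALUES ('000123.pdf', 'E1001');
""")


def find_images(conn, department):
    """Search by a field that was never captured during indexing."""
    rows = conn.execute(
        """SELECT d.image_file
             FROM scanned_docs d
             JOIN employees e ON e.employee_id = d.employee_id
            WHERE e.department = ?""",
        (department,),
    ).fetchall()
    return [row[0] for row in rows]


print(find_images(conn, "Finance"))

# A correction in the corporate data needs no change to the index.
conn.execute("UPDATE employees SET department = 'Treasury' WHERE employee_id = 'E1001'")
print(find_images(conn, "Treasury"))
```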
MYTH: Bitonal (black and white) images are smaller, faster and
FACT: No longer accurate, as today’s scanning and storage technology
makes color and grayscale images much safer, more affordable, better
and more pleasant to work with.
MYTH: The micrographic industry is in demise.
FACT: Not quite, although it is not a growth market. It is a narrow
niche with a slowly shrinking market share. The concept of microfilming
relates to both the _media_ and the _capture_ process. Micrographic
_media_ appeals to people devoted to historical preservation and to
those with concerns about technological evolution. Cost-effectiveness,
longevity and quality are no longer exclusive to microfilm, as its
digital counterpart now has very good practical answers to all of these
challenges. Micrographic _capture_, once praised for the benefits of
its planetary nature, has now been replaced by overhead and
transport-based digital scanners. Digital capture and digital
repositories have prevailed, in part thanks to PDF/A, born-digital
ingestion, instant QC, instant gratification, automation, risk and cost
reductions and more.
MYTH: Estimating volumes of pages for budgetary purposes is simple.
FACT: Not quite. If done by the service provider, overestimates may cost them the job, while underestimates may create trust problems and project disruptions. If done by the client and proven wrong, it may also create trust issues such as …“you (provider) are supposed to be the expert, why didn’t you warn me?”. The following table shows useful metrics. However, to take this effort seriously, someone must decide on sample diversity and sizes, count carefully and extrapolate. Double-sided pages are not trivial to account for, because the samples used must be representative of the entire collection.
Unbound paper: 150 to 175 pages per inch
Standard file box: 2,000 to 2,500 pages per box
Banker style box: 4,500 to 5,000 pages per box
Vertical file cabinet: 3,500 to 4,000 pages per drawer
Slides, photos: 100 to 120 photos per inch
Open shelving: 1,500 pages per horizontal foot
Lateral file cabinet: 5,000 to 6,000 pages per drawer
Drawings (flat, unfolded): 125 to 150 sheets per inch
16 mm microfilm roll: 2,000 to 3,000 frames per roll
35 mm microfilm roll: 200 to 400 frames per roll
Microfiche: 150 per inch (naked); 80 per inch (in envelopes)
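As a rough sketch of how these metrics and a careful sample count might be combined, the snippet below turns an inventory into a low/high page range and extrapolates from counted samples. Only two rows of the table are encoded; the dictionary keys, inventory quantities and sample counts are hypothetical.

```python
# (low, high) pages per unit, taken from two rows of the table above.
PAGES_PER_UNIT = {
    "standard_file_box": (2000, 2500),
    "lateral_file_drawer": (5000, 6000),
}


def estimate_range(inventory):
    """Return (low, high) total pages for {unit_type: quantity}."""
    low = sum(qty * PAGES_PER_UNIT[unit][0] for unit, qty in inventory.items())
    high = sum(qty * PAGES_PER_UNIT[unit][1] for unit, qty in inventory.items())
    return low, high


def extrapolate(sample_counts, total_units):
    """Scale the mean of carefully counted samples to the whole collection."""
    mean = sum(sample_counts) / len(sample_counts)
    return round(mean * total_units)


# Hypothetical collection: 40 standard boxes and 10 lateral drawers.
print(estimate_range({"standard_file_box": 40, "lateral_file_drawer": 10}))

# Hypothetical hand counts from four sampled boxes, scaled to all 40 boxes.
print(extrapolate([2100, 2350, 1980, 2420], 40))
```

The table gives the range to budget against; the sample count shows where inside that range this particular collection is likely to fall. Neither accounts for double-sided pages, which need their own sampled ratio.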