3003 SW 153rd Drive - #208 | Beaverton, OR 97006 | Phone: 503-574-4542 Fax: 503-619-0021



Many organizations face soaring storage costs and may be looking to invest in an electronic solution for records management. Scan-it, Inc. believes that even basic information can be useful when moving away from paper-based records management. You may already be familiar with some of the terms listed here; we hope these definitions help you understand the vocabulary used in document imaging.



ARCHIVING - Storing documents in a way that they can be retrieved as the need arises. Archiving involves a coherent strategy, a good quality assurance plan, and well documented procedures.

ASP (Application Service Provider) - A company that provides software services over the internet. Many ASPs buy software and license its use to clients. In the document management area, an ASP can maintain the document database for a client. The advantage for the client is that they can hire expertise on a part-time basis.

BACKFILE CONVERSION - The job of scanning preexisting paper archives to a digital form. This is often the most expensive and time-consuming part of an attempt to adopt electronic document storage. Issues include who will do the scanning and verification (either in-house or from an outside service provider), how far back the conversion process should go, and transition strategies.

BARCODE - A system of displaying data in a series of lines of varying widths so it can be read by a laser. A barcode is often used to encode indexing information.

BARCODE RECOGNITION - The ability to recognize and interpret barcodes is a feature of many document scanners. Barcode recognition can facilitate the indexing of scanned documents.

BATCH PROCESSING - The ability to process a number of paper documents into digital form with minimal operator intervention. This is a feature found on most document scanners. It is less common on standard desktop scanners.

CD (Compact Disc) - An optical disc format invented by Sony and Philips Electronics in the early 1980s. A CD is 12 centimeters in diameter and can hold about 650MB or 700MB of data, depending on formatting. CDs are standard for the distribution of music, where they are also termed CD-DAs (Digital Audio). There are several derivatives of the original CD, described below.

CD-R (CD-Recordable) - This technology allows you to record CDs from the desktop. This is a "one-off" process, as opposed to the mass-produced CDs distributed by software and music companies, but the resulting disc acts the same. To produce a CD-R, you need a special drive and formatting software. You can only write a CD-R once, and you have a choice of two sizes. CD-R drives (which can also read CD-ROMs and audio CDs) are sometimes standard with new computers, though DVD-R drives have become more common.

CD-ROM (CD Read Only Memory) - The type of CD used to publish data. CD-ROMs come with data encoded on them. They cannot be written using a PC and are produced instead at a factory where they are "hot-pressed."

CD-RW (CD-Rewriteable) - A form of CD that can be written to many times. In other words, you can erase tracks by writing over them. A CD-RW drive can also write directly to CD-R discs in write-once mode, and it can read other types of CDs. In the last few years, CD-RW has overtaken CD-R in popularity and has narrowed the price gap.

DATA PREP - Manual preparation of documents for scanning, including unfolding crinkled paper, removing staples, and repairing tears.

DAY-FORWARD CONVERSION - Starting an imaging effort by capturing data from the current date on, as opposed to backfile conversion.

DESKEW - Deskewing is a feature that slightly rotates a scanned image to make it perfectly straight. This compensates for slight offsets caused by the document-feeding process.

DESPECKLE - Despeckling is similar to noise removal, that is, eliminating specks and other minor imperfections from the original during the scanning process.
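As a rough illustration of despeckling, the sketch below clears any black pixel that has no black neighbors. It assumes a bitonal image stored as a list of rows (1 = black, 0 = white); the function name and sample data are hypothetical, and commercial scanners are likely to use more elaborate filters.

```python
def despeckle(image):
    """Return a copy of a bitonal image with isolated black pixels cleared."""
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Count black pixels in the surrounding 3x3 block, excluding this one.
            black_neighbors = sum(
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ) - 1
            if black_neighbors == 0:
                cleaned[y][x] = 0  # a lone speck, not part of a stroke
    return cleaned


# Hypothetical scan fragment: one speck at row 1 and a short stroke lower right.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
cleaned_page = despeckle(page)
for row in cleaned_page:
    print(row)
```

The speck disappears while the connected stroke is left untouched.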

DIRECT PRINTING - The ability to print a file saved in a format such as TIFF or PDF by sending it directly to the printer, without first converting it on your computer. A growing number of laser printers offer this capability.

DOCUMENT MANAGEMENT - The handling of documents in electronic format. Includes scanning, indexing, archiving, and retrieving. Document management can be used as a synonym for imaging.

DOUBLE FEED - When two pages are fed at once through a document feeder. Some scanner vendors offer features that detect when this happens, as well as ways of validating that all document pages have been captured. This sometimes involves a postscan endorser.

DPI (Dots Per Inch) - A measure of scanning or printing resolution. See also PIXEL.

DUPLEX - The ability to scan both sides of an original in one pass. Duplex scanners have a second scanner array, so they cost more. However, they can have a major impact on throughput. In many scanners, duplexing runs twice as fast as simplex scanning.

DVD (Digital Video Disc or Digital Versatile Disc) - An optical disc format created by an industry consortium in 1995. The original concept was to have an optical disc with sufficient capacity for recording a motion picture. DVDs now come in four formats, ranging from 4.7GB to 17GB.

DVD-R - A DVD format that allows you to write once on a disc, without the ability to change it later.

DVD-RAM - A DVD technology that allows you to write and erase many times on the same disc.

DVD-ROM - A preformatted (pressed) DVD, such as distributed by movie studios.

DVD-RW - An erasable, rewriteable DVD format that competes with DVD-RAM and is used mostly for multimedia.

DYNAMIC THRESHOLDING - An image enhancement feature used to increase contrast between items on a scanned page, which helps to make text and other objects stand out.
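One common way to picture dynamic thresholding is to compare each pixel with the average of its neighborhood rather than with a single page-wide cutoff. The sketch below follows that reading. It assumes grayscale values from 0 (black) to 255 (white), and the function name, window size, and bias are illustrative choices rather than any vendor's actual settings.

```python
def dynamic_threshold(gray, window=1, bias=30):
    """Convert a grayscale image to bitonal (1 = ink, 0 = background).

    A pixel counts as ink when it is at least `bias` levels darker
    than the mean of the pixels around it.
    """
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            neighborhood = [
                gray[j][i]
                for j in range(max(0, y - window), min(height, y + window + 1))
                for i in range(max(0, x - window), min(width, x + window + 1))
            ]
            local_mean = sum(neighborhood) / len(neighborhood)
            row.append(1 if gray[y][x] < local_mean - bias else 0)
        result.append(row)
    return result


# Hypothetical page: bright paper on the left (220) with faint ink (150),
# shaded paper on the right (120) with dark ink (40).
gray_page = [
    [220, 220, 220, 120, 120, 120],
    [220, 150, 220, 120,  40, 120],
    [220, 220, 220, 120, 120, 120],
]
bitonal = dynamic_threshold(gray_page)
for row in bitonal:
    print(row)
```

In this sample, no single global cutoff separates both ink marks from both backgrounds, since the faint ink (150) is lighter than the shaded paper (120). The local comparison picks out both marks.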

ENDORSER - Scanner accessory that automatically prints an ID number on documents as they are scanned. The endorser is a miniature inkjet printer. It can print either prescan, so that indexing information (for example, a sequence number) can be scanned in, or postscan, to validate that a record has been scanned. Endorsers may print on the front or back of a page, or both.
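To show how endorsed sequence numbers can help confirm that every page was captured, here is a minimal sketch. It assumes the numbers have already been read back from the scanned images; the function name and the sample batch are hypothetical.

```python
def find_missing_pages(first, last, scanned_numbers):
    """List the sequence numbers in first..last that never showed up in the batch."""
    seen = set(scanned_numbers)
    return [number for number in range(first, last + 1) if number not in seen]


# Hypothetical batch: pages 101 to 110 were endorsed, but two stuck together
# in the feeder, so 104 was never imaged.
batch = [101, 102, 103, 105, 106, 107, 108, 109, 110]
missing = find_missing_pages(101, 110, batch)
print(missing)  # [104]
```

A gap in the sequence is a hint that a double feed or a skipped page needs to be checked by an operator.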

FILE COMPRESSION - Technology used to reduce the size of scanned images before storing them. Compression works by taking out redundant scan information, such as sections of white space. PDF files are automatically compressed, and JPEG files are compressed even further.
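The idea of removing redundant white space can be illustrated with run-length encoding, one of the simplest compression schemes. This is only a sketch under the assumption of a bitonal scan line (0 = white, 1 = black); the function names are hypothetical, and real image formats generally use more sophisticated methods.

```python
def run_length_encode(row):
    """Collapse a row of pixels into [value, count] pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([pixel, 1])  # start a new run
    return runs


def run_length_decode(runs):
    """Expand [value, count] pairs back into the original row."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


# Hypothetical scan line: 40 white pixels, 5 black, 55 white.
scan_line = [0] * 40 + [1] * 5 + [0] * 55
encoded = run_length_encode(scan_line)
print(encoded)  # [[0, 40], [1, 5], [0, 55]]
```

One hundred pixel values shrink to three pairs, and decoding restores the line exactly, so nothing is lost.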

GRAYSCALE - The number of gradations of black in any scanned image. Normally, grayscale is divided into 256 levels. Each pixel is measured by the scanner and assigned a number from 0 to 255 depending on the darkness of the shading.

HTML (Hypertext Markup Language) - This is the standard markup language used to describe Web pages. HTML is a set of editable tags that place the elements on the page and describe their characteristics.

IMAGE CAPTURE - Scanning documents to convert them from hard copy paper to digital form.

IMAGE ENHANCEMENT - A technique for cleaning up a scanned image so that it is visually clearer than the original. Image enhancement tools include noise removal, dynamic thresholding, and smoothing.

IMAGING - The handling of documents in electronic formats, including scanning, archiving, retrieving, and indexing. Imaging, in this sense, is a synonym for document management.

INDEXING - Indexing involves recording information about each scanned document so that it can be easily found later. Indexing can be manual (very time consuming) or automatic (often less satisfactory). Some indexing can be done with barcodes or with imprinted sequence numbers from, say, an endorser.

INTRANET - A "local" Internet established within a corporation and perhaps also with important customers and/or suppliers. An intranet uses Internet technology for exchanging and sending information, including captured documents.

JPEG (Joint Photographic Experts Group) - A commonly used color graphics compression format. JPEG allows users to maintain all or most of the detail of a color original while reducing the file size dramatically. Typically, users reduce images to one-quarter the size in megabytes of the original file with no apparent loss of detail.

OCR (Optical Character Recognition) - OCR is the ability to recognize and translate printed or written text that has been scanned to a computer. The computer converts the scanned input into a digital text file. Normally this is a software operation performed after the scan. Success varies, depending on the quality of the original and the level of the software.

OUTSOURCING - Hiring another company to do your imaging project. For example, using an Outsourcing Bureau to do a backfile conversion.

PDF (Portable Document Format) - File format that allows users to capture all the visual elements of a document as an electronic image that they can view, print, or archive on any computer. PDF was created by Adobe Systems, which supplies tools including a free program (Acrobat Reader) for reading the files. The advantage of PDF is that files are small, yet retain the appearance of the original image. They are also platform-independent.

PIXEL - A pixel (from "picture element") is the smallest discrete area of any color or gray on a monitor, printer, or a scanned image. The physical size of a pixel depends on the resolution of the device. A computer deals with an image file as an array of pixels. Resolution is often measured in terms of dots or pixels per inch.

RAID (Redundant Array of Independent or Inexpensive Disks) - A system for combining two or more hard disks to improve performance and make it easier to recover from hardware problems thanks to data redundancy. RAID has become the preferred method of setting up hard disk storage of data.
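One way redundancy allows recovery is parity, as used by RAID levels such as RAID 5: an extra block is stored that is the XOR of the data blocks, so any single lost block can be rebuilt from the others. The sketch below is a simplified illustration with hypothetical block contents, not a description of any particular controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical data blocks stored on two disks, plus parity on a third.
disk_a = b"SCAN0001"
disk_b = b"INVOICE!"
parity = xor_blocks([disk_a, disk_b])

# Suppose disk B fails: its block is rebuilt from disk A and the parity block.
recovered_b = xor_blocks([disk_a, parity])
print(recovered_b)  # b'INVOICE!'
```

Mirrored configurations reach the same goal more simply by keeping a full second copy of the data.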

SCSI (Small Computer System Interface) - A computer-industry interface standard for transmission of data between a computer and a peripheral, such as a scanner. Currently, various versions of SCSI-2, which are called Fast, Wide, and Fast-and-Wide, are most common. A newer standard, called Ultra SCSI, can support up to 15 devices.

SECURITY - A hot issue in document imaging. Having documents available to your employees makes them more available to hackers and other prying eyes. Security issues involve better access security, encryption, and storage reliability.

SIMPLEX - The ability to scan only one side of a document at a time. If you have two-sided documents, you'll have to turn the document over and scan it a second time. Simplex scanners are generally less expensive than duplex ones.

SKEW - The result when a page is misfed into the scanner, so an image is produced that is not square with the page. See also DESKEW.

TIFF (Tagged Image File Format) - TIFF is a file format that describes all images as an array of pixels with numeric values associated with each of them. TIFF files can be color, grayscale, or bitonal. The number of pixels in the file depends on the resolution of the scan and the size of the image. TIFF is the standard format for saving image files on computers, though PDF is gaining fast.
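The relationship between scan resolution, page size, and pixel count is simple arithmetic. The sketch below estimates the uncompressed size of a scanned page; it ignores file headers and row padding, and the function name and bit depths (1 bit for bitonal, 8 for grayscale, 24 for color) are stated here as assumptions for illustration.

```python
def uncompressed_size(width_inches, height_inches, dpi, bits_per_pixel):
    """Return (width in pixels, height in pixels, approximate size in bytes)."""
    width_px = round(width_inches * dpi)
    height_px = round(height_inches * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    size_bytes = (total_bits + 7) // 8  # round up to whole bytes
    return width_px, height_px, size_bytes


# A letter-size page (8.5 x 11 inches) scanned at 300 dpi.
bitonal = uncompressed_size(8.5, 11, 300, 1)
grayscale = uncompressed_size(8.5, 11, 300, 8)
color = uncompressed_size(8.5, 11, 300, 24)

print(bitonal)    # (2550, 3300, 1051875)
print(grayscale)  # (2550, 3300, 8415000)
print(color)      # (2550, 3300, 25245000)
```

The same page is roughly 1MB as a bitonal image but about 25MB in full color, which is why color mode and file compression matter for storage planning.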

TWAIN - TWAIN is an industry software/hardware interface that comes standard with most computers and most scanners. It lets users scan directly from applications (for example, page layout or OCR programs). This allows the user to import a scan directly without having to open a special scanning program. While standard with all document scanners, this is not a major feature for document scanning. The ISIS interface is more important these days. DOCUWARE has developed ISIS PRO to perform this task in conjunction with the DOCUWARE product. Rumor has it that the acronym stands for "Technology Without An Interesting Name."

XML (Extensible Markup Language) - A markup language that is used to describe both pages and data structures. It is similar to HTML (see HTML), which is used to describe Internet pages, but it goes much further. The language is extensible in that users can define new data types. This is becoming an important method for programming forms and databases for efficient data management over the Internet.
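As an illustration of user-defined tags, the sketch below parses a small XML index record for a scanned document using Python's standard library. The element and attribute names (document, type, scanDate, pages, file) are invented for this example and do not come from any particular product.

```python
import xml.etree.ElementTree as ET

# Hypothetical index record; every tag name here is user-defined.
record_xml = """
<document id="000123">
  <type>Invoice</type>
  <scanDate>2004-03-15</scanDate>
  <pages>3</pages>
  <file format="TIFF">000123.tif</file>
</document>
""".strip()

root = ET.fromstring(record_xml)

# Pull the fields out into a plain dictionary for searching or display.
index_entry = {
    "id": root.get("id"),
    "type": root.findtext("type"),
    "scan_date": root.findtext("scanDate"),
    "pages": int(root.findtext("pages")),
    "file": root.findtext("file"),
    "format": root.find("file").get("format"),
}
print(index_entry)
```

Because the tags describe what the data means rather than how it looks, the same record can be exchanged between an imaging system and a database without reformatting.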