Friday, August 19, 2011

SCANNERS


Digital imaging has come of age. Equipment that was once reserved for the wealthiest bureaux is now commonplace on the desktop. The powerful PCs required to manipulate digital images are now considered entry level, so it comes as no surprise to learn that scanners, the devices used to get images into a PC, are one of the fastest growing markets today.

At its most basic level, a scanner is just another input device, much like a keyboard or mouse, except that it takes its input in graphical form. These images could be photographs for retouching, correction or use in DTP. They could be hand-drawn logos required for document letterheads. They could even be pages of text which suitable software could read and save as an editable text file.

The list of scanner applications is almost endless, and has resulted in products evolving to meet specialist requirements:

•    high-end drum scanners, capable of scanning both reflective art and transparencies, from 35mm slides to 16in x 20in material, at high (10,000dpi+) resolutions

•    compact document scanners, designed exclusively for OCR and document management

•    dedicated photo scanners, which work by moving a photo over a stationary light source

•    slide/transparency scanners, which work by passing light through an image rather than reflecting light off it

•    handheld scanners, for the budget end of the market or for those with little desk space.

However, flatbed scanners are the most versatile and popular format. These are capable of capturing colour pictures, documents, and pages from books and magazines and, with the right attachments, can even scan transparent photographic film.

Operation

On the simplest level, a scanner is a device which converts light (which we see when we look at something) into 0s and 1s (a computer-readable format). In other words, scanners convert analogue data into digital data.


All scanners work on the same principle of reflectance or transmission. The image is placed before the carriage, consisting of a light source and sensor; in the case of a digital camera, the light source could be the sun or artificial lights. When desktop scanners were first introduced, many manufacturers used fluorescent bulbs as light sources. While good enough for many purposes, fluorescent bulbs have two distinct weaknesses: they rarely emit consistent white light for long, and while they're on they emit heat which can distort the other optical components. For these reasons, most manufacturers have moved to 'cold-cathode' bulbs. These differ from standard fluorescent bulbs in that they have no filament. They therefore operate at much lower temperatures and, as a consequence, are more reliable. Standard fluorescent bulbs are now found primarily on low-cost units and older models.

By late 2000, Xenon bulbs had emerged as an alternative light source. Xenon produces a very stable, full-spectrum light source that's both long lasting and quick to initiate. However, xenon light sources do consume power at a higher rate than cold cathode tubes.


To direct light from the bulb to the sensors that read light values, CCD scanners use prisms, lenses, and other optical components. Like eyeglasses and magnifying glasses, these items can vary quite a bit in quality. A high-quality scanner will use high-quality glass optics that are color-corrected and coated for minimum diffusion. Lower-end models will typically skimp in this area, using plastic components to reduce costs.

The amount of light reflected by or transmitted through the image and picked up by the sensor is then converted to a voltage proportional to the light intensity - the brighter the part of the image, the more light is reflected or transmitted, resulting in a higher voltage. This analogue-to-digital conversion is a sensitive process, and one that is susceptible to electrical interference and noise in the system. In order to protect against image degradation, the best scanners on the market today use an electrically isolated analogue-to-digital converter that processes data away from the main circuitry of the scanner. However, this adds to the cost of manufacture, so many low-end models use analogue-to-digital converters that are built into the scanner's primary circuit board.
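To make that conversion step concrete, here is a minimal Python sketch of what an ADC does in principle (illustrative only - real scanners do this in dedicated hardware, and the reference voltage and bit depth used here are arbitrary):

def quantise(voltage, v_ref=1.0, bits=8):
    """Map an analogue sensor voltage (0..v_ref) onto a discrete digital level.

    A brighter area reflects more light, produces a higher voltage and so
    yields a higher digital value. 'v_ref' and 'bits' are illustrative values.
    """
    levels = 2 ** bits                        # e.g. 256 levels for an 8-bit ADC
    voltage = max(0.0, min(voltage, v_ref))   # clamp to the converter's input range
    return min(int(voltage / v_ref * levels), levels - 1)

print(quantise(0.25))   # dark area   -> 64
print(quantise(0.90))   # bright area -> 230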

The sensor component itself is implemented using one of three different types of technology:

•    PMT (photomultiplier tube), a technology inherited from earlier drum scanners

•    CCD (charge-coupled device), the type of sensor used in most desktop scanners

•    CIS (contact image sensor), a newer technology which integrates the scanning functions into fewer components, allowing scanners to be more compact.

CCD

CCD technology is responsible for having made scanning a desktop application and has been in use for a number of years in devices such as fax machines and digital cameras. A charge-coupled device is a solid-state electronic device that converts light into an electric charge. A desktop scanner sensor typically has thousands of CCD elements arranged in a long thin line. The scanner shines light through red, green and blue filters, and the reflected light is directed into the CCD array via a system of mirrors and lenses. The CCD acts as a photometer, converting the measured reflectance into an analogue voltage, which can then be sampled and changed to discrete digital values by an analogue-to-digital converter (ADC).



CIS

CIS is a relatively new sensor technology which began to appear at the budget end of the flatbed scanner market in the late 1990s. CIS scanners employ dense banks of red, green and blue LEDs to produce white light, and replace the mirrors and lenses of a CCD scanner with a single row of sensors placed extremely close to the source image. The result is a scanner that is thinner and lighter, more energy efficient and cheaper to manufacture than a traditional CCD-based device - but not, as yet, capable of producing results of the same quality.

The technology employed by its sensor mechanism is not, however, the only factor that governs a scanner's level of performance. The following are equally important aspects of a given unit's specification:

•    resolution

•    bit depth

•    dynamic range.

Resolution


Resolution relates to the fineness of detail that a scanner can achieve, and is usually measured in dots per inch (dpi). The more dots per inch a scanner can resolve, the more detail the resulting image will have. The typical resolution of an inexpensive desktop scanner in the late 1990s was 300 x 300dpi.

A typical flatbed scanner has a CCD element for each pixel, so for a desktop scanner claiming a horizontal optical resolution of 600dpi (dots per inch) - alternatively referred to as 600ppi (pixels per inch) - and a maximum document width of 8.5in there’ll be an array of 5,100 CCD elements in what’s known as the scan head.
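The arithmetic behind that figure is straightforward, as this short illustration shows:

optical_resolution = 600   # horizontal optical resolution in dpi (equivalently ppi)
scan_width_inches = 8.5    # maximum document width
ccd_elements = optical_resolution * scan_width_inches
print(int(ccd_elements))   # 5100 CCD elements in the scan head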

The scan head is mounted on a transport which is moved across the target object. Although the process may appear to be a continuous movement, the head moves a fraction of an inch at a time, taking a reading between each movement. In the case of a flatbed scanner, the head is driven by a stepper motor, a device which turns a predefined amount, and no more, each time it is fed an electrical pulse.

The number of physical elements in a CCD array determines the x-direction sampling rate, and the number of stops per inch determines the y-direction sampling rate. Although these are conveniently referred to as a scanner’s ‘resolution’, the term is not strictly accurate. The resolution is the scanner’s ability to determine detail in an object and is defined by the quality of the electronics, optics, filters and motor control, as well as the sampling rate.

The actual scan head, though capable of reading a raster line 8.5in wide, will be much smaller than that, typically around 4in wide. The reflected light is presented to the scan head through a lens, and the quality of the optics can have a greater effect on the resolution of the scan than the sampling rate. High-resolution optics in a 400dpi scanner are likely to produce better results than a 600dpi device with poor optics.

By late 1998 the physical limit as to how many CCD elements could be placed side by side in one inch stood at 600. It is, however, possible for the apparent resolution to be increased using a technique known as interpolation, which under software or hardware control guesses intermediate values and inserts them between the real ones. Some scanners do this much more effectively than others.
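As an illustration of the idea (not of any particular scanner's firmware), the following sketch doubles the apparent resolution of a single scanned line by linear interpolation, inserting a guessed value midway between each pair of real samples:

def interpolate_line(samples):
    """Double the apparent resolution of a line of pixel values.

    The real sensor readings are kept; each inserted value is simply the
    average of its two neighbours - a guess, not new optical detail.
    """
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) // 2)   # interpolated (guessed) sample
    out.append(samples[-1])
    return out

print(interpolate_line([10, 20, 80]))   # [10, 15, 20, 50, 80]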

Colour scanners


Colour scanners need to capture each of the red, green and blue primaries. Some scanning heads contain a single light source with three filtered CCDs and capture the entire colour image in a single pass; others use three coloured light sources with a single CCD, either switching the lights rapidly to complete the scan in one pass or going back and forth three times, once per colour.

Single-pass scanners have problems with the stability of light levels when they’re being turned on and off rapidly. Older three-pass scanners used to suffer from registration problems along with being slow. More modern three-pass units are much improved and able to match some single-passers for speed. However, by the late 1990s most colour scanners were single-pass devices.

These scanners use one of two methods for reading light values: beam splitter or coated CCDs. When a beam splitter is used, light passes through a prism and separates into the three primary scanning colours, which are each read by a different CCD. This is generally considered the best way to process reflected light, but to bring down costs many manufacturers use three CCDs, each of which is coated with a film so that it reads only one of the primary scanning colours from an unsplit beam. While technically not as accurate, this second method usually produces results that are difficult to distinguish from those of a scanner with a beam splitter.

Bit-depth


When a scanner converts something into digital form, it looks at the image pixel by pixel and records what it sees. That part of the process is simple enough, but different scanners record different amounts of information about each pixel. How much information a given scanner records is measured by its bit-depth.

The simplest kind of scanner only records black and white, and is sometimes known as a 1-bit scanner because each bit can only express two values, on and off. In order to see the many tones in between black and white, a scanner needs to be at least 4-bit (for up to 16 tones) or 8-bit (for up to 256 tones). The higher the scanner's bit-depth, the more accurately it can describe what it sees when it looks at a given pixel. This, in turn, makes for a higher quality scan.

Most modern colour scanners are at least 24-bit, meaning that they collect 8 bits of information about each of the primary scanning colours: red, green and blue. A 24-bit unit can theoretically capture over 16 million different colours, though in practice the number achieved is usually somewhat smaller. This is near-photographic quality, and is therefore commonly referred to as 'true colour' scanning.
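The 16 million figure falls straight out of the arithmetic:

bits_per_channel = 8
channels = 3                                  # red, green and blue
colours = 2 ** (bits_per_channel * channels)
print(colours)                                # 16777216 - roughly 16.7 million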

Recently, an increasing number of manufacturers are offering 30-bit and 36-bit scanners, which can theoretically capture billions of colours. The only problem is that very few graphics software packages can handle anything larger than a 24-bit scan, because of limitations in the design of personal computers. Still, those extra bits are worth having. When a software program opens a 30-bit or 36-bit image, it can use the extra data to correct for noise in the scanning process and other problems that hurt the quality of the scan. As a result, scanners with higher bit-depths tend to produce better colour images.

Dynamic range


Dynamic range is somewhat similar to bit depth in that it measures how wide a range of tones the scanner can record. It is a function of the scanner's analogue-to-digital converter, along with the purity of the illuminating light, the coloured filters and any system noise.

Dynamic range is measured on a scale from 0.0 (perfect white) to 4.0 (perfect black), and the single number given for a particular scanner tells how much of that range the unit can distinguish. Most colour flatbeds have difficulty perceiving the subtle differences between the dark and light colours at either end of the range, and tend to have a dynamic range of about 2.4. That's fairly limited, but it's usually sufficient for projects where perfect colour isn't a concern. For greater dynamic range, the next step up is a top-quality colour flatbed scanner with extra bit depth and improved optics. These high-end units are usually capable of a dynamic range between 2.8 and 3.2, and are well suited to more demanding tasks like standard colour prepress. For the ultimate in dynamic range, the only alternative is a drum scanner. These units frequently have a dynamic range of 3.0 to 3.8, and deliver all the colour quality one could ask for. Although they are overkill for most projects, drum scanners do offer high quality in exchange for their high price.
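These figures are optical densities, and they can be loosely related to bit depth: ignoring noise, an analogue-to-digital converter with N bits per channel can at best distinguish a density range of about log10(2^N). The following idealised calculation (a rule of thumb, not a measurement of any real scanner) shows where numbers of this order come from:

import math

def max_density_range(bits_per_channel):
    """Theoretical ceiling on dynamic range (in density units) for a given
    bit depth - real scanners fall short because of optical and electrical noise."""
    return math.log10(2 ** bits_per_channel)

for bits in (8, 10, 12):
    print(bits, round(max_density_range(bits), 1))   # 8 -> 2.4, 10 -> 3.0, 12 -> 3.6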

In theory, a 24-bit scanner offers an 8-bit range (256 levels) for each primary colour - the difference between adjacent levels among 256 is commonly accepted to be indiscernible to the human eye. Unfortunately, a few of the least significant bits are lost in noise, while any post-scanning tonal corrections reduce the range still further. That’s why it’s best to make any brightness and colour corrections in one go from the scanner driver before making the final scan itself. More expensive scanners with 30-bit or 36-bit depths have a much wider range to start with, offering better detail in the shadow and highlight areas and allowing tonal corrections to be made while still ending up with a decent 24 bits at the end. A 30-bit scanner collects 10 bits of data for each of the red, green and blue colour components, while 36-bit scanners collect 12 bits for each. The scanner driver allows the operator to control which 24 of those 30 or 36 bits are kept and which are discarded - this adjustment being made by changing the Gamma Curve, accessed through the TWAIN driver's Tonal Adjustment control.
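As a rough illustration of what happens to those extra bits (a simplified sketch, not any vendor's actual TWAIN driver code), the snippet below applies a gamma curve to a 12-bit sample and only then reduces it to the 8 bits that are kept:

def gamma_reduce(sample_12bit, gamma=2.2):
    """Apply a gamma curve to a 12-bit sample (0..4095) and keep 8 bits.

    Raising gamma lifts shadow detail before the reduction, so the tonal
    correction is made while all 12 bits are still available.
    """
    normalised = sample_12bit / 4095.0
    corrected = normalised ** (1.0 / gamma)
    return min(int(corrected * 255 + 0.5), 255)

print(gamma_reduce(100))    # deep shadow: 47, rather than the 6 a simple bit-shift would give
print(gamma_reduce(4000))   # highlight: 252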

Scan resolution


Prior to scanning any image, it is necessary to determine what resolution to scan at. Since modern advertising has conditioned us to think that more is always better, it is not difficult to understand why many users have a tendency to scan at too high a resolution. The scan resolution should always be determined by the capability of the output device - and for all practical purposes it is rarely necessary to scan at higher than 240dpi.

Printed images use a technique called halftoning to reproduce different levels of colour. In magazines an ordered halftone is used, where regular dots of differing sizes produce the varying levels of colour. Most inkjet printers use dithering, where the dots are scattered across the area of each pixel; this produces better-looking results at lower resolutions. The use of halftoning means that the number of pixels per inch the printer can reproduce is lower than its stated 'dpi' resolution.
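A minimal sketch of the dithering idea, using a standard 4x4 Bayer threshold matrix (inkjet drivers use far more sophisticated patterns, so this is purely illustrative):

# 4x4 Bayer threshold matrix; thresholds get spread over the 0-255 range
BAYER_4x4 = [[ 0,  8,  2, 10],
             [12,  4, 14,  6],
             [ 3, 11,  1,  9],
             [15,  7, 13,  5]]

def dither(grey_image):
    """Reduce an 8-bit greyscale image (a list of rows) to 1 bit per pixel.

    Each pixel is compared against a position-dependent threshold, so a
    patch of mid grey comes out as a pattern of roughly half 'on' dots.
    """
    return [[1 if value > (BAYER_4x4[y % 4][x % 4] + 0.5) * 16 else 0
             for x, value in enumerate(row)]
            for y, row in enumerate(grey_image)]

# A flat mid-grey patch dithers to an alternating pattern of dots
print(dither([[128] * 8 for _ in range(4)]))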

The rule of thumb for printing at 24-bit colour is that the appropriate number of pixels per inch is about one sixteenth of the printer's resolution. This means that for a 600dpi printer a scan resolution of around 40 pixels per inch is appropriate. The typesetters used in offset lithography - the technology used for printing glossy magazines - are capable of printing at 133 lines per inch. This technology is not quite the same as laser or inkjet printer technology, and the general rule here is for layout artists to scan at 1.5 times the printing resolution - an equivalent of 200dpi.

When scanning for output on an inkjet printer, a commonly used rule of thumb is to scan at one third of the resolution it is intended to print at. So, for a typical modern inkjet printer with a maximum resolution of 720dpi, 240dpi is an appropriate scan resolution. Attempting to print at a printer's maximum resolution on ordinary plain paper is not, however, recommended. In this case, 360dpi is a more suitable print resolution - and correspondingly 120dpi a more appropriate scan resolution. If scanning greyscale or line art, it's better to use the full resolution of the printer without dividing it by three.

When scanning images for inclusion in Web pages or for displaying directly on a PC monitor, the scan resolution is chosen based on the desired size of the displayed image. Graphics cards are capable of different display modes - 640x480, 800x600, 1024x768 etc. - and monitors come with a number of different screen sizes. However, as a general rule of thumb, images for subsequent display on a PC monitor should be scanned at a resolution of around 72dpi.
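These rules of thumb are easy to capture in a small helper function (the ratios below simply restate the guidance above - they are starting points, not hard limits):

def suggested_scan_dpi(output, value=0):
    """Suggest a scan resolution for a given output device.

    'output' is 'inkjet' (value = intended print resolution in dpi),
    'offset' (value = screen frequency in lines per inch) or
    'screen' (value ignored).
    """
    if output == 'inkjet':
        return value // 3          # e.g. printing at 720dpi -> scan at 240dpi
    if output == 'offset':
        return int(value * 1.5)    # e.g. a 133lpi magazine screen -> roughly 200dpi
    if output == 'screen':
        return 72                  # on-screen/Web display
    raise ValueError('unknown output device')

print(suggested_scan_dpi('inkjet', 720))   # 240
print(suggested_scan_dpi('offset', 133))   # 199
print(suggested_scan_dpi('screen'))        # 72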

Scan modes


PCs represent pictures in a variety of ways - the most common methods being line art, halftone, greyscale and colour:

•    Line art is the smallest of all the image formats. Since only black and white information is stored, the computer represents black with a 1 and white with a 0, so it takes only 1 bit of data to store each dot of a black and white scanned image. Line art is most useful when scanning text or line drawings; pictures do not scan well in line art mode

•    While computers can store and show greyscale images, most printers are unable to print different shades of grey, so they use a trick called halftoning. Halftones use patterns of dots to fool the eye into believing it is seeing greyscale information

•    Greyscale images are the simplest images for the computer to store. Humans can perceive about 255 different shades of grey, so each pixel can be represented in a PC by a single byte of data with a value from 0 to 255. A greyscale image can be thought of as the equivalent of a black and white photograph

•    True colour images are the largest and most complex to store, with PCs using 8 bits (1 byte) for each of the colour components (red, green and blue) and therefore 24 bits in total to represent the entire colour spectrum (the sketch below works through the storage arithmetic).
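To put the modes side by side, here is a quick calculation of the uncompressed size of an A4 page scanned at 300dpi in each mode (illustrative figures only; real files also carry headers and may be compressed):

def raw_size_mb(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image size in megabytes for a scan of the given area."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel / 8 / (1024 * 1024)

# A4 is roughly 8.27in x 11.69in
for mode, bpp in [('line art', 1), ('greyscale', 8), ('true colour', 24)]:
    print(mode, round(raw_size_mb(8.27, 11.69, 300, bpp), 1), 'MB')
# line art ~1.0MB, greyscale ~8.3MB, true colour ~24.9MB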

File formats


The format in which a scanned image is saved can have a significant effect on file size. File size is an important consideration when scanning, since the high resolutions supported by many modern scanners can result in the creation of image files as large as 30MB for an A4 page.

Windows bitmap (BMP) files are the largest, since they store the image in full colour without compression or in 256 colours with simple run-length encoding (RLE) compression. Images to be used as Windows wallpaper have to be saved in BMP format, but for most other cases it can be avoided.
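Run-length encoding itself is very simple - which is why it only pays off on images with large flat areas of a single colour. A minimal sketch of the idea (not of the exact BMP RLE8 format):

def rle_encode(pixels):
    """Collapse runs of identical values into (count, value) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1           # extend the current run
        else:
            runs.append([1, value])    # start a new run
    return runs

print(rle_encode([7, 7, 7, 7, 0, 0, 255]))   # [[4, 7], [2, 0], [1, 255]]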

Tagged Image File Format (TIFF) files are the most flexible, since they can store images in RGB mode for screen display, or CMYK for printing. TIFF also supports LZW compression, which can reduce the file size significantly without any loss of quality. This is based on two techniques introduced by Jacob Ziv and Abraham Lempel in 1977 and 1978 and subsequently refined by Unisys researcher Terry Welch: LZ77 creates pointers back to repeating data, and LZ78 creates a dictionary of repeating phrases with pointers to those phrases.
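The dictionary-building idea behind LZ78 (and, in refined form, behind LZW) can be sketched in a few lines; this is the textbook algorithm, not TIFF's actual LZW implementation:

def lz78_encode(data):
    """Encode a string as (dictionary index, next character) pairs.

    Repeated phrases are replaced by a reference to an earlier dictionary
    entry, which is how long runs of repeating data shrink.
    """
    dictionary = {}          # phrase -> index (index 0 means 'no prefix')
    output, phrase = [], ''
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                               # keep extending a known phrase
        else:
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ''
    if phrase:                                         # flush any trailing phrase
        output.append((dictionary[phrase], ''))
    return output

print(lz78_encode('ababababab'))
# [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, 'a'), (2, '')]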

CompuServe’s Graphics Interchange Format (GIF) stores images using indexed colour. A total of 256 colours are available in each image, although what these colours are can change from image to image. A table of RGB values for each index colour is stored at the start of the image file. GIFs tend to be smaller than most other file formats because of this decreased colour depth, making them a good choice for use in WWW-published material.

The PC Paintbrush (PCX) format has fallen into disuse, but offers a compressed format at 24-bit colour depth. The JPEG file format uses lossy compression and can achieve small file sizes at 24-bit colour depth. The level of compression can be selected - and hence the amount of data loss - but even at the maximum quality setting JPEG loses some detail and is therefore only really suitable for viewing images on-line. The number of levels of compression available depends on the image editing software being used.

Unless there is a need to preserve colour information from the original document, images stored for subsequent OCR processing are best scanned in greyscale. This uses a third of the space of an RGB colour scan. An alternative is to scan in line-art mode - black and white with no greyscales - but this often loses detail, reducing the accuracy of the subsequent OCR process.

The table below illustrates the relative file sizes that can be achieved by the different file formats in storing a 'native' 1MB image, and also indicates the colour depth supported:


File format                            Image size    No. of colours
BMP – RGB                              1MB           16.7 million
BMP – RLE                              83KB          256
GIF                                    31KB          256
JPEG – min. compression                185KB         16.7 million
JPEG – min. progressive compression    150KB         16.7 million
JPEG – max. compression                20KB          16.7 million
JPEG – max. progressive compression    16KB          16.7 million
PCX                                    189KB         16.7 million
TIFF                                   1MB           16.7 million
TIFF – LZW compression                 83KB          16.7 million

OCR

When a page of text is scanned into a PC, it is stored as an electronic file made up of tiny dots, or pixels; it is not seen by the computer as text, but rather, as a ‘picture of text’. Word processors are not capable of editing bitmap images. In order to turn the group of pixels into editable words, the image must go through a complex process known as Optical Character Recognition (OCR).

OCR research began in the late 1950s, and since then, the technology has been continually developed and refined. In the 1970s and early 1980s, OCR software was still very limited - it could only work with certain typefaces and sizes. These days, OCR software is far more intelligent, and can recognize practically all typefaces as well as severely degraded document images.

One of the earliest OCR techniques was something called matrix matching, or pattern matching. Most text is in the Times, Courier or Helvetica typefaces, in point sizes between 10 and 14. OCR programs which use the pattern matching method have bitmaps stored for every character of each of these fonts and type sizes. By comparing the database of stored bitmaps with the bitmaps of the scanned letters, the program attempts to recognise the letters. This early system was only really successful with non-proportional fonts like Courier, where letters are spaced regularly and are easier to identify. Complex multi-font documents were well beyond its scope, and an obvious limitation of this method is that it is only useful for the fonts and sizes stored.
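The matching step can be illustrated with tiny bitmaps. In this hypothetical example (far cruder than any real OCR engine), each character is a 3x3 grid of 0s and 1s and the stored template with the most matching pixels wins:

# Hypothetical 3x3 templates for two characters (1 = black pixel)
TEMPLATES = {
    'I': [(0, 1, 0), (0, 1, 0), (0, 1, 0)],
    'L': [(1, 0, 0), (1, 0, 0), (1, 1, 1)],
}

def match_character(bitmap):
    """Return the template character whose pixels best match the scanned bitmap."""
    def score(template):
        return sum(p == q
                   for trow, brow in zip(template, bitmap)
                   for p, q in zip(trow, brow))
    return max(TEMPLATES, key=lambda ch: score(TEMPLATES[ch]))

# A slightly noisy 'L' (one stray pixel) is still recognised
print(match_character([(1, 0, 0), (1, 0, 1), (1, 1, 1)]))   # 'L'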
Feature extraction was the next step in OCR’s development. This attempted to recognise characters by identifying their universal features, the goal being to make OCR typeface-independent. If all characters could be identified using rules defining the way that loops and lines join each other, then individual letters could be identified regardless of their typeface. For example: the letter 'a' is made from a circle, a line on the right side and an arc over the middle. The arc over the middle is optional. So, if a scanned letter had these 'features' it would be correctly identified as the letter 'a' by the OCR program.

In terms of research progress, feature extraction was a step forward from matrix matching, but actual results were badly affected by poor-quality print. Extra marks on the page, or stains in the paper, had a dramatic effect on accuracy. The elimination of such ‘noise’ became a whole research area in itself, attempting to determine which bits of print were not parts of individual letters. Once noise can be identified, the reliable character fragments can then be reconstructed into the most likely letter shapes.

No OCR software ever recognises 100% of the scanned letters. Some OCR programs use the matrix/pattern matching and/or feature extraction methods to recognise as many characters as possible - and complement this by using spell checking on the hitherto unrecognized letters. For example: if the OCR program was unable to recognise the letter 'e' in the word 'th~ir', by spell checking 'th~ir' the program could determine the missing letter is an 'e'.
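A toy illustration of that fallback, using a hypothetical four-word dictionary (real OCR packages use full dictionaries and language models):

WORD_LIST = {'their', 'there', 'these', 'those'}   # hypothetical word list

def resolve(word_with_gap):
    """Fill a single unrecognised letter (marked '~') using the word list."""
    candidates = [w for w in WORD_LIST
                  if len(w) == len(word_with_gap)
                  and all(a == b or a == '~'
                          for a, b in zip(word_with_gap, w))]
    return candidates[0] if len(candidates) == 1 else word_with_gap

print(resolve('th~ir'))   # 'their'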

Recent OCR technology is far more sophisticated than the early techniques. Instead of just trying to identify individual characters, modern techniques are able to identify whole words. This technology, developed by Caere, is called Predictive Optical Word Recognition (POWR).

Using higher levels of contextual analysis, POWR is able to virtually eliminate the problems caused by noise. It enables the computer to sift through the thousands or millions of different ways that the dots in a word can be assembled into characters. Each possible interpretation is then assigned a probability, and the highest one is selected. POWR uses sophisticated mathematical algorithms which allow the computer to home in on the best interpretation without examining each possible version individually.

When probabilities are assigned to individual words, all kinds of contextual information and evidence are taken into account. The technology makes use of neural networks and predictive modelling techniques taken from research in Artificial Intelligence (AI) and Cognitive Science. This enables POWR to identify words in a way which more closely resembles human visual recognition. In practice, the technique significantly improves the accuracy of word recognition across all document types. All the possible interpretations of a word are assessed by combining every source of evidence, from low-level pixel-based information to high-level contextual clues, and the most probable interpretation is then selected.

Although OCR systems have been around for a long time, their benefits are only just being appreciated. The first offerings were extremely costly, in terms of software and hardware, and they were inaccurate and difficult to use. Consequently many of the early adopters became frustrated with the technology. Over the past few years, however, OCR has been completely transformed. Modern OCR software is highly accurate, easy to use and affordable and for the first time OCR looks set to be adopted in all kinds of work environments on a mass scale.

Unless there is a specific need to preserve colour information from the original document, it's best to scan documents for OCR in greyscale. This uses a third of the space of an RGB colour scan. Line-art mode makes for even smaller file sizes - but it often loses detail, reducing the accuracy of subsequent OCR processing.
