To use documents effectively, it is not enough to scan them; you also need to be able to search them. There are two ways to solve this problem: documents can be recognized or indexed.
Recognition of documents
Recognition is used for good-quality primary sources, such as a “fresh” book, magazine, dictionary, or questionnaire, that are free of “artifacts” (debris, comments, notes in the margins, etc.). For this we use a document recognition system, also called an OCR system, which automatically enters the data into the computer (for example, into a text editor). Once the document is recognized, you can:
- Copy and edit text;
- Work with individual paragraphs of text;
- Search by words and phrases.
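As an illustration of searching by words and phrases, here is a minimal sketch in Python. It assumes the recognized text is already available as one string per page; the function name `find_phrase` and the sample pages are hypothetical and do not come from any particular OCR product.

```python
import re

def find_phrase(pages, phrase):
    """Return (page_number, snippet) pairs for each case-insensitive match."""
    words = [re.escape(word) for word in phrase.split()]
    if not words:
        return []
    # Allow any run of whitespace between words, because recognized text
    # often breaks a line in the middle of a phrase.
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
    hits = []
    for number, text in enumerate(pages, start=1):
        for match in pattern.finditer(text):
            start = max(match.start() - 20, 0)
            end = min(match.end() + 20, len(text))
            # Collapse whitespace so the snippet prints on one line.
            snippet = " ".join(text[start:end].split())
            hits.append((number, snippet))
    return hits

# Hypothetical recognized pages.
pages = [
    "The supply contract was signed\nin March.",
    "No changes were made to the Supply  Contract.",
]
hits = find_phrase(pages, "supply contract")
```

Each hit carries the page number and a short snippet of surrounding text, which is usually enough to decide whether to open the page.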
For archival documents, setting up recognition takes longer and the accuracy is low. In this case, we use various methods of semi-automatic conversion of documents into electronic form using keywords (indexing).
Indexing of documents
Document indexing is the process of assigning identifying attributes to scanned documents so that the necessary information can be found quickly in the database. Such attributes can include the document type, number, date, author, etc.
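The idea can be sketched with a small index table. This is only an illustration under stated assumptions: the table name `doc_index`, the column names, and the helper functions are hypothetical, and an in-memory SQLite database stands in for whatever database is actually used.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE doc_index (
           scan_file  TEXT PRIMARY KEY,
           doc_type   TEXT,
           doc_number TEXT,
           doc_date   TEXT,
           author     TEXT
       )"""
)

# Attributes that may be used as search criteria (hypothetical set).
SEARCHABLE = {"doc_type", "doc_number", "doc_date", "author"}

def index_document(scan_file, doc_type, doc_number, doc_date, author):
    """Attach identifying attributes to one scanned file."""
    conn.execute(
        "INSERT INTO doc_index VALUES (?, ?, ?, ?, ?)",
        (scan_file, doc_type, doc_number, doc_date, author),
    )

def find_documents(**criteria):
    """Return scan files whose attributes match all given criteria."""
    unknown = set(criteria) - SEARCHABLE
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    # Column names come from the fixed SEARCHABLE set; values are bound
    # as parameters rather than pasted into the SQL text.
    where = " AND ".join(f"{field} = ?" for field in criteria) or "1 = 1"
    rows = conn.execute(
        f"SELECT scan_file FROM doc_index WHERE {where} ORDER BY scan_file",
        tuple(criteria.values()),
    )
    return [row[0] for row in rows]

# Hypothetical sample data; dates are stored as YYYY-MM-DD text.
index_document("scan_0001.tif", "contract", "17", "2019-03-04", "Ivanov")
index_document("scan_0002.tif", "order", "5", "2019-03-04", "Petrov")
index_document("scan_0003.tif", "contract", "18", "2020-01-15", "Ivanov")
contracts_by_ivanov = find_documents(doc_type="contract", author="Ivanov")
```

The scanned image itself is never read here: the search runs only over the attributes entered during indexing, which is why this approach works even when the text cannot be recognized reliably.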
During recognition or indexing, we work with the most difficult material:
- Handwritten documents (written entirely by hand or containing handwritten information);
- Typewritten documents;
- Poorly legible documents (the text has faded, information is partially lost, etc.);
- Documents of the same type in which the information appears in different places (for example, after the contract registration procedure has changed).
When performing full or partial recognition of documents, we apply a multi-level quality check to the whole data set. Building a trial resource first also helps us avoid mistakes: the whole technology is tested on a small amount of information, the fields are agreed on, and the quality criteria are defined.
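The text does not say what the individual check levels are, so the following is only a hypothetical two-level sketch: a format check on each indexed record, followed by a comparison of the same document keyed independently by two operators. The field names and the function names `check_format`, `compare_entries`, and `review` are assumptions made for illustration.

```python
from datetime import date

# Fields agreed on during the trial stage (hypothetical set).
REQUIRED_FIELDS = ("doc_type", "doc_number", "doc_date", "author")

def check_format(record):
    """Level 1: every field is filled in and the date is a real date."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"{field}: missing")
    doc_date = record.get("doc_date", "")
    if doc_date:
        try:
            date.fromisoformat(doc_date)
        except ValueError:
            problems.append("doc_date: not a valid ISO date")
    return problems

def compare_entries(first, second):
    """Level 2: list the fields on which two operators disagree."""
    return [f for f in REQUIRED_FIELDS if first.get(f) != second.get(f)]

def review(first, second):
    """Run both levels and collect every problem found."""
    problems = check_format(first) + check_format(second)
    problems += [
        f"{field}: operators disagree"
        for field in compare_entries(first, second)
    ]
    return problems

# Hypothetical records for the same scanned contract.
entry_a = {"doc_type": "contract", "doc_number": "17",
           "doc_date": "2019-03-04", "author": "Ivanov"}
entry_b = dict(entry_a, doc_number="18")
issues = review(entry_a, entry_b)
```

A record that produces an empty problem list passes both levels; anything else would go back to an operator for manual review.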
Prices for recognition and indexing are determined individually.