To use documents effectively, it is not enough to scan them: you also need to be able to search them. There are two ways to solve this problem: documents can be recognized or indexed.
Recognition of documents
Recognition is used for good-quality source documents, such as a “fresh” book, magazine, dictionary, or questionnaire, that are free of “artifacts” (debris, annotations, notes in the margins, etc.). For this we use a document recognition system, also called an OCR (optical character recognition) system. The system automatically enters the data into the computer (for example, into a text editor). Once the document is recognized, you can:
- Copy and edit text;
- Work with individual paragraphs of text;
- Search by words and phrases.
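As an illustration of searching by words and phrases, here is a minimal sketch in Python. It assumes the recognized text is already available as a plain string; the function name `find_phrase` and the sample text are hypothetical and are not part of any particular OCR system.

```python
import re


def find_phrase(text: str, phrase: str) -> list[int]:
    """Return the character offsets at which `phrase` occurs in `text`.

    The match ignores letter case and treats any run of whitespace
    (including line breaks left by recognition) as a single separator.
    """
    words = phrase.split()
    if not words:
        return []
    # Match whole words only: no word character directly before or after.
    pattern = r"(?<!\w)" + r"\s+".join(re.escape(w) for w in words) + r"(?!\w)"
    return [m.start() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


# Hypothetical recognized text; the second occurrence is split across lines.
recognized_text = (
    "Supply contract No. 17.\n"
    "The supply\n"
    "contract is signed by both parties."
)

print(find_phrase(recognized_text, "supply contract"))  # prints [0, 28]
```

Because recognized text often breaks a phrase across lines, the sketch matches on words rather than on the exact character sequence.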
For archival documents, setting up recognition takes longer and the accuracy is low. In such cases, various methods of semi-automatic conversion of documents into electronic form using keywords (indexing) are used.
Indexing of documents
Document indexing is the process of assigning identifying attributes to scanned documents so that the required information can be found quickly in the database. Such attributes can include the document type, number, date, author, etc.
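The idea can be sketched with a small in-memory database. This is only an illustration under stated assumptions: the table name `doc_index`, its columns, the file names, and the sample records are all hypothetical, and a real archive would define its own fields.

```python
import sqlite3


def build_index(records):
    """Create an in-memory index table and load the attribute records into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_index ("
        "doc_type TEXT, doc_number TEXT, doc_date TEXT, author TEXT, scan_file TEXT)"
    )
    conn.executemany("INSERT INTO doc_index VALUES (?, ?, ?, ?, ?)", records)
    return conn


def find_scans(conn, doc_type, year):
    """Return the scan files of all documents of the given type dated in `year`."""
    rows = conn.execute(
        "SELECT scan_file FROM doc_index "
        "WHERE doc_type = ? AND doc_date LIKE ? ORDER BY doc_date",
        (doc_type, f"{year}-%"),
    )
    return [row[0] for row in rows]


# Hypothetical attributes keyed in by an operator for three scanned documents.
records = [
    ("contract", "17", "1998-03-02", "Ivanov", "scan_0001.tif"),
    ("order", "5", "1998-04-11", "Petrov", "scan_0002.tif"),
    ("contract", "42", "2001-07-19", "Sidorov", "scan_0003.tif"),
]

conn = build_index(records)
print(find_scans(conn, "contract", 1998))  # prints ['scan_0001.tif']
```

The scanned image itself is never read here: the search runs only over the assigned attributes, which is why indexing still works for documents that cannot be recognized reliably.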
At the stage of recognition or indexing, we work with the most complex data:
- Handwritten (fully handwritten documents or documents containing handwritten information);
- Typewritten;
- Poorly legible (the text has faded, information is partially lost, etc.);
- Documents of the same type in which the information is located in different places (for example, when the procedure for drawing up contracts has changed).
When performing full or partial recognition of documents, we apply a multi-level quality check to the whole data set. To avoid possible mistakes, we also first create a trial resource, on which the whole technology is tested on a small amount of information, the fields are agreed upon, and the quality criteria are defined.
Prices for recognition and indexing are determined individually.