Recognition and indexing

To use scanned documents effectively, it is not enough to scan them; you also need to be able to search them. There are two main ways to solve this problem: document recognition and indexing.

Recognition of documents

Document recognition is used for good-quality originals that are free of "artifacts" (stray marks, comments, notes in the margins, etc.), for example the pages of a "fresh" book, a magazine, a dictionary, or a questionnaire. It relies on document recognition systems, also called OCR (Optical Character Recognition) systems, whose task is to enter all the data into the computer automatically. In a recognized document, the user can copy text, work with individual paragraphs, and correct them.
For archival documents, setting up recognition takes longer and its accuracy is low. In such cases, semi-automatic methods of converting documents into electronic form are used instead, based on keywords (indexing).

Indexing of documents

Document indexing is the process of assigning identifying attributes to documents (their electronic copies, or electronic documents) so that the necessary information can be found quickly in a database. Such attributes can include the document type, number, date, author, etc.
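To make the idea of indexing concrete, here is a minimal Python sketch, assuming each scanned document receives an index record with the attributes listed above (type, number, date, author). `IndexCard` and `DocumentIndex` are hypothetical names used only for illustration, not part of any Digital Country tool. The index maps each attribute value to the set of documents that carry it, so a query on several attributes becomes a set intersection instead of a scan of every record.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IndexCard:
    """Hypothetical index record attached to one scanned document."""
    doc_id: str      # e.g. the file name of the scan
    doc_type: str
    number: str
    doc_date: date
    author: str


class DocumentIndex:
    """Hypothetical in-memory index: (field, value) -> IDs of matching documents."""

    FIELDS = ("doc_type", "number", "doc_date", "author")

    def __init__(self) -> None:
        self._postings: dict[tuple[str, object], set[str]] = defaultdict(set)

    def add(self, card: IndexCard) -> None:
        # Register the document under every indexed attribute.
        for field in self.FIELDS:
            self._postings[(field, getattr(card, field))].add(card.doc_id)

    def search(self, **criteria: object) -> set[str]:
        # Documents must match ALL criteria, so intersect the ID sets.
        # With no criteria (or no matches) an empty set is returned.
        result: set[str] | None = None
        for field, value in criteria.items():
            if field not in self.FIELDS:
                raise ValueError(f"field {field!r} is not indexed")
            ids = self._postings.get((field, value), set())
            result = ids if result is None else result & ids
        return set(result) if result else set()


# Usage with made-up sample records:
index = DocumentIndex()
index.add(IndexCard("scan_0001", "contract", "17", date(1998, 3, 2), "A. Smith"))
index.add(IndexCard("scan_0002", "contract", "18", date(1998, 3, 9), "B. Jones"))
index.add(IndexCard("scan_0003", "order", "5", date(1998, 3, 9), "A. Smith"))
print(sorted(index.search(doc_type="contract", author="A. Smith")))  # ['scan_0001']
print(sorted(index.search(doc_date=date(1998, 3, 9))))  # ['scan_0002', 'scan_0003']
```

A real archive would more likely keep such records in a database with indexes on the same columns; the dictionary here only illustrates why assigned attributes make lookup fast.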

At the recognition and indexing stage, Digital Country's advantage is its ability to work with the most complex data, in particular:

  • handwritten documents (entirely handwritten, or containing handwritten entries);
  • typewritten documents;
  • poorly legible documents (faded text, partially lost information, etc.);
  • documents of the same type in which the same information appears in different places (for example, after a change in the procedure for registering contracts).

When performing full or partial document recognition, our specialists apply multilevel quality control to the entire document array. Errors are also avoided by first creating a pilot resource on which the whole technology is tested on a small scale, the fields are agreed upon, and the quality criteria are defined.


