For this dataset, we prepared 10,000 photos of information signs outside of businesses and a text file with the INN (Taxpayer Identification Number) and OGRN (Business Registration Number) codes shown on the signs. This data can be used for training a computer vision model to recognize number sequences in images. The dataset was provided by Yandex Business Directory.
How we collected the data
First, we launched a task in the Yandex.Toloka mobile app that asked performers to go to a specific address marked on the map, find the organization, and take a photo of its information sign. We use field tasks like this to keep the Yandex Business Directory up to date.
Then other performers checked the quality of the completed tasks. The photos containing the INN and OGRN codes were sent for recognition. Toloka performers typed out the numbers from the photos, and then we processed the results and compiled the dataset.
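Manually typed codes like these can be sanity-checked, because both formats carry a control digit. The sketch below is an illustration only: the description above does not say whether such a check was part of the processing, and the function names are hypothetical. It covers the 13-digit OGRN and the 10-digit INN used by legal entities; the 15-digit OGRNIP and 12-digit INN of individual entrepreneurs follow different rules and are simply rejected here.

```python
def ogrn_is_valid(code: str) -> bool:
    """Check the control digit of a 13-digit OGRN (hypothetical helper)."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    # Control digit: the first 12 digits read as an integer, mod 11, then mod 10.
    return int(code[:12]) % 11 % 10 == int(code[12])


def inn_is_valid(code: str) -> bool:
    """Check the control digit of a 10-digit legal-entity INN (hypothetical helper)."""
    if len(code) != 10 or not (code.isascii() and code.isdigit()):
        return False
    # Weighted sum of the first 9 digits, mod 11, then mod 10.
    weights = (2, 4, 10, 3, 5, 9, 4, 6, 8)
    total = sum(w * int(d) for w, d in zip(weights, code))
    return total % 11 % 10 == int(code[9])
```

A transcription that fails either check most likely contains a typing error or a misread digit, so it could be sent back for another pass rather than included as a label.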