Once an accurate listing of products has been collected, BlueBoard moves on to the next stage: product search and matching.
The listing is imported into the back-office administration tool. From there, the system runs a separate search for each product and returns all search results with their associated URLs, each of which is a potential match. The URLs are then ranked by relevance using an in-house natural language processing algorithm.
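To make the ranking step concrete, here is a minimal sketch in Python. The in-house algorithm is not described here, so this example swaps in a simple token-overlap (Jaccard) score as a stand-in; the function names, URLs and sample titles are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def relevance(product_name, page_title):
    """Jaccard similarity of the two token sets.

    This is an illustrative stand-in, not the in-house NLP scorer.
    """
    a, b = tokenize(product_name), tokenize(page_title)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_results(product_name, results):
    """Sort (url, title) search results from most to least relevant."""
    return sorted(
        results,
        key=lambda result: relevance(product_name, result[1]),
        reverse=True,
    )

# Hypothetical search results for one product of the listing.
results = [
    ("https://shop.example/a", "Garden hose 25 m"),
    ("https://shop.example/b", "Acme Cordless Drill 18V with case"),
    ("https://shop.example/c", "Acme drill bits set"),
]
ranked = rank_results("Acme Cordless Drill 18V", results)
```

With this scoring, the page whose title shares the most words with the product name comes first, and unrelated pages sink to the bottom of the list that is handed over for manual checking.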
The international team, located in Vietnam, then manually checks the most relevant results for each product and excludes all false positives. On average, one team member processes about 1,500 URLs per day.
For a variety of reasons, some results may be missed during the initial search phase: the product pages may lack some information, the product may have been added in the meantime, or a human mistake may have been made at some point. To overcome this issue and detect all the products, BlueBoard uses APM (Automatic Product Matching). It periodically performs additional searches to look for potential matches that were not listed before. APM uses the information from the initial results and searches each retail website to find potentially missing product pages. Although APM only surfaces the most relevant results, human judgement is still involved in validating every match to ensure overall quality.
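One APM pass could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Product` structure, the `search` callback, its `(url, score)` return shape and the `top_k` cut-off are all hypothetical, and the product name stands in for "the information from the initial results". A scheduler would call `apm_pass` periodically; that part is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    name: str
    # URLs a team member has already accepted or rejected (assumed structure).
    reviewed_urls: set = field(default_factory=set)

def apm_pass(products, retailers, search, top_k=3):
    """Run one re-search pass and return candidates awaiting human validation.

    `search(retailer, query)` is a hypothetical callback returning a list of
    (url, score) pairs for one retail website.
    """
    queue = []
    for product in products:
        for retailer in retailers:
            hits = search(retailer, product.name)
            # Keep only pages that were not listed before.
            fresh = [(url, score) for url, score in hits
                     if url not in product.reviewed_urls]
            # Surface only the most relevant new candidates.
            fresh.sort(key=lambda hit: hit[1], reverse=True)
            for url, score in fresh[:top_k]:
                queue.append((product.name, url, score))
    return queue

def fake_search(retailer, query):
    """Stand-in for a real retailer search, with made-up data."""
    catalogue = {
        "shop-a.example": [("https://shop-a.example/p/1", 0.9),
                           ("https://shop-a.example/p/2", 0.4)],
        "shop-b.example": [("https://shop-b.example/p/7", 0.8)],
    }
    return catalogue.get(retailer, [])

drill = Product("Acme Cordless Drill 18V",
                reviewed_urls={"https://shop-a.example/p/1"})
pending = apm_pass([drill], ["shop-a.example", "shop-b.example"],
                   fake_search, top_k=1)
```

Nothing in `pending` is treated as a match yet: every entry still has to be validated by a person, which mirrors the human check described above.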
Once a product has been found, BlueBoard starts monitoring it very closely. From then on, its key information is updated frequently and quickly: this is The Scraping.