An automated solution for tracking updates to the compliance documents of major listed companies
We are experts in the automated web data extraction of legal and technical documents for companies engaged in information gathering and compliance.
About the customer
The company offers a web-based research platform to law firms, corporations, government agencies and academic institutions that are seeking legal solutions, news & business insights.
The challenge was to automate the detection and extraction of specific changes in legal or technical documents across the websites of 5K well-known companies listed on the stock exchange, and to build a large database updated as often as the customer needs.
We delivered a solution that automates the entire notification process, making it simple, secure, and effective, and that quickly handles all of the customer's data extraction needs.
- The web scraping system checks, at very high frequency, more than 60K documents hosted on the websites of between 4K and 5K companies, and quickly identifies what has changed in the content of each document. From this, daily notifications are generated for the client, who can track the updated information and stay aware of changes in the legal conditions of these companies. This critical information includes codes of ethics, codes of conduct, corporate social responsibility documents, and human resources documents.
- At the same time, an automatic system has been developed to generate a database from the blog publications of 100 leading law firms worldwide. The system identifies modifications to the content and saves them in a large database. With this, the client can classify and update legal information more efficiently.
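As an illustration only, the change-detection step described above could be sketched in Python as follows. This is a minimal sketch, not the production system: the function names, the in-memory `store` dictionary, and the example URL are hypothetical, and the documents are assumed to be already downloaded and converted to plain text. Each document is fingerprinted with a hash so that unchanged documents are skipped cheaply, and a line-by-line diff is produced only when the fingerprint differs.

```python
import difflib
import hashlib
from typing import Dict, List, Optional


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the text with whitespace normalized."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_change(url: str, new_text: str, store: Dict[str, str]) -> Optional[List[str]]:
    """Compare new_text with the last snapshot stored for url (hypothetical helper).

    Returns None on first sight or when the content is unchanged,
    otherwise the lines of a unified diff between the two versions.
    The snapshot in `store` is always replaced with the new text.
    """
    previous = store.get(url)
    store[url] = new_text
    if previous is None or fingerprint(previous) == fingerprint(new_text):
        return None
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            new_text.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical URL and document text, for illustration only.
    snapshots: Dict[str, str] = {}
    url = "https://example.com/code-of-ethics"
    detect_change(url, "Article 1\nGifts are not allowed.", snapshots)
    diff = detect_change(url, "Article 1\nGifts under 50 EUR are allowed.", snapshots)
    if diff is not None:
        print("\n".join(diff))  # this diff could feed a daily notification
```

In a real deployment the snapshots would live in a database rather than a dictionary, and the diffs collected during the day could be grouped per company into the daily notification.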
We manage the complex process of extracting data from the web and make it automatic, taking care of all the hard work for our clients while they focus on what they do best.
- Conversion of a product that typically extracted information from 300 websites, with low quality and manual work, into a completely automated process that scrapes between 4K and 5K websites and captures all available updates.
- The update frequency was improved and adapted to the frequency required by the business, shortening the time the client has to wait for key information.
- Now that the web data is structured in the client’s database, the company can create agile models for automatic labeling and classification of information using machine learning techniques.