Data augmentation to improve credit scoring system & bureau information
Scraping Pros is the leading provider of comprehensive web data extraction solutions helping clients succeed.
About the customer
The company manages financial, demographic, employment and marketing information in the U.S., Canada, the UK and LATAM for key commercial uses.
Challenge
The organization's main challenge is obtaining data on the commercial and financial behavior of companies and individuals and then, combined with risk models, preparing commercial reports. Data is an essential input that supports both the company's core business and its new innovation initiatives. The company needs to ingest and keep up to date a database of more than 25M entities (people and companies) with information from more than 20 public sources.
Efforts to keep information up to date must be maximized, with a focus on quickly adapting scrapers when source websites change or go down. Doing so saves significant time and money.
Solution
We designed a comprehensive data extraction service tailored to the daily updating of a large database of more than 25M entities (people and companies).
- A comprehensive end-to-end ingest-extraction system was designed to manage the huge need for data.
- The Scraping Pros monitoring system was adopted and customized to meet the SLA requirements.
- An extraction prioritization model was designed and implemented according to a freshness ranking.
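As an illustration of the freshness-based prioritization mentioned above, here is a minimal sketch in Python. It is not the model Scraping Pros implemented: the `Entity` fields, the per-entity `max_age` target and the staleness ratio are all assumptions chosen to show the general idea of refreshing the most out-of-date records first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import heapq


@dataclass(frozen=True)
class Entity:
    # All fields are hypothetical; the real schema is not described in the case study.
    entity_id: str
    last_updated: datetime
    max_age: timedelta  # assumed target: how old a record may get before it counts as stale


def staleness(entity: Entity, now: datetime) -> float:
    """Age of the record as a multiple of its allowed age (above 1.0 means overdue)."""
    return (now - entity.last_updated) / entity.max_age


def next_batch(entities: list[Entity], now: datetime, batch_size: int) -> list[Entity]:
    """Pick the batch_size entities with the highest staleness, most stale first."""
    return heapq.nlargest(batch_size, entities, key=lambda e: staleness(e, now))


if __name__ == "__main__":
    now = datetime(2024, 1, 10)
    entities = [
        Entity("a", datetime(2024, 1, 9), timedelta(days=1)),
        Entity("b", datetime(2024, 1, 1), timedelta(days=30)),
        Entity("c", datetime(2024, 1, 5), timedelta(days=2)),
    ]
    for e in next_batch(entities, now, batch_size=2):
        print(e.entity_id, round(staleness(e, now), 2))
```

Ranking by a ratio rather than by raw age lets records with different refresh targets share one queue; a production ranking would likely also weigh source cost and business priority.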
Results
Using the monitoring system, which is based on a data quality management model, our staff have detected source drops and outages.
- During the system's production start-up, more than 200K records were extracted and ingested daily from the 20+ sources selected for the project.
- The monitoring system tracks the downloading and processing of data records and effectively detects drops and outages in the sources.
- Main achievement: source drops and outages are detected effectively by the monitoring system that Scraping Pros developed ad hoc.
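One simple way such drop detection could work is to compare each source's record count for the day against its recent baseline. The sketch below is only an assumption about the general technique, not the Scraping Pros monitoring system: the function name `detect_drops`, the median baseline and the `min_ratio` threshold are all hypothetical.

```python
from statistics import median


def detect_drops(history, today, min_ratio=0.5):
    """Flag sources whose record count today fell below min_ratio of their recent median.

    history: dict mapping source name -> list of recent daily record counts
    today:   dict mapping source name -> record count observed today
    Returns a list of (source, observed, baseline) tuples for flagged sources.
    """
    alerts = []
    for source, counts in history.items():
        baseline = median(counts) if counts else 0
        observed = today.get(source, 0)  # a source missing today counts as zero records
        if baseline > 0 and observed < min_ratio * baseline:
            alerts.append((source, observed, baseline))
    return alerts


if __name__ == "__main__":
    # Source names and counts are made up for illustration.
    history = {"registry": [1000, 1100, 900], "courts": [500, 520, 480]}
    today = {"registry": 1050, "courts": 0}
    for source, observed, baseline in detect_drops(history, today):
        print(f"{source}: {observed} records today vs. baseline {baseline}")
```

A median is used here because it is less sensitive than a mean to a single unusually large or small day; the threshold would need tuning per source.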