Algorithms and data mining to solve the data challenges of the Web

Companies today must manage an immense flow of data across the many channels through which they communicate with customers, an approach known as omnichannel. Handling that much traffic is a challenge in itself. Even the simplest tasks, such as counting unique visitors, become complex when there are millions of users and hundreds of thousands of resources. In this context, new algorithms are needed to solve these problems.
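To make the unique-visitor problem concrete, here is a minimal sketch of one well-known approach, a K-minimum-values estimator, which approximates the number of distinct visitors while keeping only a fixed number of hashes in memory. The source does not say which algorithm the workshop used, so the technique, the function name estimate_unique, and the parameter k are assumptions chosen for illustration.

```python
import hashlib
import heapq


def estimate_unique(visitor_ids, k=1024):
    """Estimate the number of distinct ids in a stream (K-minimum-values sketch).

    Hypothetical helper for illustration. Memory use is bounded by k,
    no matter how many visits the stream contains.
    """
    heap = []     # negated hashes, so heap[0] is the largest of the k smallest
    kept = set()  # the same hashes, used to skip repeat visitors cheaply
    for visitor_id in visitor_ids:
        digest = hashlib.sha1(str(visitor_id).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash of the id
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: count is exact
    kth_smallest = -heap[0]
    # If hashes are uniform on [0, 2**64), the k-th smallest sits near k/n.
    return int((k - 1) * 2**64 / kth_smallest)


# Small streams are counted exactly: two distinct visitors here.
print(estimate_unique(["ana", "luis", "ana"]))
```

A larger k gives a tighter estimate at the cost of more memory; the relative error shrinks roughly as one over the square root of k.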

Under the title “Topics of Statistics and Data Mining in Big Data”, Scrapingpros gave a workshop in which it presented a compilation of techniques, algorithms, and problems that address the challenges described above. The emphasis was on Big Data, with a focus on problems that originate on the web.

Traditional data mining algorithms and methods were revisited to cope with the characteristics that Big Data imposes. “What happens when data volumes are so large that they cannot be housed on a single hard drive? How can we measure the impact of a campaign when our audience is so volatile that, by the time we are ready to observe it, it is already gone? Is it possible to use all the available information, or should we choose a sample? And in that case, how do we do it?” were the points discussed with the workshop attendees.
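On the sampling question, one classic answer for data that arrives as a stream of unknown length is reservoir sampling, which keeps a uniform random sample of fixed size in a single pass. The sketch below is an illustration only: the source does not state which sampling method the workshop covered, and the function name reservoir_sample and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable.

    Hypothetical helper for illustration (Algorithm R). The stream is read
    once and only k items are held in memory at any time.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Draw 5 page views from a stream of one million without storing the stream.
sample = reservoir_sample((f"view-{n}" for n in range(1_000_000)), 5)
print(sample)
```

Because every item ends up in the sample with the same probability, statistics computed on the sample can stand in for the full stream when the data does not fit on one machine.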

This workshop was part of the summer workshop “Efficient extraction of semantic data”, held in Santiago de Chile on January 16 and 17, 2017, and organized by the Center for Semantic Web Research (CIWS). The purpose of the CIWS is to investigate the most efficient way to extract semantic data from the Web and to develop basic tools that make that extraction even more effective. The initiative brings together teachers, researchers, and students from different universities in Chile.


No matter your data needs, we can help.

We identify, extract, clean, filter, and deliver the data in the desired format, ready for use in your database or in your upload queue. We adapt the data delivery to your integration requirements. We provide a reliable, secure, robust, and traceable response to your web-data needs. We have the expertise to solve highly complex extraction tasks: OCR, multi-step extraction, proxy management, etc. We provide our 24/7 web-data storage and processing infrastructure with a 99.999% SLA.

We’ve helped hundreds of companies with their scraping needs. Ready to find out how we can help you?
