Qingli Niu1, Irfan Ali Kandhro2, Anil Kumar2, Shahnawaz shah3, Muhammad Hasan2, Hifza Mehfooz Ahmed2, and Fei Liang1

1College of Information Engineering, Zhengzhou University of Science & Technology, Zhengzhou 450064, China
2Department of Computer Science, Sindh Madressatul Islam University, Karachi, Pakistan
3Department of Telecommunication Engineering, University of Sindh Jamshoro, Pakistan


Received: February 28, 2022
Accepted: May 8, 2022
Publication Date: June 17, 2022

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

DOI: https://doi.org/10.6180/jase.202304_26(4).0002


Web scraping is the process of extracting data from a website quickly and efficiently. For this task, Python offers a useful set of methods that help web editors improve the quality of the service they provide. The scraper works in three steps: 1) understand the structure of the web page, 2) design a regular expression pattern, and 3) use that pattern to extract the target data. In this paper, we also use the Flask, Request, and JSONify libraries to get the data; after processing, the data is transformed into JSON and made ready for CSV export with the help of an API. Once all required regex patterns have been generated, the system uses them as a set of rules. With these rules and the support libraries, the designed scraper tool works efficiently and achieves outstanding results in storing and extracting news and web-based information. The proposed web scraping tool eliminates the time and effort of manually collecting or copying data by automating the process. We find that the designed scraper is an easy and direct approach to extracting data from newspapers, websites, blogs, and images.
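
The three steps named in the abstract (inspect the page structure, design a regular expression, apply it) and the JSON and CSV output stage can be illustrated with a minimal sketch. This is not the authors' implementation: the sample markup, the pattern, and the names SAMPLE_HTML, HEADLINE_PATTERN, extract_headlines, to_json, and to_csv are all hypothetical, and the standard-library json module stands in for the Flask JSON response so that the example runs without any third-party package or network access.

```python
import csv
import io
import json
import re

# Hypothetical sample markup; a real scraper would first fetch the page over HTTP.
SAMPLE_HTML = """
<div class="news">
  <h2 class="title"><a href="/story/1">Floods hit the coast</a></h2>
  <h2 class="title"><a href="/story/2">Markets rally on Monday</a></h2>
</div>
"""

# Steps 1 and 2: after inspecting the (assumed) page structure, design a
# pattern with named groups for the fields we want to capture.
HEADLINE_PATTERN = re.compile(
    r'<h2 class="title"><a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a></h2>'
)


def extract_headlines(html):
    """Step 3: apply the pattern and return one dict per matched headline."""
    return [match.groupdict() for match in HEADLINE_PATTERN.finditer(html)]


def to_json(records):
    """Serialize the extracted records to a JSON string."""
    return json.dumps(records, ensure_ascii=False)


def to_csv(records):
    """Write the extracted records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["url", "title"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = extract_headlines(SAMPLE_HTML)
print(to_json(records))
print(to_csv(records))
```

A pattern this strict matches only the exact markup shown; any change in attribute order or whitespace on the target page would require the pattern to be redesigned, which is the usual trade-off of regex-based scraping.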

Keywords: web scraping, extracting, retrieving, Python framework, API, manually collecting data


