Fix error in Python-based web scraper with GUI

300.0 USD

300.0 USD peopleperhour Technology & Programming Overseas
9 days ago

Description

Hello Freelancers, I'm looking for a developer familiar with web scraping and Python to fix an existing web scraper. It scrapes product data from category links on the Italian e-commerce website www.yeppon.it.
The script basically works, but it throws an error at certain points during scraping. I think this is due to a slight change in the structure of the website, which causes an error when the script tries to scrape a product's text description.
The goal of this project is to fix the errors so the script works as it used to: scraping product data from given category URLs on the website and writing the data to CSV files. I don't think this will take much effort, because it is basically this one error that needs to be located and fixed; everything else still seems to work fine. The price can be discussed.
Some facts:
1. The web scraper is based on Python with a GUI. Its final version comes as an exe file (therefore I can't attach it to the project description; I will send it in the messages or work stream).
2. It scrapes certain product data (like product name, price, description, image links) by category links which can be entered into the GUI. The GUI also has some input fields; these are just for fixed strings, which are written unchanged to the CSV files alongside the product data.
3. The scraper technically still works; however, it gives an error when scraping certain categories. You can check this by running the tool, filling out the input fields with the data explained in the "Instructions" tab of the tool, and then starting the scrape. It will produce this error (can be found in the log file):
--------------------------------------------------------------------------------
2024-04-24 12:44:24,765:ERROR:'descriptionHtml'
Traceback (most recent call last):
  File "async_scraper.py", line 739, in scrape
    description_html = pdata["pageProps"]["product"][
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'descriptionHtml'
2024-04-24 12:44:24,765:INFO: Finally
2024-04-24 12:44:24,765:INFO:
No data found!
--------------------------------------------------------------------------------
4. The error seems to occur when scraping a product's text description. The text description consists of three possible elements:
- bullet points formatted into a ul element
- a text description which is cleaned (HTML code removed or replaced)
- data scraped from a table on the website and put into a given HTML structure
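The KeyError in the log suggests that some product payloads no longer carry a `descriptionHtml` key. Below is a minimal sketch of a defensive lookup. It assumes the key normally sits directly under `pdata["pageProps"]["product"]` (the traceback line is cut off, so the exact path is an assumption) and that the three description parts are available as plain Python values. The helper names `get_description_html` and `build_description` and the output HTML layout are hypothetical and not taken from the original script.

```python
from html import escape


def get_description_html(pdata, default=""):
    """Return the product's description HTML, or `default` if it is absent."""
    page_props = pdata.get("pageProps") or {}
    product = page_props.get("product") or {}
    # .get() avoids the KeyError from the log when the key is missing.
    return product.get("descriptionHtml") or default


def build_description(bullets, text, table_rows):
    """Join the three optional description parts, skipping any that are empty."""
    parts = []
    if bullets:
        items = "".join(f"<li>{escape(b)}</li>" for b in bullets)
        parts.append(f"<ul>{items}</ul>")
    if text:
        parts.append(f"<p>{escape(text)}</p>")
    if table_rows:
        rows = "".join(
            f"<tr><td>{escape(k)}</td><td>{escape(v)}</td></tr>"
            for k, v in table_rows
        )
        parts.append(f"<table>{rows}</table>")
    return "".join(parts)
```

With this approach a product without a text description still yields a row in the CSV, built from whichever of the other parts exist, instead of aborting the whole category.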

It was developed by a freelancer from PPH for a colleague of mine. Unfortunately, I have been unable to reach my colleague for quite some time to ask for all the details or the freelancer's name, so I am posting this publicly.
Scraping some categories will result in the error mentioned above, for example:
https://www.yeppon.it/c/elettrodomestici/grandi-elettrodomestici/asciugabiancheria
or
https://www.yeppon.it/c/elettrodomestici/grandi-elettrodomestici/frigoriferi
Others work just fine, like:
https://www.yeppon.it/c/telefonia/smartphone/smart-phone
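One way to locate the structural change is to compare the product keys from a working category against those from a failing one. The sketch below assumes the page data has already been loaded into dicts shaped like the `pdata` in the traceback. The helper name `missing_product_keys` and the sample payloads are made up for illustration.

```python
def missing_product_keys(working, failing):
    """Return product keys present in the working payload but absent in the failing one."""
    def product_keys(pdata):
        page_props = pdata.get("pageProps") or {}
        return set((page_props.get("product") or {}).keys())

    return sorted(product_keys(working) - product_keys(failing))


# Hypothetical payloads; real ones would come from the scraped pages.
working = {"pageProps": {"product": {"name": "Phone", "descriptionHtml": "<p>...</p>"}}}
failing = {"pageProps": {"product": {"name": "Dryer"}}}
print(missing_product_keys(working, failing))  # prints ['descriptionHtml']
```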

I will attach the files I have about this project from my colleague. As I can't attach exe or rar files, I attached:
- a first version of the Python code (a beta version, not the final code; it gives a different error that is already solved in the final exe file, so it is just meant to give you an impression), as well as the code of the GUI and the requirements. These are async_scraper.txt, gui.txt and requirements.txt
