GitHub - karzamisca/WebScraper

Web Scraper with Playwright and Google Translate

WARNING: Use only sites served over HTTPS.
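
A minimal sketch of how a URL list could be pre-filtered to honor this warning. The helper name `keep_https` is hypothetical and not part of the repository; it relies only on the standard library's `urllib.parse`.

```python
from urllib.parse import urlparse


def keep_https(urls):
    """Split URLs into (accepted, rejected); only https URLs with a host are accepted."""
    accepted, rejected = [], []
    for url in urls:
        cleaned = url.strip()
        parsed = urlparse(cleaned)
        # Require both the https scheme and a host, so bare strings are rejected too.
        if parsed.scheme == "https" and parsed.netloc:
            accepted.append(cleaned)
        else:
            rejected.append(cleaned)
    return accepted, rejected
```

Rejected entries are returned rather than silently dropped, so a caller can report them to the user before scraping starts.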

Classes and Functions

Functions

scrape_url(export_html, export_pdf, export_text, export_original_text, url, export_path, target_lang)
- Purpose: Scrapes a single URL and exports the content based on selected options.
- Parameters:
  - export_html: Boolean to determine if HTML should be exported.
  - export_pdf: Boolean to determine if PDF should be exported.
  - export_text: Boolean to determine if translated text should be exported.
  - export_original_text: Boolean to determine if original text should be exported.
  - url: The URL to scrape.
  - export_path: Path where the exported files will be saved.
  - target_lang: Target language code for text translation.
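
The following is a sketch of what a function with this signature might look like, not the code in `main.py`. It assumes Playwright's synchronous API and Chromium; the `output_stem` helper, the output file names, and the extra `translate` keyword argument are all hypothetical. `translate` stands in for whatever Google Translate client the project actually uses.

```python
from pathlib import Path
from urllib.parse import urlparse


def output_stem(url):
    """Build a filesystem-friendly base name from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    # Replace every non-alphanumeric character with "_" and trim the ends.
    stem = "".join(c if c.isalnum() else "_" for c in raw).strip("_")
    return stem or "page"


def scrape_url(export_html, export_pdf, export_text, export_original_text,
               url, export_path, target_lang, translate=None):
    """Sketch: render one page and write the requested exports.

    `translate` is an assumed callable (text, target_lang) -> str.
    """
    # Imported here so output_stem() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    out_dir = Path(export_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = output_stem(url)

    with sync_playwright() as p:
        browser = p.chromium.launch()  # PDF export is Chromium-only in Playwright
        page = browser.new_page()
        page.goto(url)
        if export_html:
            (out_dir / f"{stem}.html").write_text(page.content(), encoding="utf-8")
        if export_pdf:
            page.pdf(path=str(out_dir / f"{stem}.pdf"))
        if export_text or export_original_text:
            original = page.inner_text("body")
            if export_original_text:
                (out_dir / f"{stem}_original.txt").write_text(original, encoding="utf-8")
            if export_text and translate is not None:
                translated = translate(original, target_lang)
                (out_dir / f"{stem}_{target_lang}.txt").write_text(translated, encoding="utf-8")
        browser.close()
```

Each export flag is checked independently, so any combination of HTML, PDF, translated text, and original text can be produced from a single page load.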
scrape(export_html, export_pdf, export_text, export_original_text, urls, export_path, target_lang)
- Purpose: Manages the scraping of multiple URLs concurrently.
- Parameters:
  - export_html: Boolean to determine if HTML should be exported.
  - export_pdf: Boolean to determine if PDF should be exported.
  - export_text: Boolean to determine if translated text should be exported.
  - export_original_text: Boolean to determine if original text should be exported.
  - urls: List of URLs to scrape.
  - export_path: Path where the exported files will be saved.
  - target_lang: Target language code for text translation.
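
The README does not say which concurrency mechanism `scrape` uses. As one possible approach, the sketch below fans URLs out over a thread pool from the standard library; `scrape_all` and its `worker` argument are hypothetical names, and `worker` is any callable that takes a single URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_all(urls, worker, max_workers=4):
    """Run worker(url) for every URL concurrently; return {url: result or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so results can be labeled as they finish.
        futures = {pool.submit(worker, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failing page should not stop the others
                results[url] = exc
    return results
```

In practice `worker` could be a `functools.partial` wrapping `scrape_url` with the export options, export path, and target language fixed.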
choose_url_file()
- Purpose: Opens a file dialog to select a text file containing URLs.
- Parameters: None (uses Tkinter file dialog).
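
Once a path has been picked (for example with `tkinter.filedialog.askopenfilename`), the file still has to be turned into a list of URLs. The sketch below assumes one URL per line, which the README does not specify; `load_urls` is a hypothetical helper name.

```python
from pathlib import Path


def load_urls(path):
    """Read URLs from a text file, one per line, skipping blank lines and duplicates."""
    urls = []
    seen = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

The original order is preserved, so exports appear in the same sequence as the lines in the file.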
choose_export_path()
- Purpose: Opens a directory dialog to select the export directory.
- Parameters: None (uses Tkinter directory dialog).
start_scraping()
- Purpose: Initiates the scraping process based on user input from the GUI.
- Parameters: None (uses values from Tkinter GUI elements).

Main GUI Elements

root: The main Tkinter window for the GUI.
html_var: Boolean variable for the HTML export option.
pdf_var: Boolean variable for the PDF export option.
text_var: Boolean variable for the translated text export option.
original_text_var: Boolean variable for the original text export option.
url_file_entry: Entry field for the URL text file path.
export_path_entry: Entry field for the export directory path.
language_var: String variable for the target language selection.
language_dropdown: Dropdown menu for selecting the target language.
start_button: Button to start the scraping process.
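
To show how the elements listed above could fit together, here is a rough Tkinter sketch. The widget names follow the list, but the layout, the `selected_exports` and `build_gui` helpers, and the placeholder language codes are assumptions rather than the repository's actual code.

```python
def selected_exports(html, pdf, text, original_text):
    """Return the names of the enabled export options, in a fixed order."""
    flags = {"html": html, "pdf": pdf, "text": text, "original_text": original_text}
    return [name for name, enabled in flags.items() if enabled]


def build_gui(languages=("en", "fr", "vi")):  # placeholder language codes
    """Create the window and its widgets; the caller runs mainloop()."""
    # Imported lazily so selected_exports() stays usable without a display.
    import tkinter as tk
    from tkinter import filedialog, messagebox

    root = tk.Tk()
    root.title("Web Scraper")

    # One BooleanVar per export checkbox.
    html_var = tk.BooleanVar(master=root)
    pdf_var = tk.BooleanVar(master=root)
    text_var = tk.BooleanVar(master=root)
    original_text_var = tk.BooleanVar(master=root)
    for label, var in [("HTML", html_var), ("PDF", pdf_var),
                       ("Translated text", text_var),
                       ("Original text", original_text_var)]:
        tk.Checkbutton(root, text=label, variable=var).pack(anchor="w")

    url_file_entry = tk.Entry(root, width=50)
    url_file_entry.pack()

    def choose_url_file():
        path = filedialog.askopenfilename(filetypes=[("Text files", "*.txt")])
        if path:
            url_file_entry.delete(0, tk.END)
            url_file_entry.insert(0, path)

    tk.Button(root, text="Browse...", command=choose_url_file).pack()

    export_path_entry = tk.Entry(root, width=50)
    export_path_entry.pack()

    def choose_export_path():
        path = filedialog.askdirectory()
        if path:
            export_path_entry.delete(0, tk.END)
            export_path_entry.insert(0, path)

    tk.Button(root, text="Browse...", command=choose_export_path).pack()

    language_var = tk.StringVar(master=root, value=languages[0])
    language_dropdown = tk.OptionMenu(root, language_var, *languages)
    language_dropdown.pack()

    def start_scraping():
        chosen = selected_exports(html_var.get(), pdf_var.get(),
                                  text_var.get(), original_text_var.get())
        if not chosen:
            messagebox.showwarning("Web Scraper", "Select at least one export option.")
            return
        # The real application would read the URL file and start scraping here.
        print(chosen, url_file_entry.get(), export_path_entry.get(), language_var.get())

    start_button = tk.Button(root, text="Start Scraping", command=start_scraping)
    start_button.pack()
    return root
```

Calling `build_gui().mainloop()` opens the window. Keeping `selected_exports` separate from the widgets lets the option logic be checked without a display.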

Usage

Select URL File: Click "Browse..." to choose a text file containing URLs to scrape.
Select Export Directory: Click "Browse..." to choose the directory where files will be saved.
Choose Export Options: Check the boxes for HTML, PDF, translated text, and/or original text.
Select Target Language: Use the dropdown menu to select the language for text translation.
Start Scraping: Click "Start Scraping" to begin the process.

