Make sure you have the following installed:
- Python 3.8 or later
- MongoDB (running locally or remotely)
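
Before going further, it may help to confirm both prerequisites from a short script. The sketch below is only an illustration and not part of this project: the function names, the `mongodb://localhost:27017` URI, and the two-second timeout are assumptions, so substitute your own connection string. The MongoDB check needs `pymongo` to be installed.

```python
import sys


def python_version_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


def mongodb_reachable(uri="mongodb://localhost:27017", timeout_ms=2000):
    """Return True if a MongoDB server answers a ping at `uri`.

    The default URI is an assumption; replace it with your own.
    """
    # Imported lazily so the version check works before pymongo is installed.
    from pymongo import MongoClient
    from pymongo.errors import PyMongoError

    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        # "ping" is a cheap command that forces a round trip to the server.
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False
    finally:
        client.close()


if __name__ == "__main__":
    print("Python 3.8+:", python_version_ok())
    print("MongoDB reachable:", mongodb_reachable())
```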

Install the required Python packages using pip:

    pip install pymongo crawl4ai

Set the API key for your AI model (e.g., OpenAI API) as an environment variable in your operating system:

    export API_KEY="your_llm_api_key_here"

Specify the websites to crawl by editing the urls parameter in ListPageExtractor.py:

    async def extract_news_list(self):
        urls = [
            # Add website URLs here
        ]

Add the recipient email addresses in the recipients parameter in EmailService.py:

    recipients = [
        # Add email addresses here
    ]

Use the scheduler.py script to manage the service:
- Start the Scheduler:
    python scheduler.py start
- Stop the Scheduler:
    python scheduler.py stop
- Set Crawler Execution Time (multiple times can be specified):
    python scheduler.py set_crawler_time 08:00 09:30
- Set Email Sending Time (multiple times can be specified):
    python scheduler.py set_email_time 16:00 17:30
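
To illustrate how a list of `HH:MM` values such as `08:00 09:30` could be turned into a schedule, here is a minimal sketch using only the standard library. It is not the code in scheduler.py: the helper names `parse_times` and `next_run` are hypothetical, and the real script may validate or store times differently.

```python
from datetime import datetime, timedelta


def parse_times(args):
    """Parse strings like "08:00" into sorted, de-duplicated time objects."""
    parsed = set()
    for raw in args:
        # strptime raises ValueError on malformed input such as "25:00".
        parsed.add(datetime.strptime(raw, "%H:%M").time())
    return sorted(parsed)


def next_run(times, now):
    """Return the first datetime strictly after `now` matching one of `times`.

    `times` must be sorted in ascending order, as returned by parse_times.
    """
    if not times:
        raise ValueError("at least one time is required")
    for t in times:
        candidate = datetime.combine(now.date(), t)
        if candidate > now:
            return candidate
    # Every slot today has passed, so use the earliest slot tomorrow.
    return datetime.combine(now.date() + timedelta(days=1), times[0])


if __name__ == "__main__":
    times = parse_times(["09:30", "08:00"])
    print(next_run(times, datetime(2024, 1, 1, 9, 0)))  # 2024-01-01 09:30:00
```

Sorting the parsed times once keeps the lookup simple: the first slot later than the current time wins, and if none is left the schedule rolls over to the earliest slot on the next day.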
Contributions are welcome! Feel free to fork this repository and submit a pull request.
This project is licensed under the MIT License.