GitHub - Altonormz/NASDAQ

Authors: Alot Mecilati & Jonathan Schwarz

NASDAQ Market News and Insights

Nasdaq is a global electronic marketplace where investors and traders buy and sell securities. It is one of the leading stock exchanges in the world and was the world's first electronic stock exchange, so its performance can provide insight into the overall health of the stock market. Market news reports on and analyzes the forces shaping the global economy. Material events such as mergers, acquisitions or FDA approvals would likely be seen as bullish indicators for the companies involved. Overall, scraping NASDAQ news can provide valuable insights into market trends, competitor activities, and potential risks or challenges, allowing individuals and organizations to make more informed decisions about their investments and strategies.

The data scraped:

We chose to scrape the following information from these news articles:

Dates
Titles
Authors
Content of articles
Stocks discussed (tickers)

All of this information is gathered in order to better understand stock trends and their effect on the market.

Objectives:

Successfully scrape relevant data from NASDAQ market news (including all the information mentioned above).
Process the scraped data to filter out noise and focus on what is important.
Market statistical analysis: analyse the scraped and cleaned data to test hypotheses and learn what we can from each piece of the puzzle.
Create a large DB of information about market trends for future use (prediction algorithms).

Methods and Used Libraries

Generally, the code is divided into a few main stages:

Parse the user arguments and filters using the argparse library (for further information see the "User Arguments" section below).
Create the DB (for more information about the DB structure and content see the "Database Structure and Features" section below).
Acquire the URL of each article from the "Recent Markets headlines" pages (one thousand pages overall). The requests are sent with grequests in batches of 10, and each page is searched as soon as its response is received. The URLs are located through an HTML class unique to the article links, and the href attribute is read with bs4. The URLs are then passed to the next stage, the article scraping.
Scrape the articles. This stage takes the URLs from the previous stage and uses an object called Article. A request is sent to the server for each URL (again with grequests, in batches of 10). If the request succeeds, bs4 is used to collect the date, title, authors, tags and text into the object's attributes. All of this information is then passed to the last stage.
The last stage takes a list of Article objects and inserts their data into the DB created in the second stage. Each record is checked first and is inserted only if it is not already in the DB.
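The URL-collection stage described above can be sketched roughly as follows. This is a minimal illustration rather than the project's code: it uses the standard library's html.parser as a stand-in for bs4 so that it runs without extra packages, it leaves out the grequests network calls, and the class name `article-link` and the helper names `batched` and `extract_article_urls` are hypothetical.

```python
from html.parser import HTMLParser

BATCH_SIZE = 10  # the README states that requests are sent in batches of 10
ARTICLE_LINK_CLASS = "article-link"  # hypothetical class name, not taken from the project


def batched(items, size=BATCH_SIZE):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class ArticleLinkParser(HTMLParser):
    """Collect the href of every <a> tag that carries the article-link class."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and ARTICLE_LINK_CLASS in classes and attr_map.get("href"):
            self.urls.append(attr_map["href"])


def extract_article_urls(html):
    """Return the article URLs found in one headlines page."""
    parser = ArticleLinkParser()
    parser.feed(html)
    return parser.urls
```

In a full pipeline, each batch of page URLs produced by `batched` would be requested together, and `extract_article_urls` would be called on every successful response body.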

User Arguments:

The following arguments are available:

scrape_all - scrapes all data from all pages and articles with no filter (when used, no other argument may be used).
pages - scrapes x pages starting from the first (*see note 1 below). The default value is 1000 pages.
time - given a date and, optionally, a time, the program scrapes every article published from today back to that date and time (*see notes below). For the full list of accepted values for this argument see: https://dateparser.readthedocs.io/en/latest/
update - updates the stock prices in the database from the API.
- note 1: the pages and time arguments may be used together; whichever reaches its limit first determines the number of articles scraped.
- note 2: the date of the oldest article available depends on how much recent news there is, so a given date may scrape more, less or all of the data.

example:

`python NASDAQ_scraper.py -pages 500 -time 2023-04-04T05:00:00`
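As a rough illustration of how such arguments could be declared with argparse, the sketch below defines the three filter arguments and rejects scrape_all when it is combined with another filter. It is an assumption-laden sketch, not the project's parser: the function names are hypothetical, the update argument is left out, and `datetime.fromisoformat` stands in for the more flexible dateparser library that the README links to.

```python
import argparse
from datetime import datetime

DEFAULT_PAGES = 1000  # default number of pages stated in the README


def build_parser():
    """Declare the three filter arguments (flag spellings follow the README example)."""
    parser = argparse.ArgumentParser(description="NASDAQ news scraper (illustrative sketch)")
    parser.add_argument("--scrape_all", action="store_true",
                        help="scrape every page and article with no filter")
    parser.add_argument("-pages", type=int, default=None,
                        help="number of pages to scrape, starting from the first")
    # Stand-in for dateparser: only ISO 8601 strings are accepted here.
    parser.add_argument("-time", type=datetime.fromisoformat, default=None,
                        help="scrape articles from today back to this date and time")
    return parser


def parse_args(argv=None):
    """Parse argv and enforce that scrape_all is used on its own."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.scrape_all and (args.pages is not None or args.time is not None):
        parser.error("scrape_all cannot be combined with pages or time")
    if args.pages is None:
        args.pages = DEFAULT_PAGES
    return args
```

Leaving the default of pages as `None` until after the check makes it possible to tell "pages was not given" apart from "pages was given as 1000".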

Installations Required

All required packages, including their versions, are listed in the "requirements.txt" file.

Database Structure and Features

Authors:

author_id [int] - primary key generated automatically
author_name [varchar] - author name

Articles:

article_id [int] - primary key generated automatically
author_id [int] - Foreign key from Authors
title [varchar] - the title of the article
article_content [varchar] - the article text
url [varchar] - the article url
published_date [datetime] - the date and time the article was published

Article_Tags:

article_tag_id [int] - primary key generated automatically
article_id [int] - Foreign key from Articles
tag_id [int] - Foreign key from Tags

Tags:

tag_id [int] - primary key generated automatically
tag_name [varchar] - the tag name (text)

Stock_Articles:

stock_article_id [int] - primary key generated automatically
stock_id [int] - Foreign key from Stocks
article_id [int] - Foreign key from Articles

Stocks:

stock_id [int] - primary key generated automatically
stock_tick [varchar] - stock ticker (symbol)
name [varchar] - the company name
currency [varchar] - the currency the stock is traded with
country [varchar] - the country the company is listed in
sector [varchar] - the sector in the industry the company belongs to
industry [varchar] - the industry the company belongs to

Stocks_Prices:

stock_price_id [int] - primary key generated automatically
stock_id [int] - Foreign key from Stocks
date [datetime] - the date of the price record
open [int] - opening stock price
high [int] - highest stock price on that date
low [int] - lowest stock price on that date
close [int] - closing stock price
volume [int] - number of shares traded
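To make the table relationships and the "insert only if not already present" step concrete, here is a small sketch covering just the Authors and Articles tables. It makes several assumptions: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for the project's MySQL database, the UNIQUE constraints and the choice of the url column as the duplicate check are guesses, and the function names are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Create simplified Authors and Articles tables (SQLite types, not MySQL)."""
    conn.executescript("""
        CREATE TABLE Authors (
            author_id INTEGER PRIMARY KEY AUTOINCREMENT,
            author_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE Articles (
            article_id INTEGER PRIMARY KEY AUTOINCREMENT,
            author_id INTEGER REFERENCES Authors(author_id),
            title TEXT,
            article_content TEXT,
            url TEXT UNIQUE,
            published_date TEXT
        );
    """)


def get_or_create_author(conn, name):
    """Return the author's id, inserting a new row only when the name is unknown."""
    row = conn.execute(
        "SELECT author_id FROM Authors WHERE author_name = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    cursor = conn.execute("INSERT INTO Authors (author_name) VALUES (?)", (name,))
    return cursor.lastrowid


def insert_article_if_new(conn, author_name, title, content, url, published_date):
    """Insert the article unless its url is already stored; return True if inserted."""
    if conn.execute("SELECT 1 FROM Articles WHERE url = ?", (url,)).fetchone():
        return False
    author_id = get_or_create_author(conn, author_name)
    conn.execute(
        "INSERT INTO Articles (author_id, title, article_content, url, published_date) "
        "VALUES (?, ?, ?, ?, ?)",
        (author_id, title, content, url, published_date),
    )
    return True
```

The link tables (Article_Tags, Stock_Articles) would follow the same pattern, with one foreign key to each of the two tables they connect.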

Running the program

To start the program please run "NASDAQ_scraper.py" (with the arguments of your choice).

Please download "Class_Article.py", "NASDAQ_datacollecter.py" and "conf.json", and make sure to follow the "requirements.txt" installations before running the program.

Before running the program, please change the MySQL username and password in the "conf.json" file:

"USER": "root"

"PASSWORD": "root"

API - Alphavantage

Link: https://www.alphavantage.co/documentation/

Alphavantage is a stock market API with a very large amount of data about stocks, companies, currencies, news, cryptocurrencies, commodities and much more. In this project the API is used to collect data about the stocks (and companies) and the stock prices. The following data is not collected by the scraper and is added through the API script:

In the Stocks table: name, currency, country, sector, industry.
In the Stocks_Prices table: all fields.

The free tier of the API is limited to 500 queries per day, and each stock requires one query per subject. We use 2 API keys in order to:

Update 250 newly added stocks (in the Stocks table) and add the prices of the last 100 days for these stocks to the Stocks_Prices table (500 API calls in total).
Update previously added stocks with new dates and price records (in the Stocks_Prices table). The update selects the 500 stocks whose last update is the oldest and adds, for each of them, all dates between its last recorded date and the current date (500 calls in total).

The program first updates the stock info and initial prices (100 days back) of stocks newly discovered by the NASDAQ scraper, and only then updates previously added stocks.
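The daily budget arithmetic above can be sketched as a small planning function. This only illustrates the counting, under the assumption that each new stock costs two calls (one for company info, one for price history) and each refresh costs one. The constant and function names are hypothetical, and no API request is made.

```python
from datetime import date

NEW_STOCK_BUDGET = 500   # calls reserved for newly added stocks
UPDATE_BUDGET = 500      # calls reserved for refreshing existing stocks
CALLS_PER_NEW_STOCK = 2  # assumed: one company-info query plus one price-history query


def plan_daily_calls(new_tickers, last_updated):
    """Return (tickers_to_add, tickers_to_refresh) for one day's quota.

    new_tickers: tickers found by the scraper that have no API data yet.
    last_updated: dict mapping ticker -> date of its newest price record.
    """
    max_new = NEW_STOCK_BUDGET // CALLS_PER_NEW_STOCK  # 250 stocks
    to_add = list(new_tickers)[:max_new]
    # Stocks with the oldest "last updated" date are refreshed first.
    stalest_first = sorted(last_updated, key=last_updated.get)
    to_refresh = stalest_first[:UPDATE_BUDGET]
    return to_add, to_refresh
```

For example, with 300 new tickers only the first 250 fit into the day's budget, and the remaining 50 wait for the next run.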

It is important to provide 2 API keys in the configuration file ("conf.json").

"info_token" : Your first token

"price_token" : Your second token

"queries" : the number of queries, set according to the number of tokens, your pricing plan and the number of records you wish to update.

Note: when reaching the limit of the token the program will stop automatically and log a message. Please see the log file if you suspect that is the case.
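A quick way to catch a missing setting before a run is to validate the configuration when it is loaded. The sketch below assumes that "conf.json" is a flat JSON object holding the five keys named in this README; the real file may be laid out differently, and `load_config` and the placeholder values are hypothetical.

```python
import json

# Keys mentioned in this README; the flat layout is an assumption.
REQUIRED_KEYS = ("USER", "PASSWORD", "info_token", "price_token", "queries")

EXAMPLE_CONF = """
{
  "USER": "root",
  "PASSWORD": "root",
  "info_token": "YOUR_FIRST_TOKEN",
  "price_token": "YOUR_SECOND_TOKEN",
  "queries": 500
}
"""


def load_config(text):
    """Parse the JSON text and fail early if a required key is missing."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("conf.json is missing: " + ", ".join(missing))
    return config
```

Calling `load_config(EXAMPLE_CONF)` returns a dictionary, while a file without, say, "price_token" raises a KeyError that names the missing key.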

Authors:

Alot Mecilati: Git, LinkedIn

Jonathan Schwarz: Git, LinkedIn
