This project analyses YouTube data and finds the most popular genres (video categories) on YouTube, ranked both by views and by uploads.
- GBvideos.csv (dataset)
- YouTube Data Analysis (implementation of a MapReduce model that finds the most popular genre on YouTube based on uploads)
- Top Viewed Categories (implementation of a MapReduce model that finds the most popular genre on YouTube based on views)
- Top Categories Output (output files)
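The two jobs differ only in what the mapper emits per video: the uploads job emits (category, 1) and the views job emits (category, views), and both reducers sum the values per category. The following is a minimal plain-Java sketch of that map, shuffle and reduce flow, run in memory without Hadoop. It is not the repository's code: the class and method names are hypothetical, and it assumes simple comma-separated data rows (header removed, no quoted commas) with the category and view-count columns at caller-supplied indices, which should be checked against the GBvideos.csv header.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of the uploads and views MapReduce jobs.
 * ASSUMPTION: rows are data rows only (CSV header removed), split on plain
 * commas, with category and views at the given 0-based column indices.
 * Real GBvideos.csv rows contain quoted titles/tags, so a real mapper
 * would need a proper CSV parser.
 */
public class CategoryAggregation {

    /** Map phase: emit (category, value) for one row; valueCol < 0 means "emit 1". */
    static List<Map.Entry<String, Long>> map(String row, int categoryCol, int valueCol) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        String[] fields = row.split(",", -1);
        if (fields.length <= Math.max(categoryCol, valueCol)) {
            return out; // malformed row: too few columns, emit nothing
        }
        try {
            long value = valueCol < 0 ? 1L : Long.parseLong(fields[valueCol].trim());
            out.add(Map.entry(fields[categoryCol].trim(), value));
        } catch (NumberFormatException e) {
            // Non-numeric view count: skip the row, as a mapper would.
        }
        return out;
    }

    /** Shuffle + reduce phase: group emitted pairs by category and sum them. */
    static Map<String, Long> run(List<String> rows, int categoryCol, int valueCol) {
        Map<String, Long> totals = new TreeMap<>();
        for (String row : rows) {
            for (Map.Entry<String, Long> kv : map(row, categoryCol, valueCol)) {
                totals.merge(kv.getKey(), kv.getValue(), Long::sum);
            }
        }
        return totals;
    }

    /** Uploads job: each row counts as one upload for its category. */
    static Map<String, Long> countUploads(List<String> rows, int categoryCol) {
        return run(rows, categoryCol, -1);
    }

    /** Views job: total view count per category. */
    static Map<String, Long> sumViews(List<String> rows, int categoryCol, int viewsCol) {
        return run(rows, categoryCol, viewsCol);
    }

    /** Category with the largest total; ties go to the smallest key. */
    static String top(Map<String, Long> totals) {
        String best = null;
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            if (best == null || e.getValue() > totals.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```

For example, with the rows `v1,10,5`, `v2,10,7` and `v3,24,100` (category in column 1, views in column 2), category 10 wins on uploads (2 against 1) while category 24 wins on views (100 against 12), which is why the two rankings can name different genres.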
The output is obtained by creating a .jar file and running the following commands in a Linux terminal:
- Make an input directory in the Hadoop filesystem:
  hdfs dfs -mkdir /YouTubeInput
- Put the input data from the Linux filesystem into HDFS:
  hdfs dfs -put /Downloads/YouTubeDataAnalysis/GBvideos.csv /YouTubeInput
- Run the jar file and save the results in an output directory in HDFS:
  hadoop jar /home/hadoop/TopViewedCategories.jar TopCategoryDriver /YouTubeInput /YouTubeOutput
- View the results:
  hdfs dfs -cat /YouTubeOutput/*
- Copy the results from HDFS to the Linux filesystem:
  hdfs dfs -get /YouTubeOutput/* /Downloads/YouTubeAnalysis/TopCategoryOutput
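Once the results are copied to the Linux filesystem, the categories can be ranked with a small program. The sketch below is hypothetical and not part of this repository: it assumes the reducer writes one `category<TAB>total` line per category, which is the default layout of Hadoop's TextOutputFormat, and the file name in the usage comment is only an example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper that ranks categories from the job's text output.
 * ASSUMPTION: each line is "category<TAB>total" (TextOutputFormat default).
 */
public class TopCategories {

    /** Parses "key\tvalue" lines and returns the n entries with the largest totals. */
    static List<Map.Entry<String, Long>> topN(List<String> lines, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // not a key/value line
            }
            try {
                long total = Long.parseLong(line.substring(tab + 1).trim());
                entries.add(Map.entry(line.substring(0, tab), total));
            } catch (NumberFormatException e) {
                // Skip lines whose value is not an integer.
            }
        }
        // Largest total first; equal totals are ordered by category name.
        entries.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) throws IOException {
        // Example usage: java TopCategories part-r-00000 3
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 5;
        for (Map.Entry<String, Long> e : topN(lines, n)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Sorting on the client is a design shortcut that suits a small number of categories; ranking inside Hadoop would instead need an extra job or a single reducer that keeps the top entries.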