diff --git a/Mini_Project_SQL_with_Spark 5.6.ipynb b/Mini_Project_SQL_with_Spark 5.6.ipynb
new file mode 100644
index 000000000..e1bd71e6c
--- /dev/null
+++ b/Mini_Project_SQL_with_Spark 5.6.ipynb
@@ -0,0 +1 @@
+{"cells":[{"cell_type":"markdown","source":["## SQL at Scale with Spark SQL\n\nWelcome to the SQL mini project. For this project, you will use the Databricks Platform and work through a series of exercises using Spark SQL. The dataset size may not be too big but the intent here is to familiarize yourself with the Spark SQL interface which scales easily to huge datasets, without you having to worry about changing your SQL queries. \n\nThe data you need is present in the mini-project folder in the form of three CSV files. This data will be imported in Databricks to create the following tables under the __`country_club`__ database.\n\n \n1. The __`bookings`__ table,\n2. The __`facilities`__ table, and\n3. The __`members`__ table.\n\nYou will be uploading these datasets shortly into the Databricks platform to understand how to create a database within minutes! Once the database and the tables are populated, you will be focusing on the mini-project questions.\n\nIn the mini project, you'll be asked a series of questions. 
You can solve them using the Databricks platform, but for the final deliverable,\nplease download this notebook as an IPython notebook (__`File -> Export -> IPython Notebook`__) and upload it to your GitHub."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"7dc8cef6-8322-4e3a-950b-757de959bbd7","inputWidgets":{},"title":""}}},{"cell_type":"markdown","source":["### Creating the Database\n\nWe will first create the database that will hold our three tables of interest."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"3bd664ca-d7cc-4b4d-9c35-9957dd665c78","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql \ndrop database if exists country_club cascade;\ncreate database country_club;\nshow databases;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"98ba3faa-c4e8-48ef-9cc8-e2226e31582d","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["country_club"],["default"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"databaseName","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
databaseName country_club default
"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Creating the Tables\n\nIn this section, we will be creating the three tables of interest and populate them with the data from the CSV files already available to you. \nTo get started, first upload the three CSV files to the DBFS as depicted in the following figure\n\n\n\n\nOnce you have done this, please remember to execute the following code to build the dataframes which will be saved as tables in our database"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"89fe2dd6-f130-4979-abff-3cfd7eefc14f","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["# File location and type\nfile_location_bookings = \"/FileStore/tables/Bookings.csv\"\nfile_location_facilities = \"/FileStore/tables/Facilities.csv\"\nfile_location_members = \"/FileStore/tables/Members.csv\"\n\nfile_type = \"csv\"\n\n# CSV options\ninfer_schema = \"true\"\nfirst_row_is_header = \"true\"\ndelimiter = \",\"\n\n# The applied options are for CSV files. 
For other file types, these will be ignored.\nbookings_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_bookings))\n\nfacilities_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_facilities))\n\nmembers_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_members))"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"936f355f-a485-4d3c-9a04-87bb55965d65","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0},{"cell_type":"markdown","source":["### Viewing the dataframe schemas\n\nWe can take a look at the schemas of our potential tables to be written to our database soon"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"f10ed1f5-65a6-4bc3-a902-606102a12222","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["print('Bookings Schema')\nbookings_df.printSchema()\nprint('Facilities Schema')\nfacilities_df.printSchema()\nprint('Members Schema')\nmembers_df.printSchema()"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"45dd3bb9-3cc9-415b-a0a7-891c8a0ade8c","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"Bookings Schema\nroot\n |-- bookid: integer (nullable = true)\n |-- facid: integer (nullable = true)\n |-- memid: integer (nullable = true)\n |-- starttime: timestamp (nullable = true)\n |-- slots: integer (nullable = true)\n\nFacilities Schema\nroot\n |-- facid: integer (nullable = true)\n |-- name: string (nullable = true)\n |-- membercost: double 
(nullable = true)\n |-- guestcost: double (nullable = true)\n |-- initialoutlay: integer (nullable = true)\n |-- monthlymaintenance: integer (nullable = true)\n\nMembers Schema\nroot\n |-- memid: integer (nullable = true)\n |-- surname: string (nullable = true)\n |-- firstname: string (nullable = true)\n |-- address: string (nullable = true)\n |-- zipcode: integer (nullable = true)\n |-- telephone: string (nullable = true)\n |-- recommendedby: integer (nullable = true)\n |-- joindate: timestamp (nullable = true)\n\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["Bookings Schema\nroot\n |-- bookid: integer (nullable = true)\n |-- facid: integer (nullable = true)\n |-- memid: integer (nullable = true)\n |-- starttime: timestamp (nullable = true)\n |-- slots: integer (nullable = true)\n\nFacilities Schema\nroot\n |-- facid: integer (nullable = true)\n |-- name: string (nullable = true)\n |-- membercost: double (nullable = true)\n |-- guestcost: double (nullable = true)\n |-- initialoutlay: integer (nullable = true)\n |-- monthlymaintenance: integer (nullable = true)\n\nMembers Schema\nroot\n |-- memid: integer (nullable = true)\n |-- surname: string (nullable = true)\n |-- firstname: string (nullable = true)\n |-- address: string (nullable = true)\n |-- zipcode: integer (nullable = true)\n |-- telephone: string (nullable = true)\n |-- recommendedby: integer (nullable = true)\n |-- joindate: timestamp (nullable = true)\n\n"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Create permanent tables\nWe will be creating three permanent tables here in our __`country_club`__ database as we discussed previously with the following code"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"8766081c-ff5f-4bfa-870c-dcb7f9d1698c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["permanent_table_name_bookings = 
\"country_club.Bookings1\"\nbookings_df.write.format(\"parquet\").saveAsTable(permanent_table_name_bookings)\n\npermanent_table_name_facilities = \"country_club.Facilities1\"\nfacilities_df.write.format(\"parquet\").saveAsTable(permanent_table_name_facilities)\n\npermanent_table_name_members = \"country_club.Members1\"\nmembers_df.write.format(\"parquet\").saveAsTable(permanent_table_name_members)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"a989021a-29b8-4159-9a8d-5f3a707379e3","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0},{"cell_type":"markdown","source":["### Refresh tables and check them"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"a8d01df0-94bc-4097-845e-02e7e1637e4f","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nuse country_club;\nREFRESH table bookings1;\nREFRESH table facilities1;\nREFRESH table members1;\nshow tables;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"11600185-2386-4341-89cb-66d50b9a29ee","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["country_club","bookings1",false],["country_club","facilities1",false],["country_club","members1",false]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"database","type":"\"string\"","metadata":"{}"},{"name":"tableName","type":"\"string\"","metadata":"{}"},{"name":"isTemporary","type":"\"boolean\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_ty
pe":"display_data","data":{"text/html":["database tableName isTemporary country_club bookings1 false country_club facilities1 false country_club members1 false
"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Test a sample SQL query\n\n__Note:__ You can use __`%sql`__ at the beginning of a cell and write SQL queries directly as seen in the following cell. Neat isn't it!"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"fdae66bd-5e7a-48f5-b715-ac4f761050ae","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nselect * from bookings1 limit 3"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"4339b5ab-b006-4458-aad5-b1f0a5c1ec87","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[0,3,1,"2012-07-03T11:00:00.000+0000",2],[1,4,1,"2012-07-03T08:00:00.000+0000",2],[2,6,0,"2012-07-03T18:00:00.000+0000",2]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"bookid","type":"\"integer\"","metadata":"{}"},{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"memid","type":"\"integer\"","metadata":"{}"},{"name":"starttime","type":"\"timestamp\"","metadata":"{}"},{"name":"slots","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["bookid facid memid starttime slots 0 3 1 2012-07-03T11:00:00.000+0000 2 1 4 1 2012-07-03T08:00:00.000+0000 2 2 6 0 2012-07-03T18:00:00.000+0000 2
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q1: Some of the facilities charge a fee to members, but some do not. Please list the names of the facilities that do."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"17c520af-243e-4a39-8a24-ea6aa3b6a368","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT name \nFROM facilities1 \nWHERE membercost = 0;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"21f137b3-edf7-4c65-853a-42b836fa3481","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Badminton Court"],["Table Tennis"],["Snooker Table"],["Pool Table"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["name Badminton Court Table Tennis Snooker Table Pool Table
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q2: How many facilities do not charge a fee to members?"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"79bf7b92-87d8-4efb-ba7d-f0edcb59cc4b","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT COUNT(*) AS Count \nFROM facilities1 \nWHERE membercost = 0;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"0b10a941-41e4-4145-b853-801859a6bfa5","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[4]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Count","type":"\"long\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":[""]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q3: How can you produce a list of facilities that charge a fee to members, where the fee is less than 20% of the facility's monthly maintenance cost? 
\n#### Return the facid, facility name, member cost, and monthly maintenance of the facilities in question."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"bc6cd845-0be6-4c95-ade4-7d52c3a13cc8","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facid, \nname, \nmembercost, \nmonthlymaintenance \nFROM facilities1 \nWHERE (membercost > 0) \nAND (membercost < monthlymaintenance * .2)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"d35a57f8-07ea-42dc-9f4f-53694daefff1","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[0,"Tennis Court 1",5.0,200],[1,"Tennis Court 2",5.0,200],[4,"Massage Room 1",9.9,3000],[5,"Massage Room 2",9.9,3000],[6,"Squash Court",3.5,80]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"membercost","type":"\"double\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["facid name membercost monthlymaintenance 0 Tennis Court 1 5.0 200 1 Tennis Court 2 5.0 200 4 Massage Room 1 9.9 3000 5 Massage Room 2 9.9 3000 6 Squash Court 3.5 80
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q4: How can you retrieve the details of facilities with ID 1 and 5? Write the query without using the OR operator."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"9bc31a3f-ab2c-413c-9b99-46581023ae0c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT * \nFROM facilities1 \nWHERE facid IN (1, 5)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"cb034b10-5840-43e9-a25a-62503daa7c09","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[1,"Tennis Court 2",5.0,25.0,8000,200],[5,"Massage Room 2",9.9,80.0,4000,3000]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"membercost","type":"\"double\"","metadata":"{}"},{"name":"guestcost","type":"\"double\"","metadata":"{}"},{"name":"initialoutlay","type":"\"integer\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["facid name membercost guestcost initialoutlay monthlymaintenance 1 Tennis Court 2 5.0 25.0 8000 200 5 Massage Room 2 9.9 80.0 4000 3000
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q5: How can you produce a list of facilities, with each labelled as 'cheap' or 'expensive', depending on if their monthly maintenance cost is more than $100? \n#### Return the name and monthly maintenance of the facilities in question."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"0e0302a2-2911-41be-9599-e12323e7f23c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT name, \nmonthlymaintenance, \nCASE WHEN monthlymaintenance > 100 \nTHEN \"expensive\" \nELSE \"cheap\" END AS value \nFROM facilities1;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"41373ea4-9038-4c8f-842f-8aae7b074809","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Tennis Court 1",200,"expensive"],["Tennis Court 2",200,"expensive"],["Badminton Court",50,"cheap"],["Table Tennis",10,"cheap"],["Massage Room 1",3000,"expensive"],["Massage Room 2",3000,"expensive"],["Squash Court",80,"cheap"],["Snooker Table",15,"cheap"],["Pool Table",15,"cheap"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"},{"name":"value","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["name monthlymaintenance value Tennis Court 1 200 expensive Tennis Court 2 200 expensive Badminton Court 50 cheap 
Table Tennis 10 cheap Massage Room 1 3000 expensive Massage Room 2 3000 expensive Squash Court 80 cheap Snooker Table 15 cheap Pool Table 15 cheap
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q6: You'd like to get the first and last name of the last member(s) who signed up. Do not use the LIMIT clause for your solution."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"30f9e29f-9608-4c5d-a371-fc4cb22f9ea2","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT firstname, \nsurname \nFROM members1 \nWHERE joindate in (SELECT MAX(joindate) FROM members1)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"74bf3f5b-924d-4d90-b978-ffd456c22f43","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Darren","Smith"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"firstname","type":"\"string\"","metadata":"{}"},{"name":"surname","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["firstname surname Darren Smith
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q7: How can you produce a list of all members who have used a tennis court?\n- Include in your output the name of the court, and the name of the member formatted as a single column. \n- Ensure no duplicate data\n- Also order by the member name."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"ded40971-9804-46e8-a647-5b9cefce363e","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT DISTINCT facilities1.name AS Court_Name, \nCONCAT(members1.firstname, \" \", members1.surname) AS Member_Name \nFROM ((bookings1 \nINNER JOIN members1 \nON bookings1.memid = members1.memid) \nINNER JOIN facilities1 \nON bookings1.facid = facilities1.facid) \nWHERE facilities1.name LIKE \"Tennis Court%\" \nORDER BY Member_Name;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"879ff42d-7f1d-47e6-a828-5cd82775c0ee","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Tennis Court 2","Anne Baker"],["Tennis Court 1","Anne Baker"],["Tennis Court 2","Burton Tracy"],["Tennis Court 1","Burton Tracy"],["Tennis Court 1","Charles Owen"],["Tennis Court 2","Charles Owen"],["Tennis Court 2","Darren Smith"],["Tennis Court 2","David Farrell"],["Tennis Court 1","David Farrell"],["Tennis Court 2","David Jones"],["Tennis Court 1","David Jones"],["Tennis Court 1","David Pinker"],["Tennis Court 1","Douglas Jones"],["Tennis Court 1","Erica Crumpet"],["Tennis Court 1","Florence Bader"],["Tennis Court 2","Florence Bader"],["Tennis Court 1","GUEST GUEST"],["Tennis Court 2","GUEST GUEST"],["Tennis Court 2","Gerald Butters"],["Tennis Court 1","Gerald Butters"],["Tennis Court 2","Henrietta Rumney"],["Tennis Court 1","Jack Smith"],["Tennis Court 2","Jack Smith"],["Tennis Court 2","Janice 
Joplette"],["Tennis Court 1","Janice Joplette"],["Tennis Court 2","Jemima Farrell"],["Tennis Court 1","Jemima Farrell"],["Tennis Court 1","Joan Coplin"],["Tennis Court 1","John Hunt"],["Tennis Court 2","John Hunt"],["Tennis Court 1","Matthew Genting"],["Tennis Court 2","Millicent Purview"],["Tennis Court 2","Nancy Dare"],["Tennis Court 1","Nancy Dare"],["Tennis Court 1","Ponder Stibbons"],["Tennis Court 2","Ponder Stibbons"],["Tennis Court 1","Ramnaresh Sarwin"],["Tennis Court 2","Ramnaresh Sarwin"],["Tennis Court 1","Tim Boothe"],["Tennis Court 2","Tim Boothe"],["Tennis Court 2","Tim Rownam"],["Tennis Court 1","Tim Rownam"],["Tennis Court 2","Timothy Baker"],["Tennis Court 1","Timothy Baker"],["Tennis Court 2","Tracy Smith"],["Tennis Court 1","Tracy Smith"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Court_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["Court_Name Member_Name Tennis Court 2 Anne Baker Tennis Court 1 Anne Baker Tennis Court 2 Burton Tracy Tennis Court 1 Burton Tracy Tennis Court 1 Charles Owen Tennis Court 2 Charles Owen Tennis Court 2 Darren Smith Tennis Court 2 David Farrell Tennis Court 1 David Farrell Tennis Court 2 David Jones Tennis Court 1 David Jones Tennis Court 1 David Pinker Tennis Court 1 Douglas Jones Tennis Court 1 Erica Crumpet Tennis Court 1 Florence Bader Tennis Court 2 Florence Bader Tennis Court 1 GUEST GUEST Tennis Court 2 GUEST GUEST Tennis Court 2 Gerald Butters Tennis Court 1 Gerald Butters Tennis Court 2 Henrietta Rumney Tennis Court 1 Jack Smith Tennis 
Court 2 Jack Smith Tennis Court 2 Janice Joplette Tennis Court 1 Janice Joplette Tennis Court 2 Jemima Farrell Tennis Court 1 Jemima Farrell Tennis Court 1 Joan Coplin Tennis Court 1 John Hunt Tennis Court 2 John Hunt Tennis Court 1 Matthew Genting Tennis Court 2 Millicent Purview Tennis Court 2 Nancy Dare Tennis Court 1 Nancy Dare Tennis Court 1 Ponder Stibbons Tennis Court 2 Ponder Stibbons Tennis Court 1 Ramnaresh Sarwin Tennis Court 2 Ramnaresh Sarwin Tennis Court 1 Tim Boothe Tennis Court 2 Tim Boothe Tennis Court 2 Tim Rownam Tennis Court 1 Tim Rownam Tennis Court 2 Timothy Baker Tennis Court 1 Timothy Baker Tennis Court 2 Tracy Smith Tennis Court 1 Tracy Smith
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q8: How can you produce a list of bookings on the day of 2012-09-14 which will cost the member (or guest) more than $30? \n\n- Remember that guests have different costs to members (the listed costs are per half-hour 'slot')\n- The guest user's ID is always 0. \n\n#### Include in your output the name of the facility, the name of the member formatted as a single column, and the cost.\n\n- Order by descending cost, and do not use any subqueries."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"eb23ed45-ca1c-46b3-9371-ccf3d2904fb9","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name AS Facility_Name,\nCONCAT(members1.firstname, \" \", members1.surname) AS Member_Name,\nCASE WHEN bookings1.memid = 0 \nTHEN facilities1.guestcost * bookings1.slots \nELSE facilities1.membercost * bookings1.slots END AS Total_Cost \nFROM ((bookings1 \nINNER JOIN facilities1 \nON bookings1.facid = facilities1.facid) \nINNER JOIN members1 \nON bookings1.memid = members1.memid) \nWHERE bookings1.starttime LIKE \"2012-09-14%\" \nAND CASE WHEN bookings1.memid = 0 \nTHEN facilities1.guestcost * bookings1.slots > 30 \nELSE facilities1.membercost * bookings1.slots > 30 END \nORDER BY Total_Cost desc;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"3ec2175c-8f0f-45fd-ae9a-414a6fb3ce28","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Massage Room 2","GUEST GUEST",320.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Tennis Court 2","GUEST GUEST",150.0],["Tennis Court 2","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Squash Court","GUEST 
GUEST",70.0],["Massage Room 1","Jemima Farrell",39.6],["Squash Court","GUEST GUEST",35.0],["Squash Court","GUEST GUEST",35.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Facility_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"},{"name":"Total_Cost","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["Facility_Name Member_Name Total_Cost Massage Room 2 GUEST GUEST 320.0 Massage Room 1 GUEST GUEST 160.0 Massage Room 1 GUEST GUEST 160.0 Massage Room 1 GUEST GUEST 160.0 Tennis Court 2 GUEST GUEST 150.0 Tennis Court 2 GUEST GUEST 75.0 Tennis Court 1 GUEST GUEST 75.0 Tennis Court 1 GUEST GUEST 75.0 Squash Court GUEST GUEST 70.0 Massage Room 1 Jemima Farrell 39.6 Squash Court GUEST GUEST 35.0 Squash Court GUEST GUEST 35.0
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q9: This time, produce the same result as in Q8, but using a subquery."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"757c6468-3d07-42e2-b2b9-59e82b96350a","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name AS Facility_Name,\nCONCAT(members1.firstname, \" \",members1.surname) AS Member_Name,\nCASE WHEN booking.memid = 0 \nTHEN facilities1.guestcost * booking.slots \nELSE facilities1.membercost * booking.slots END AS Total_Cost \nFROM \n(((SELECT * \nFROM bookings1 \nWHERE starttime LIKE \"2012-09-14%\") AS booking \nINNER JOIN facilities1 \nON booking.facid = facilities1.facid) \nINNER JOIN members1 \nON booking.memid = members1.memid) \nWHERE CASE WHEN booking.memid = 0 THEN facilities1.guestcost * booking.slots > 30 ELSE facilities1.membercost * booking.slots > 30 END \nORDER BY Total_Cost desc;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"72f9d8b6-2d51-4af1-9fa1-d183a0369d30","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Massage Room 2","GUEST GUEST",320.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Tennis Court 2","GUEST GUEST",150.0],["Tennis Court 2","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Squash Court","GUEST GUEST",70.0],["Massage Room 1","Jemima Farrell",39.6],["Squash Court","GUEST GUEST",35.0],["Squash Court","GUEST 
GUEST",35.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Facility_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"},{"name":"Total_Cost","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["Facility_Name Member_Name Total_Cost Massage Room 2 GUEST GUEST 320.0 Massage Room 1 GUEST GUEST 160.0 Massage Room 1 GUEST GUEST 160.0 Massage Room 1 GUEST GUEST 160.0 Tennis Court 2 GUEST GUEST 150.0 Tennis Court 2 GUEST GUEST 75.0 Tennis Court 1 GUEST GUEST 75.0 Tennis Court 1 GUEST GUEST 75.0 Squash Court GUEST GUEST 70.0 Massage Room 1 Jemima Farrell 39.6 Squash Court GUEST GUEST 35.0 Squash Court GUEST GUEST 35.0
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q10: Produce a list of facilities with a total revenue less than 1000.\n- The output should have facility name and total revenue, sorted by revenue. \n- Remember that there's a different cost for guests and members!"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"dc14e8de-3daa-4339-b78c-2a8d78e599d1","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name,\nSUM(CASE WHEN bookings1.memid = 0 \nTHEN facilities1.guestcost * bookings1.slots \nELSE facilities1.membercost * bookings1.slots END) AS Total_Revenue \nFROM \n((bookings1 \nINNER JOIN facilities1 \nON bookings1.facid = facilities1.facid) \nINNER JOIN members1 \nON bookings1.memid = members1.memid) \nGROUP BY facilities1.name \nHAVING SUM(CASE WHEN bookings1.memid = 0 THEN facilities1.guestcost * bookings1.slots ELSE facilities1.membercost * bookings1.slots END) < 1000 \nORDER BY Total_Revenue;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"53422808-236b-4ebd-af5f-abc9c1bb70de","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Table Tennis",180.0],["Snooker Table",240.0],["Pool Table",270.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"Total_Revenue","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["name Total_Revenue Table 
Tennis 180.0 Snooker Table 240.0 Pool Table 270.0
"]}}],"execution_count":0},{"cell_type":"code","source":[""],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"1ff7e759-05a7-4e6f-8e4f-9cc05a74316c","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0}],"metadata":{"name":"Mini_Project_SQL_with_Spark","notebookId":1931807081501742,"application/vnd.databricks.v1+notebook":{"notebookName":"Mini_Project_SQL_with_Spark","dashboards":[],"notebookMetadata":{"pythonIndentUnit":4,"mostRecentlyExecutedCommandWithImplicitDF":{"commandId":551598812990950,"dataframes":["_sqldf"]}},"language":"python","widgets":{},"notebookOrigID":551598812990935}},"nbformat":4,"nbformat_minor":0}
diff --git a/mec-3.4.1-api-mini-project/.env b/mec-3.4.1-api-mini-project/.env
index 5d011ea41..8b1378917 100644
--- a/mec-3.4.1-api-mini-project/.env
+++ b/mec-3.4.1-api-mini-project/.env
@@ -1 +1 @@
-NASDAQ_API_KEY=KRfk96yoWvruWZ-LjPb
+
diff --git a/mec-3.4.1-api-mini-project/api_data_wrangling_mini_project.ipynb b/mec-3.4.1-api-mini-project/api_data_wrangling_mini_project.ipynb
index 0d34bd5cc..e0b34166d 100755
--- a/mec-3.4.1-api-mini-project/api_data_wrangling_mini_project.ipynb
+++ b/mec-3.4.1-api-mini-project/api_data_wrangling_mini_project.ipynb
@@ -2,147 +2,151 @@
"cells": [
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
"This exercise will require you to pull some data from https://data.nasdaq.com/ (formerly Quandl API)."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
"As a first step, you will need to register a free account on the https://data.nasdaq.com/ website."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
- "After you register, you will be provided with a unique API key, that you should store:\r\n",
- "\r\n",
- "*Note*: Use a `.env` file and put your key in there and `python-dotenv` to access it in this notebook. \r\n",
- "\r\n",
- "The code below uses a key that was used when generating this project but has since been deleted. Never submit your keys to source control. There is a `.env-example` file in this repository to illusrtate what you need. Copy that to a file called `.env` and use your own api key in that `.env` file. Make sure you also have a `.gitignore` file with a line for `.env` added to it. \r\n",
- "\r\n",
+ "After you register, you will be provided with a unique API key that you should store:\n",
+ "\n",
+ "*Note*: Put your key in a `.env` file and use `python-dotenv` to access it in this notebook. \n",
+ "\n",
+ "The code below uses a key that was used when generating this project but has since been deleted. Never submit your keys to source control. There is a `.env-example` file in this repository to illustrate what you need. Copy that to a file called `.env` and use your own API key in that `.env` file. Make sure you also have a `.gitignore` file with a line for `.env` added to it. \n",
+ "\n",
"The standard Python gitignore is [here](https://github.com/github/gitignore/blob/master/Python.gitignore) you can just copy that. "
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": 5,
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "7MadrSm5uJz-31r7rF4z\n"
+ ]
+ }
+ ],
"source": [
"# get api key from your .env file\n",
"import os\n",
"from dotenv import load_dotenv # if missing this module, simply run `pip install python-dotenv`\n",
"\n",
"load_dotenv()\n",
- "API_KEY = os.getenv('NASDAQ_API_KEY')\n",
+ "API_KEY = os.getenv('API_KEY')\n",
"\n",
"print(API_KEY)"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "KRfk96yoWvruWZ-LjPbo\n"
- ]
- }
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
"Nasdaq Data has a large number of data sources, but, unfortunately, most of them require a Premium subscription. Still, there are also a good number of free datasets."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
"For this mini project, we will focus on equities data from the Frankfurt Stock Exhange (FSE), which is available for free. We'll try and analyze the stock prices of a company called Carl Zeiss Meditec, which manufactures tools for eye examinations, as well as medical lasers for laser eye surgery: https://www.zeiss.com/meditec/int/home.html. The company is listed under the stock ticker AFX_X."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
"You can find the detailed Nasdaq Data API instructions here: https://docs.data.nasdaq.com/docs/in-depth-usage"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
"While there is a dedicated Python package for connecting to the Nasdaq API, we would prefer that you use the *requests* package, which can be easily downloaded using *pip* or *conda*. You can find the documentation for the package here: http://docs.python-requests.org/en/master/ "
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
"Finally, apart from the *requests* package, you are encouraged to not use any third party Python packages, such as *pandas*, and instead focus on what's available in the Python Standard Library (the *collections* module might come in handy: https://pymotw.com/3/collections/).\n",
"Also, since you won't have access to DataFrames, you are encouraged to us Python's native data structures - preferably dictionaries, though some questions can also be answered using lists.\n",
"You can read more on these data structures here: https://docs.python.org/3/tutorial/datastructures.html"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
"Keep in mind that the JSON responses you will be getting from the API map almost one-to-one to Python's dictionaries. Unfortunately, they can be very nested, so make sure you read up on indexing dictionaries in the documentation provided above."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": 6,
- "source": [
- "# First, import the relevant modules"
- ],
+ "execution_count": 4,
+ "metadata": {},
"outputs": [],
- "metadata": {}
+ "source": [
+ "import requests"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
- "Note: API's can change a bit with each version, for this exercise it is reccomended to use the nasdaq api at `https://data.nasdaq.com/api/v3/`. This is the same api as what used to be quandl so `https://www.quandl.com/api/v3/` should work too.\r\n",
- "\r\n",
+ "Note: APIs can change a bit with each version; for this exercise it is recommended to use the Nasdaq API at `https://data.nasdaq.com/api/v3/`. This is the same API as what used to be Quandl, so `https://www.quandl.com/api/v3/` should work too.\n",
+ "\n",
"Hint: We are looking for the `AFX_X` data on the `datasets/FSE/` dataset."
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "source": [
- "# Now, call the Nasdaq API and pull out a small sample of the data (only one day) to get a glimpse\n",
- "# into the JSON structure that will be returned"
- ],
- "outputs": [],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": 9,
- "source": [
- "# Inspect the JSON structure of the object you created, and take note of how nested it is,\n",
- "# as well as the overall structure"
- ],
+ "execution_count": 50,
+ "metadata": {},
"outputs": [
{
- "output_type": "stream",
"name": "stdout",
+ "output_type": "stream",
"text": [
- "{'dataset': {'id': 10095370, 'dataset_code': 'AFX_X', 'database_code': 'FSE', 'name': 'Carl Zeiss Meditec (AFX_X)', 'description': 'Stock Prices for Carl Zeiss Meditec (2020-11-02) from the Frankfurt Stock Exchange. Trading System: Xetra ISIN: DE0005313704', 'refreshed_at': '2020-12-01T14:48:09.907Z', 'newest_available_date': '2020-12-01', 'oldest_available_date': '2000-06-07', 'column_names': ['Date', 'Open', 'High', 'Low', 'Close', 'Change', 'Traded Volume', 'Turnover', 'Last Price of the Day', 'Daily Traded Units', 'Daily Turnover'], 'frequency': 'daily', 'type': 'Time Series', 'premium': False, 'limit': None, 'transform': None, 'column_index': None, 'start_date': '2021-01-03', 'end_date': '2020-12-01', 'data': [], 'collapse': None, 'order': None, 'database_id': 6129}}\n"
+ "<class 'dict'>\n"
]
}
],
- "metadata": {}
+ "source": [
+ "#1. Collect data from the Frankfurt Stock Exchange, for the ticker AFX_X, for the whole year 2017 (keep in mind that the date format is YYYY-MM-DD).\n",
+ "fse = requests.get(f'https://data.nasdaq.com/api/v3/datasets/FSE/AFX_X.json?api_key={API_KEY}')\n",
+ "fse = requests.get(f'https://data.nasdaq.com/api/v3/datasets/FSE/VNA_X?start_date=2017-01-01&end_date=2017-12-31&api_key={API_KEY}')\n",
+ "#2. Convert the returned JSON object into a Python dictionary.\n",
+ "json = fse.json()\n",
+ "print(type(json))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 79,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": []
},
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
"These are your tasks for this mini project:\n",
"\n",
@@ -153,28 +157,449 @@
"5. What was the largest change between any two days (based on Closing Price)?\n",
"6. What was the average daily trading volume during this year?\n",
"7. (Optional) What was the median trading volume during this year. (Note: you may need to implement your own function for calculating the median.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 58,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "['Date', 'Open', 'High', 'Low', 'Close', 'Change', 'Traded Volume', 'Turnover', 'Last Price of the Day', 'Daily Traded Units', 'Daily Turnover']\n"
+ ]
+ }
],
- "metadata": {}
+ "source": [
+ "print(json['dataset']['column_names'])"
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
- "source": [],
+ "execution_count": 81,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[['2017-12-29', 41.225, 41.425, 41.145, 41.39, None, 601057.0, 24840221.0, None, None, None], ['2017-12-28', 41.3, 41.34, 41.095, 41.22, None, 608053.0, 25062545.0, None, None, None], ['2017-12-27', 41.01, 41.335, 40.815, 41.335, None, 732911.0, 30168070.0, None, None, None], ['2017-12-22', 40.64, 40.975, 40.585, 40.975, None, 843468.0, 34444774.0, None, None, None], ['2017-12-21', 41.085, 41.1, 40.565, 40.7, None, 1384516.0, 56441896.0, None, None, None], ['2017-12-20', 41.715, 41.895, 40.935, 41.035, None, 1411562.0, 58322057.0, None, None, None], ['2017-12-19', 41.95, 42.215, 41.625, 41.64, None, 1314959.0, 55010330.0, None, None, None], ['2017-12-18', 41.5, 42.05, 40.92, 41.88, None, 2098187.0, 87316028.0, None, None, None], ['2017-12-15', 40.72, 41.35, 40.68, 41.35, None, 2733044.0, 112478446.0, None, None, None], ['2017-12-14', 40.79, 41.0, 40.63, 40.94, None, 1243035.0, 50820573.0, None, None, None], ['2017-12-13', 41.205, 41.22, 40.81, 40.845, None, 1036110.0, 42434183.0, None, None, None], ['2017-12-12', 41.36, 41.435, 40.845, 41.08, None, 1477381.0, 60712041.0, None, None, None], ['2017-12-11', 41.175, 41.355, 41.04, 41.125, None, 1043727.0, 42973562.0, None, None, None], ['2017-12-08', 41.13, 41.45, 40.99, 41.22, None, 1289642.0, 53150892.0, None, None, None], ['2017-12-07', 40.74, 41.23, 40.57, 40.84, None, 1376033.0, 56276968.0, None, None, None], ['2017-12-06', 40.54, 40.64, 40.165, 40.6, None, 1205912.0, 48764021.0, None, None, None], ['2017-12-05', 40.05, 40.775, 39.97, 40.665, None, 1948545.0, 78968966.0, None, None, None], ['2017-12-04', 39.79, 40.11, 39.585, 39.77, None, 1321182.0, 52627362.0, None, None, None], ['2017-12-01', 39.545, 39.845, 39.35, 39.44, None, 1495815.0, 59136311.0, None, None, None], ['2017-11-30', 38.95, 39.7, 38.915, 39.545, None, 3040962.0, 120157342.0, None, None, None], ['2017-11-29', 39.95, 40.025, 38.91, 39.025, None, 1493605.0, 58739775.0, None, None, None], ['2017-11-28', 39.825, 39.905, 39.5, 39.7, None, 
1018883.0, 40423811.0, None, None, None], ['2017-11-27', 39.7, 40.04, 39.63, 39.8, None, 1095660.0, 43654747.0, None, None, None], ['2017-11-24', 39.595, 39.92, 39.355, 39.65, None, 999933.0, 39641078.0, None, None, None], ['2017-11-23', 39.275, 39.62, 39.21, 39.52, None, 1161907.0, 45849993.0, None, None, None], ['2017-11-22', 39.955, 40.03, 39.365, 39.365, None, 985055.0, 39043700.0, None, None, None], ['2017-11-21', 39.445, 40.23, 39.33, 39.93, None, 1531366.0, 61107977.0, None, None, None], ['2017-11-20', 39.245, 39.43, 39.14, 39.3, None, 1057105.0, 41546170.0, None, None, None], ['2017-11-17', 39.7, 39.79, 39.375, 39.375, None, 1048127.0, 41459027.0, None, None, None], ['2017-11-16', 39.45, 39.785, 39.3, 39.705, None, 1010748.0, 40046253.0, None, None, None], ['2017-11-15', 39.495, 39.495, 38.77, 39.265, None, 1408279.0, 55104780.0, None, None, None], ['2017-11-14', 39.53, 39.63, 39.285, 39.5, None, 1103999.0, 43589611.0, None, None, None], ['2017-11-13', 39.26, 39.465, 39.01, 39.43, None, 1423857.0, 55975814.0, None, None, None], ['2017-11-10', 39.24, 39.445, 38.935, 39.145, None, 2133301.0, 83564952.0, None, None, None], ['2017-11-09', 39.245, 39.29, 38.805, 39.115, None, 1762497.0, 68815373.0, None, None, None], ['2017-11-08', 39.22, 39.54, 38.205, 39.165, None, 1841312.0, 72049610.0, None, None, None], ['2017-11-07', 39.0, 39.165, 38.73, 39.05, None, 1378274.0, 53770377.0, None, None, None], ['2017-11-06', 38.8, 38.975, 38.68, 38.81, None, 773139.0, 30002877.0, None, None, None], ['2017-11-03', 38.72, 38.815, 38.55, 38.73, None, 1147296.0, 44421907.0, None, None, None], ['2017-11-02', 38.165, 38.69, 38.145, 38.49, None, 1744271.0, 67156177.0, None, None, None], ['2017-11-01', 38.1, 38.315, 37.54, 38.2, None, 1810355.0, 69008167.0, None, None, None], ['2017-10-30', 37.32, 37.76, 37.25, 37.76, None, 1259359.0, 47379195.0, None, None, None], ['2017-10-27', 36.9, 37.33, 36.79, 37.18, None, 1362626.0, 50636915.0, None, None, None], ['2017-10-26', 36.425, 
36.775, 36.355, 36.67, None, 1773417.0, 64876374.0, None, None, None], ['2017-10-25', 36.5, 36.54, 36.21, 36.28, None, 1146901.0, 41641699.0, None, None, None], ['2017-10-24', 36.755, 37.1, 36.515, 36.64, None, 1182044.0, 43411836.0, None, None, None], ['2017-10-23', 37.63, 37.635, 36.855, 36.875, None, 1281333.0, 47497515.0, None, None, None], ['2017-10-20', 37.93, 37.96, 37.27, 37.45, None, 1180003.0, 44288477.0, None, None, None], ['2017-10-19', 37.885, 38.13, 37.55, 37.77, None, 1481919.0, 55992823.0, None, None, None], ['2017-10-18', 37.285, 37.835, 37.2, 37.685, None, 1179431.0, 44414462.0, None, None, None], ['2017-10-17', 37.19, 37.25, 36.99, 37.125, None, 959055.0, 35623810.0, None, None, None], ['2017-10-16', 36.91, 37.145, 36.695, 37.12, None, 796432.0, 29483832.0, None, None, None], ['2017-10-13', 36.93, 36.955, 36.665, 36.77, None, 1030081.0, 37900999.0, None, None, None], ['2017-10-12', 36.5, 36.995, 36.425, 36.88, None, 1131777.0, 41704147.0, None, None, None], ['2017-10-11', 36.28, 36.49, 36.095, 36.45, None, 741342.0, 26962219.0, None, None, None], ['2017-10-10', 36.36, 36.5, 36.08, 36.31, None, 1184571.0, 43063747.0, None, None, None], ['2017-10-09', 36.045, 36.145, 35.895, 36.125, None, 798410.0, 28795537.0, None, None, None], ['2017-10-06', 36.55, 36.585, 35.885, 35.97, None, 1780610.0, 64285160.0, None, None, None], ['2017-10-05', 36.82, 36.84, 36.35, 36.56, None, 1088678.0, 39791298.0, None, None, None], ['2017-10-04', 36.435, 36.945, 36.135, 36.745, None, 2021500.0, 74201081.0, None, None, None], ['2017-10-02', 36.17, 36.325, 36.0, 36.235, None, 981385.0, 35516802.0, None, None, None], ['2017-09-29', 35.845, 36.01, 35.75, 36.0, None, 1574374.0, 56587390.0, None, None, None], ['2017-09-28', 35.85, 35.885, 35.455, 35.805, None, 1391082.0, 49641573.0, None, None, None], ['2017-09-27', 36.4, 36.4, 35.675, 35.7, None, 1265947.0, 45369154.0, None, None, None], ['2017-09-26', 35.82, 36.515, 35.805, 36.25, None, 1461266.0, 53049635.0, None, None, 
None], ['2017-09-25', 35.505, 35.93, 35.45, 35.795, None, 864934.0, 30942056.0, None, None, None], ['2017-09-22', 35.41, 35.66, 35.32, 35.505, None, 848527.0, 30112639.0, None, None, None], ['2017-09-21', 35.815, 35.815, 35.35, 35.45, None, 1169809.0, 41477784.0, None, None, None], ['2017-09-20', 35.91, 36.01, 35.7, 35.91, None, 772531.0, 27712890.0, None, None, None], ['2017-09-19', 36.215, 36.26, 35.915, 35.97, None, 867630.0, 31271571.0, None, None, None], ['2017-09-18', 36.635, 36.69, 36.2, 36.22, None, 897147.0, 32668669.0, None, None, None], ['2017-09-15', 36.5, 36.52, 36.125, 36.435, None, 3900972.0, 142003486.0, None, None, None], ['2017-09-14', 36.395, 36.625, 36.275, 36.41, None, 1326481.0, 48313680.0, None, None, None], ['2017-09-13', 36.365, 36.575, 36.25, 36.48, None, 1222826.0, 44582558.0, None, None, None], ['2017-09-12', 36.48, 36.595, 36.36, 36.475, None, 1132496.0, 41307182.0, None, None, None], ['2017-09-11', 36.375, 36.52, 36.255, 36.4, None, 873615.0, 31805844.0, None, None, None], ['2017-09-08', 35.935, 36.305, 35.935, 36.195, None, 980317.0, 35462482.0, None, None, None], ['2017-09-07', 35.8, 35.995, 35.585, 35.935, None, 1244383.0, 44616052.0, None, None, None], ['2017-09-06', 35.0, 35.675, 34.925, 35.55, None, 1262464.0, 44764575.0, None, None, None], ['2017-09-05', 35.425, 35.455, 35.01, 35.195, None, 843906.0, 29727013.0, None, None, None], ['2017-09-04', 35.13, 35.385, 35.1, 35.3, None, 556925.0, 19645220.0, None, None, None], ['2017-09-01', 35.605, 35.685, 35.36, 35.42, None, 786143.0, 27887648.0, None, None, None], ['2017-08-31', 35.435, 35.69, 35.435, 35.505, None, 1190945.0, 42323783.0, None, None, None], ['2017-08-30', 35.16, 35.5, 35.005, 35.285, None, 1117782.0, 39481349.0, None, None, None], ['2017-08-29', 34.885, 35.07, 34.76, 34.945, None, 1153084.0, 40294995.0, None, None, None], ['2017-08-28', 35.02, 35.095, 34.765, 35.035, None, 575476.0, 20128131.0, None, None, None], ['2017-08-25', 35.25, 35.31, 34.985, 35.065, None, 
978790.0, 34347195.0, None, None, None], ['2017-08-24', 35.25, 35.525, 35.21, 35.21, None, 808758.0, 28566900.0, None, None, None], ['2017-08-23', 35.445, 35.49, 35.185, 35.265, None, 762435.0, 26899770.0, None, None, None], ['2017-08-22', 35.45, 35.605, 35.4, 35.405, None, 765764.0, 27158693.0, None, None, None], ['2017-08-21', 35.115, 35.36, 34.97, 35.305, None, 778364.0, 27429654.0, None, None, None], ['2017-08-18', 35.305, 35.48, 35.19, 35.255, None, 1156950.0, 40858779.0, None, None, None], ['2017-08-17', 35.425, 35.81, 35.425, 35.49, None, 948061.0, 33743508.0, None, None, None], ['2017-08-16', 35.73, 35.925, 35.41, 35.47, None, 912185.0, 32414293.0, None, None, None], ['2017-08-15', 35.76, 35.885, 35.445, 35.605, None, 1031791.0, 36719404.0, None, None, None], ['2017-08-14', 35.275, 35.89, 35.2, 35.7, None, 1268298.0, 45247331.0, None, None, None], ['2017-08-11', 35.5, 35.64, 34.88, 35.09, None, 1376672.0, 48398546.0, None, None, None], ['2017-08-10', 35.64, 35.7, 35.315, 35.55, None, 954629.0, 33905737.0, None, None, None], ['2017-08-09', 35.54, 35.81, 35.415, 35.64, None, 1032164.0, 36780128.0, None, None, None], ['2017-08-08', 35.565, 35.79, 35.405, 35.71, None, 813306.0, 28992490.0, None, None, None], ['2017-08-07', 36.0, 36.0, 35.4, 35.625, None, 993748.0, 35384039.0, None, None, None], ['2017-08-04', 35.58, 36.005, 35.56, 35.9, None, 1048069.0, 37610121.0, None, None, None], ['2017-08-03', 35.4, 35.845, 35.335, 35.585, None, 1232461.0, 43873041.0, None, None, None], ['2017-08-02', 35.5, 35.77, 35.23, 35.44, None, 2159905.0, 76652742.0, None, None, None], ['2017-08-01', 34.185, 35.03, 34.185, 35.015, None, 1428589.0, 49719043.0, None, None, None], ['2017-07-31', 34.525, 34.625, 34.255, 34.255, None, 1226265.0, 42149590.0, None, None, None], ['2017-07-28', 34.91, 34.91, 34.32, 34.56, None, 1291761.0, 44621693.0, None, None, None], ['2017-07-27', 34.48, 35.345, 34.48, 35.005, None, 1345183.0, 47108133.0, None, None, None], ['2017-07-26', 34.315, 34.55, 
34.19, 34.49, None, 1100747.0, 37890368.0, None, None, None], ['2017-07-25', 34.465, 34.57, 34.275, 34.39, None, 831703.0, 28627354.0, None, None, None], ['2017-07-24', 34.475, 34.705, 34.295, 34.38, None, 1065404.0, 36708290.0, None, None, None], ['2017-07-21', 34.42, 34.65, 34.275, 34.465, None, 1219659.0, 42017779.0, None, None, None], ['2017-07-20', 34.67, 34.845, 34.38, 34.43, None, 1120392.0, 38692096.0, None, None, None], ['2017-07-19', 34.585, 34.675, 34.425, 34.555, None, 933429.0, 32231381.0, None, None, None], ['2017-07-18', 34.575, 34.78, 34.48, 34.595, None, 1178023.0, 40774008.0, None, None, None], ['2017-07-17', 35.05, 35.125, 34.55, 34.65, None, 994306.0, 34508852.0, None, None, None], ['2017-07-14', 34.9, 35.175, 34.715, 35.01, None, 1164391.0, 40704830.0, None, None, None], ['2017-07-13', 34.87, 34.995, 34.7, 34.84, None, 909972.0, 31695835.0, None, None, None], ['2017-07-12', 34.16, 34.92, 34.13, 34.815, None, 1203034.0, 41655881.0, None, None, None], ['2017-07-11', 34.59, 34.59, 33.96, 34.17, None, 1193468.0, 40775450.0, None, None, None], ['2017-07-10', 34.145, 34.585, 34.145, 34.43, None, 1113906.0, 38372776.0, None, None, None], ['2017-07-07', 34.07, 34.08, 33.78, 33.95, None, 1155041.0, 39171093.0, None, None, None], ['2017-07-06', 34.525, 34.59, 33.74, 34.1, None, 1661288.0, 56571459.0, None, None, None], ['2017-07-05', 34.34, 34.45, 34.055, 34.445, None, 907138.0, 31133385.0, None, None, None], ['2017-07-04', 34.415, 34.54, 34.25, 34.405, None, 929186.0, 31955883.0, None, None, None], ['2017-07-03', 34.895, 34.91, 34.3, 34.53, None, 1081145.0, 37334057.0, None, None, None], ['2017-06-30', 34.49, 34.83, 34.205, 34.765, None, 1743689.0, 60419168.0, None, None, None], ['2017-06-29', 34.92, 35.215, 34.215, 34.545, None, 1702233.0, 58941623.0, None, None, None], ['2017-06-28', 34.96, 35.15, 34.6, 34.8, None, 1352427.0, 47155417.0, None, None, None], ['2017-06-27', 35.5, 35.59, 35.04, 35.125, None, 971323.0, 34191541.0, None, None, None], 
['2017-06-26', 35.64, 35.78, 35.45, 35.5, None, 711806.0, 25324155.0, None, None, None], ['2017-06-23', 35.375, 35.74, 35.365, 35.6, None, 605987.0, 21578659.0, None, None, None], ['2017-06-22', 35.52, 35.555, 34.55, 35.4, None, 1133172.0, 40077360.0, None, None, None], ['2017-06-21', 35.995, 35.995, 35.42, 35.5, None, 1109333.0, 39488323.0, None, None, None], ['2017-06-20', 36.38, 36.385, 35.86, 35.965, None, 1102387.0, 39733201.0, None, None, None], ['2017-06-19', 36.45, 36.565, 36.155, 36.27, None, 751509.0, 27310028.0, None, None, None], ['2017-06-16', 36.285, 36.495, 36.035, 36.35, None, 2708923.0, 98420026.0, None, None, None], ['2017-06-15', 36.345, 36.7, 36.125, 36.185, None, 1394942.0, 50666215.0, None, None, None], ['2017-06-14', 35.95, 36.47, 35.81, 36.275, None, 1474349.0, 53508257.0, None, None, None], ['2017-06-13', 35.775, 36.085, 35.65, 35.83, None, 899809.0, 32285012.0, None, None, None], ['2017-06-12', 35.905, 36.0, 35.405, 35.61, None, 964494.0, 34340062.0, None, None, None], ['2017-06-09', 35.8, 36.03, 35.67, 35.995, None, 855293.0, 30725496.0, None, None, None], ['2017-06-08', 36.26, 36.43, 35.855, 35.89, None, 910894.0, 32860828.0, None, None, None], ['2017-06-07', 36.165, 36.56, 36.06, 36.28, None, 1522772.0, 55358099.0, None, None, None], ['2017-06-06', 35.87, 36.25, 35.535, 36.18, None, 2249405.0, 80988781.0, None, None, None], ['2017-06-02', 35.4, 35.915, 35.32, 35.905, None, 1509212.0, 53890609.0, None, None, None], ['2017-06-01', 34.805, 35.625, 34.635, 35.4, None, 2670943.0, 94268991.0, None, None, None], ['2017-05-31', 34.97, 35.275, 34.895, 34.975, None, 1606543.0, 56280617.0, None, None, None], ['2017-05-30', 34.815, 35.245, 34.745, 35.03, None, 1040767.0, 36479165.0, None, None, None], ['2017-05-29', 35.05, 35.065, 34.77, 34.85, None, 479585.0, 16721712.0, None, None, None], ['2017-05-26', 34.99, 35.28, 34.81, 34.98, None, 1060107.0, 37132136.0, None, None, None], ['2017-05-25', 35.1, 35.25, 34.835, 34.965, None, 744523.0, 
26040310.0, None, None, None], ['2017-05-24', 34.955, 35.015, 34.305, 34.92, None, 1710780.0, 59474808.0, None, None, None], ['2017-05-23', 34.83, 34.87, 34.28, 34.495, None, 1399576.0, 48264736.0, None, None, None], ['2017-05-22', 34.39, 34.94, 34.38, 34.825, None, 1222210.0, 42488151.0, None, None, None], ['2017-05-19', 34.37, 34.73, 34.27, 34.35, None, 1819807.0, 62695579.0, None, None, None], ['2017-05-18', 34.45, 34.82, 34.185, 34.34, None, 2155789.0, 74159697.0, None, None, None], ['2017-05-17', 34.57, 34.78, 34.255, 34.66, None, 1915980.0, 66259874.0, None, None, None], ['2017-05-16', 35.655, 35.965, 35.54, 35.865, None, 1829267.0, 65483151.0, None, None, None], ['2017-05-15', 36.1, 36.185, 35.445, 35.655, None, 1621456.0, 57963632.0, None, None, None], ['2017-05-12', 35.55, 35.95, 35.505, 35.95, None, 2399764.0, 86010930.0, None, None, None], ['2017-05-11', 34.82, 35.765, 34.78, 35.55, None, 3126178.0, 110720040.0, None, None, None], ['2017-05-10', 34.65, 34.785, 34.415, 34.785, None, 1419844.0, 49222455.0, None, None, None], ['2017-05-09', 34.48, 34.97, 34.35, 34.64, None, 1750992.0, 60807785.0, None, None, None], ['2017-05-08', 34.15, 34.775, 34.15, 34.47, None, 1968354.0, 67951441.0, None, None, None], ['2017-05-05', 33.83, 34.145, 33.675, 34.085, None, 1485165.0, 50496085.0, None, None, None], ['2017-05-04', 33.685, 33.99, 33.42, 33.835, None, 1603664.0, 54226513.0, None, None, None], ['2017-05-03', 33.595, 33.765, 33.465, 33.59, None, 1248177.0, 41940157.0, None, None, None], ['2017-05-02', 33.39, 33.64, 33.045, 33.57, None, 2020909.0, 67578132.0, None, None, None], ['2017-04-28', 33.965, 33.965, 33.125, 33.235, None, 2250037.0, 74945444.0, None, None, None], ['2017-04-27', 33.75, 34.0, 33.615, 33.85, None, 1189265.0, 40232795.0, None, None, None], ['2017-04-26', 33.775, 33.985, 33.44, 33.89, None, 1173019.0, 39739022.0, None, None, None], ['2017-04-25', 33.43, 34.135, 33.33, 33.89, None, 1668864.0, 56484780.0, None, None, None], ['2017-04-24', 33.505, 
33.61, 33.225, 33.38, None, 2505690.0, 83631889.0, None, None, None], ['2017-04-21', 33.645, 33.755, 33.195, 33.505, None, 1830004.0, 61197812.0, None, None, None], ['2017-04-20', 34.125, 34.175, 33.66, 33.7, None, 1464324.0, 49521743.0, None, None, None], ['2017-04-19', 34.315, 34.34, 33.98, 34.16, None, 1497935.0, 51127941.0, None, None, None], ['2017-04-18', 34.42, 34.585, 34.105, 34.375, None, 1456744.0, 49993484.0, None, None, None], ['2017-04-13', 34.3, 34.55, 34.25, 34.55, None, 1130793.0, 39005532.0, None, None, None], ['2017-04-12', 34.155, 34.525, 34.1, 34.355, None, 1418600.0, 48742545.0, None, None, None], ['2017-04-11', 33.905, 34.175, 33.89, 34.15, None, 1334812.0, 45501164.0, None, None, None], ['2017-04-10', 33.9, 33.95, 33.605, 33.945, None, 976008.0, 33022431.0, None, None, None], ['2017-04-07', 33.555, 33.89, 33.55, 33.81, None, 1245354.0, 42058867.0, None, None, None], ['2017-04-06', 33.44, 33.88, 33.44, 33.655, None, 1279422.0, 43062063.0, None, None, None], ['2017-04-05', 33.595, 33.67, 33.405, 33.51, None, 1247907.0, 41841367.0, None, None, None], ['2017-04-04', 33.14, 33.625, 33.12, 33.52, None, 1412273.0, 47301542.0, None, None, None], ['2017-04-03', 33.2, 33.2, 32.95, 33.185, None, 1098924.0, 36377790.0, None, None, None], ['2017-03-31', 32.585, 33.045, 32.55, 33.03, None, 1400053.0, 46049974.0, None, None, None], ['2017-03-30', 32.8, 32.805, 32.555, 32.715, None, 1138883.0, 37212643.0, None, None, None], ['2017-03-29', 32.45, 32.765, 32.295, 32.765, None, 1366373.0, 44540644.0, None, None, None], ['2017-03-28', 32.67, 32.765, 32.32, 32.46, None, 1358644.0, 44132290.0, None, None, None], ['2017-03-27', 32.6, 32.77, 32.405, 32.595, None, 990944.0, 32283093.0, None, None, None], ['2017-03-24', 32.52, 32.76, 32.485, 32.68, None, 948258.0, 30996490.0, None, None, None], ['2017-03-23', 32.445, 32.59, 32.32, 32.59, None, 1372593.0, 44624296.0, None, None, None], ['2017-03-22', 32.395, 32.61, 32.395, 32.445, None, 1331064.0, 43221125.0, None, 
None, None], ['2017-03-21', 32.58, 32.58, 32.4, 32.505, None, 1034376.0, 33593372.0, None, None, None], ['2017-03-20', 32.645, 32.745, 32.52, 32.55, None, 933804.0, 30443315.0, None, None, None], ['2017-03-17', 32.525, 32.745, 32.38, 32.71, None, 2486474.0, 81149012.0, None, None, None], ['2017-03-16', 32.49, 32.65, 32.43, 32.585, None, 2091661.0, 68106355.0, None, None, None], ['2017-03-15', 32.27, 32.27, 31.985, 32.215, None, 1154450.0, 37132114.0, None, None, None], ['2017-03-14', 32.025, 32.29, 31.92, 32.185, None, 1367474.0, 43883777.0, None, None, None], ['2017-03-13', 32.0, 32.28, 32.0, 32.1, None, 1344172.0, 43149844.0, None, None, None], ['2017-03-10', 32.765, 32.765, 32.005, 32.05, None, 2805533.0, 90397453.0, None, None, None], ['2017-03-09', 32.55, 32.87, 32.43, 32.655, None, 1521825.0, 49731230.0, None, None, None], ['2017-03-08', 32.59, 32.59, 32.27, 32.55, None, 1583434.0, 51386372.0, None, None, None], ['2017-03-07', 32.8, 33.18, 32.435, 32.62, None, 2264252.0, 74099675.0, None, None, None], ['2017-03-06', 32.665, 32.905, 32.575, 32.895, None, 1485150.0, 48649782.0, None, None, None], ['2017-03-03', 33.025, 33.04, 32.66, 32.82, None, 1780099.0, 58419579.0, None, None, None], ['2017-03-02', 33.095, 33.15, 32.89, 33.1, None, 1392995.0, 46026980.0, None, None, None], ['2017-03-01', 33.0, 33.09, 32.81, 33.06, None, 1587643.0, 52361412.0, None, None, None], ['2017-02-28', 33.025, 33.18, 32.625, 32.89, None, 2165586.0, 71214194.0, None, None, None], ['2017-02-27', 33.31, 33.315, 32.87, 33.04, None, 1242571.0, 41015807.0, None, None, None], ['2017-02-24', 33.35, 33.46, 33.085, 33.25, None, 1569880.0, 52174520.0, None, None, None], ['2017-02-23', 33.345, 33.405, 33.15, 33.35, None, 1112515.0, 37041481.0, None, None, None], ['2017-02-22', 33.0, 33.485, 32.97, 33.31, None, 1981963.0, 65992846.0, None, None, None], ['2017-02-21', 32.78, 33.075, 32.75, 32.95, None, 1321345.0, 43526332.0, None, None, None], ['2017-02-20', 33.0, 33.05, 32.835, 32.86, None, 
701112.0, 23076010.0, None, None, None], ['2017-02-17', 32.83, 33.065, 32.62, 32.88, None, 1609227.0, 52901994.0, None, None, None], ['2017-02-16', 32.76, 32.94, 32.6, 32.865, None, 1368754.0, 44911948.0, None, None, None], ['2017-02-15', 32.615, 32.885, 32.535, 32.86, None, 1999129.0, 65480253.0, None, None, None], ['2017-02-14', 32.35, 32.515, 32.145, 32.505, None, 1588415.0, 51460458.0, None, None, None], ['2017-02-13', 32.355, 32.395, 32.15, 32.18, None, 1471864.0, 47486838.0, None, None, None], ['2017-02-10', 32.255, 32.29, 32.06, 32.29, None, 1471607.0, 47422355.0, None, None, None], ['2017-02-09', 32.365, 32.385, 31.935, 32.2, None, 2312232.0, 74509222.0, None, None, None], ['2017-02-08', 31.32, 32.32, 31.215, 32.2, None, 2809351.0, 89874852.0, None, None, None], ['2017-02-07', 30.67, 31.375, 30.59, 31.355, None, 1724119.0, 53663330.0, None, None, None], ['2017-02-06', 30.665, 30.835, 30.55, 30.58, None, 1514442.0, 46432461.0, None, None, None], ['2017-02-03', 30.795, 31.015, 30.66, 30.78, None, 1105674.0, 34038970.0, None, None, None], ['2017-02-02', 30.765, 30.915, 30.605, 30.665, None, 1362135.0, 41892577.0, None, None, None], ['2017-02-01', 30.63, 30.95, 30.505, 30.755, None, 1953175.0, 60079315.0, None, None, None], ['2017-01-31', 30.01, 30.365, 29.905, 30.27, None, 1571973.0, 47483499.0, None, None, None], ['2017-01-30', 30.05, 30.19, 29.745, 30.06, None, 1261362.0, 37810228.0, None, None, None], ['2017-01-27', 30.045, 30.08, 29.73, 30.015, None, 1361939.0, 40718988.0, None, None, None], ['2017-01-26', 30.07, 30.12, 29.81, 29.985, None, 1881781.0, 56384464.0, None, None, None], ['2017-01-25', 30.11, 30.145, 29.825, 29.955, None, 1918842.0, 57497812.0, None, None, None], ['2017-01-24', 30.46, 30.515, 30.05, 30.05, None, 1782575.0, 53883315.0, None, None, None], ['2017-01-23', 30.445, 30.555, 30.15, 30.385, None, 1923439.0, 58375436.0, None, None, None], ['2017-01-20', 30.42, 30.54, 30.005, 30.38, None, 1727697.0, 52422042.0, None, None, None], 
['2017-01-19', 30.955, 30.955, 30.345, 30.45, None, 1737477.0, 53047844.0, None, None, None], ['2017-01-18', 30.95, 31.065, 30.775, 30.995, None, 1194228.0, 36964666.0, None, None, None], ['2017-01-17', 31.165, 31.17, 30.79, 30.9, None, 1208729.0, 37359507.0, None, None, None], ['2017-01-16', 30.93, 31.35, 30.93, 31.19, None, 920478.0, 28727634.0, None, None, None], ['2017-01-13', 31.215, 31.38, 30.88, 30.95, None, 1134887.0, 35256121.0, None, None, None], ['2017-01-12', 31.4, 31.44, 31.095, 31.16, None, 1269233.0, 39595118.0, None, None, None], ['2017-01-11', 31.39, 31.535, 31.255, 31.4, None, 1181293.0, 37089422.0, None, None, None], ['2017-01-10', 31.09, 31.525, 31.02, 31.475, None, 1241573.0, 38995926.0, None, None, None], ['2017-01-09', 31.38, 31.475, 31.025, 31.025, None, 1007257.0, 31355905.0, None, None, None], ['2017-01-06', 31.425, 31.75, 31.175, 31.25, None, 1236453.0, 38846256.0, None, None, None], ['2017-01-05', 31.065, 31.45, 31.03, 31.405, None, 1652789.0, 51686972.0, None, None, None], ['2017-01-04', 30.85, 30.96, 30.44, 30.8, None, 1265640.0, 38936241.0, None, None, None], ['2017-01-03', 31.48, 31.48, 30.745, 30.8, None, 1613584.0, 49922671.0, None, None, None], ['2017-01-02', 31.05, 31.48, 30.865, 31.35, None, 574317.0, 17953577.0, None, None, None]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "dataset = json['dataset']['data']\n",
+ "print(dataset)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 88,
+ "metadata": {},
"outputs": [],
- "metadata": {}
+ "source": [
+ "#3. Calculate what the highest and lowest opening prices were for the stock in this period.\n",
+ "open_prices = {}\n",
+ "for data in dataset:\n",
+ " open_prices[data[0]] = data[1]\n",
+ "open_max = max(open_prices, key=open_prices.get), ':', max(open_prices.values())\n",
+ "open_min = min(open_prices, key=open_prices.get), ':', min(open_prices.values())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 89,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "('2017-12-19', ':', 41.95)\n",
+ "('2017-01-31', ':', 30.01)\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(open_max)\n",
+ "print(open_min)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 92,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "('2017-11-08', ':', 1.3350000000000009)\n"
+ ]
+ }
+ ],
+ "source": [
+ "#4. What was the largest change in any one day (based on High and Low price)?\n",
+ "change_in_day = {}\n",
+ "for data in dataset:\n",
+ " change_in_day[data[0]] = data[2] - data[3]\n",
+ "max_change_in_day = max(change_in_day, key=change_in_day.get), ':', max(change_in_day.values())\n",
+ "print(max_change_in_day)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 97,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(('2017-05-17', 'to', '2017-05-16'), ':', 1.2050000000000054)\n"
+ ]
+ }
+ ],
+ "source": [
+ "#5. What was the largest change between any two days (based on Closing Price)?\n",
+ "change_btw_days = {}\n",
+ "for i in range(len(dataset) - 1):\n",
+ " change_btw_days[dataset[i][0], 'to', dataset[i+1][0]] = abs(dataset[i][4] - dataset[i+1][4])\n",
+ "max_change_btw_days = max(change_btw_days, key=change_btw_days.get), \":\", max(change_btw_days.values())\n",
+ "print(max_change_btw_days)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 104,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1356351.1746031747\n"
+ ]
+ }
+ ],
+ "source": [
+ "#6. What was the average daily trading volume during this year?\n",
+ "volumes = []\n",
+ "for data in dataset:\n",
+ " volumes.append(data[6])\n",
+ "total_vol = sum(volumes)\n",
+ "trading_days = len(volumes)\n",
+ "average_vol = total_vol / trading_days\n",
+ "print(average_vol)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 116,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1253768.0\n"
+ ]
+ }
+ ],
+ "source": [
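+ "#7. (Optional) What was the median trading volume during this year? (Note: you may need to implement your own function for calculating the median.)\n",
+ "volumes.sort()\n",
+ "mid = len(volumes) // 2\n",
+ "# ~mid counts from the end of the list: for an even count this averages the two middle values,\n",
+ "# for an odd count both indexes point at the single middle value.\n",
+ "median = (volumes[mid] + volumes[~mid]) / 2\n",
+ "print(median)\n"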
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 117,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[479585.0,\n",
+ " 556925.0,\n",
+ " 574317.0,\n",
+ " 575476.0,\n",
+ " 601057.0,\n",
+ " 605987.0,\n",
+ " 608053.0,\n",
+ " 701112.0,\n",
+ " 711806.0,\n",
+ " 732911.0,\n",
+ " 741342.0,\n",
+ " 744523.0,\n",
+ " 751509.0,\n",
+ " 762435.0,\n",
+ " 765764.0,\n",
+ " 772531.0,\n",
+ " 773139.0,\n",
+ " 778364.0,\n",
+ " 786143.0,\n",
+ " 796432.0,\n",
+ " 798410.0,\n",
+ " 808758.0,\n",
+ " 813306.0,\n",
+ " 831703.0,\n",
+ " 843468.0,\n",
+ " 843906.0,\n",
+ " 848527.0,\n",
+ " 855293.0,\n",
+ " 864934.0,\n",
+ " 867630.0,\n",
+ " 873615.0,\n",
+ " 897147.0,\n",
+ " 899809.0,\n",
+ " 907138.0,\n",
+ " 909972.0,\n",
+ " 910894.0,\n",
+ " 912185.0,\n",
+ " 920478.0,\n",
+ " 929186.0,\n",
+ " 933429.0,\n",
+ " 933804.0,\n",
+ " 948061.0,\n",
+ " 948258.0,\n",
+ " 954629.0,\n",
+ " 959055.0,\n",
+ " 964494.0,\n",
+ " 971323.0,\n",
+ " 976008.0,\n",
+ " 978790.0,\n",
+ " 980317.0,\n",
+ " 981385.0,\n",
+ " 985055.0,\n",
+ " 990944.0,\n",
+ " 993748.0,\n",
+ " 994306.0,\n",
+ " 999933.0,\n",
+ " 1007257.0,\n",
+ " 1010748.0,\n",
+ " 1018883.0,\n",
+ " 1030081.0,\n",
+ " 1031791.0,\n",
+ " 1032164.0,\n",
+ " 1034376.0,\n",
+ " 1036110.0,\n",
+ " 1040767.0,\n",
+ " 1043727.0,\n",
+ " 1048069.0,\n",
+ " 1048127.0,\n",
+ " 1057105.0,\n",
+ " 1060107.0,\n",
+ " 1065404.0,\n",
+ " 1081145.0,\n",
+ " 1088678.0,\n",
+ " 1095660.0,\n",
+ " 1098924.0,\n",
+ " 1100747.0,\n",
+ " 1102387.0,\n",
+ " 1103999.0,\n",
+ " 1105674.0,\n",
+ " 1109333.0,\n",
+ " 1112515.0,\n",
+ " 1113906.0,\n",
+ " 1117782.0,\n",
+ " 1120392.0,\n",
+ " 1130793.0,\n",
+ " 1131777.0,\n",
+ " 1132496.0,\n",
+ " 1133172.0,\n",
+ " 1134887.0,\n",
+ " 1138883.0,\n",
+ " 1146901.0,\n",
+ " 1147296.0,\n",
+ " 1153084.0,\n",
+ " 1154450.0,\n",
+ " 1155041.0,\n",
+ " 1156950.0,\n",
+ " 1161907.0,\n",
+ " 1164391.0,\n",
+ " 1169809.0,\n",
+ " 1173019.0,\n",
+ " 1178023.0,\n",
+ " 1179431.0,\n",
+ " 1180003.0,\n",
+ " 1181293.0,\n",
+ " 1182044.0,\n",
+ " 1184571.0,\n",
+ " 1189265.0,\n",
+ " 1190945.0,\n",
+ " 1193468.0,\n",
+ " 1194228.0,\n",
+ " 1203034.0,\n",
+ " 1205912.0,\n",
+ " 1208729.0,\n",
+ " 1219659.0,\n",
+ " 1222210.0,\n",
+ " 1222826.0,\n",
+ " 1226265.0,\n",
+ " 1232461.0,\n",
+ " 1236453.0,\n",
+ " 1241573.0,\n",
+ " 1242571.0,\n",
+ " 1243035.0,\n",
+ " 1244383.0,\n",
+ " 1245354.0,\n",
+ " 1247907.0,\n",
+ " 1248177.0,\n",
+ " 1259359.0,\n",
+ " 1261362.0,\n",
+ " 1262464.0,\n",
+ " 1265640.0,\n",
+ " 1265947.0,\n",
+ " 1268298.0,\n",
+ " 1269233.0,\n",
+ " 1279422.0,\n",
+ " 1281333.0,\n",
+ " 1289642.0,\n",
+ " 1291761.0,\n",
+ " 1314959.0,\n",
+ " 1321182.0,\n",
+ " 1321345.0,\n",
+ " 1326481.0,\n",
+ " 1331064.0,\n",
+ " 1334812.0,\n",
+ " 1344172.0,\n",
+ " 1345183.0,\n",
+ " 1352427.0,\n",
+ " 1358644.0,\n",
+ " 1361939.0,\n",
+ " 1362135.0,\n",
+ " 1362626.0,\n",
+ " 1366373.0,\n",
+ " 1367474.0,\n",
+ " 1368754.0,\n",
+ " 1372593.0,\n",
+ " 1376033.0,\n",
+ " 1376672.0,\n",
+ " 1378274.0,\n",
+ " 1384516.0,\n",
+ " 1391082.0,\n",
+ " 1392995.0,\n",
+ " 1394942.0,\n",
+ " 1399576.0,\n",
+ " 1400053.0,\n",
+ " 1408279.0,\n",
+ " 1411562.0,\n",
+ " 1412273.0,\n",
+ " 1418600.0,\n",
+ " 1419844.0,\n",
+ " 1423857.0,\n",
+ " 1428589.0,\n",
+ " 1456744.0,\n",
+ " 1461266.0,\n",
+ " 1464324.0,\n",
+ " 1471607.0,\n",
+ " 1471864.0,\n",
+ " 1474349.0,\n",
+ " 1477381.0,\n",
+ " 1481919.0,\n",
+ " 1485150.0,\n",
+ " 1485165.0,\n",
+ " 1493605.0,\n",
+ " 1495815.0,\n",
+ " 1497935.0,\n",
+ " 1509212.0,\n",
+ " 1514442.0,\n",
+ " 1521825.0,\n",
+ " 1522772.0,\n",
+ " 1531366.0,\n",
+ " 1569880.0,\n",
+ " 1571973.0,\n",
+ " 1574374.0,\n",
+ " 1583434.0,\n",
+ " 1587643.0,\n",
+ " 1588415.0,\n",
+ " 1603664.0,\n",
+ " 1606543.0,\n",
+ " 1609227.0,\n",
+ " 1613584.0,\n",
+ " 1621456.0,\n",
+ " 1652789.0,\n",
+ " 1661288.0,\n",
+ " 1668864.0,\n",
+ " 1702233.0,\n",
+ " 1710780.0,\n",
+ " 1724119.0,\n",
+ " 1727697.0,\n",
+ " 1737477.0,\n",
+ " 1743689.0,\n",
+ " 1744271.0,\n",
+ " 1750992.0,\n",
+ " 1762497.0,\n",
+ " 1773417.0,\n",
+ " 1780099.0,\n",
+ " 1780610.0,\n",
+ " 1782575.0,\n",
+ " 1810355.0,\n",
+ " 1819807.0,\n",
+ " 1829267.0,\n",
+ " 1830004.0,\n",
+ " 1841312.0,\n",
+ " 1881781.0,\n",
+ " 1915980.0,\n",
+ " 1918842.0,\n",
+ " 1923439.0,\n",
+ " 1948545.0,\n",
+ " 1953175.0,\n",
+ " 1968354.0,\n",
+ " 1981963.0,\n",
+ " 1999129.0,\n",
+ " 2020909.0,\n",
+ " 2021500.0,\n",
+ " 2091661.0,\n",
+ " 2098187.0,\n",
+ " 2133301.0,\n",
+ " 2155789.0,\n",
+ " 2159905.0,\n",
+ " 2165586.0,\n",
+ " 2249405.0,\n",
+ " 2250037.0,\n",
+ " 2264252.0,\n",
+ " 2312232.0,\n",
+ " 2399764.0,\n",
+ " 2486474.0,\n",
+ " 2505690.0,\n",
+ " 2670943.0,\n",
+ " 2708923.0,\n",
+ " 2733044.0,\n",
+ " 2805533.0,\n",
+ " 2809351.0,\n",
+ " 3040962.0,\n",
+ " 3126178.0,\n",
+ " 3900972.0]"
+ ]
+ },
+ "execution_count": 117,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": []
},
{
"cell_type": "code",
"execution_count": null,
- "source": [],
+ "metadata": {},
"outputs": [],
- "metadata": {}
+ "source": []
}
],
"metadata": {
+ "interpreter": {
+ "hash": "4885f37acae9217c235118400878352aafa7b76e66df698a1f601374f86939a7"
+ },
"kernelspec": {
- "name": "python3",
- "display_name": "Python 3.7.9 64-bit ('springboard': conda)"
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -186,12 +611,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.9"
- },
- "interpreter": {
- "hash": "4885f37acae9217c235118400878352aafa7b76e66df698a1f601374f86939a7"
+ "version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
-}
\ No newline at end of file
+}
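Review note on the median cell in the notebook above: it computes the value inline with index arithmetic on the sorted `volumes` list. A minimal sketch of the same calculation as a reusable helper follows. The name `median_of` and the sample lists are hypothetical and do not come from the notebook or the dataset.

```python
def median_of(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)  # sorted() leaves the caller's list untouched
    n = len(ordered)
    if n == 0:
        raise ValueError("median_of() needs at least one value")
    mid = n // 2
    if n % 2:
        # Odd length: the single middle element is the median.
        return ordered[mid]
    # Even length: the median is the mean of the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Illustrative values only, not taken from the stock data.
sample_odd = [3.0, 1.0, 2.0]
sample_even = [4.0, 1.0, 3.0, 2.0]
print(median_of(sample_odd))   # 2.0
print(median_of(sample_even))  # 2.5
```

Where writing the function by hand is not the point of the exercise, the standard library's `statistics.median` should give the same result.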
diff --git a/mec-3.4.1-api-mini-project/python.gitignore b/mec-3.4.1-api-mini-project/python.gitignore
new file mode 100644
index 000000000..1c22fb783
--- /dev/null
+++ b/mec-3.4.1-api-mini-project/python.gitignore
@@ -0,0 +1,160 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# py
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/
\ No newline at end of file
diff --git a/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb b/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb
index ed51607a2..0b20583e6 100755
--- a/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb
+++ b/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb
@@ -36,14 +36,14 @@
"metadata": {},
"outputs": [
{
- "output_type": "execute_result",
"data": {
"text/plain": [
- "'0.25.3'"
+ "'1.4.4'"
]
},
+ "execution_count": 2,
"metadata": {},
- "execution_count": 2
+ "output_type": "execute_result"
}
],
"source": [
@@ -162,6 +162,13 @@
"movies.head()"
]
},
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -176,14 +183,26 @@
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": 5,
"metadata": {},
"outputs": [
{
- "output_type": "stream",
"name": "stdout",
+ "output_type": "stream",
"text": [
- "\nRangeIndex: 3786176 entries, 0 to 3786175\nData columns (total 6 columns):\ntitle object\nyear int64\nname object\ntype object\ncharacter object\nn float64\ndtypes: float64(1), int64(1), object(4)\nmemory usage: 173.3+ MB\n"
+ "\n",
+ "RangeIndex: 3786176 entries, 0 to 3786175\n",
+ "Data columns (total 6 columns):\n",
+ " # Column Dtype \n",
+ "--- ------ ----- \n",
+ " 0 title object \n",
+ " 1 year int64 \n",
+ " 2 name object \n",
+ " 3 type object \n",
+ " 4 character object \n",
+ " 5 n float64\n",
+ "dtypes: float64(1), int64(1), object(4)\n",
+ "memory usage: 173.3+ MB\n"
]
}
],
@@ -511,9 +530,20 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 9,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "244914"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"len(movies)"
]
@@ -527,9 +557,67 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 10,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Total Batman Movies: 2\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ " title year\n",
+ "52734 Batman 1943\n",
+ "150621 Batman 1989"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"batman_df = movies[movies.title == 'Batman']\n",
"print('Total Batman Movies:', len(batman_df))\n",
@@ -545,9 +633,115 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 11,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Total Batman Movies: 35\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ " title year\n",
+ "16813 Batman: Anarchy 2016\n",
+ "30236 Batman Forever 1995\n",
+ "31674 Batman Untold 2010\n",
+ "31711 Scooby-Doo & Batman: the Brave and the Bold 2018\n",
+ "41881 Batman the Rise of Red Hood 2018\n",
+ "43484 Batman: Return of the Caped Crusaders 2016\n",
+ "46333 Batman & Robin 1997\n",
+ "51811 Batman Revealed 2012\n",
+ "52734 Batman 1943\n",
+ "56029 Batman Beyond: Rising Knight 2014"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"batman_df = movies[movies.title.str.contains('Batman', case=False)]\n",
"print('Total Batman Movies:', len(batman_df))\n",
@@ -563,9 +757,138 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 12,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " title year\n",
+ "52734 Batman 1943\n",
+ "100056 Batman and Robin 1949\n",
+ "161439 Batman Dracula 1964\n",
+ "84327 Alyas Batman at Robin 1965\n",
+ "68364 James Batman 1966\n",
+ "161527 Batman: The Movie 1966\n",
+ "56159 Batman Fights Dracula 1967\n",
+ "168504 Fight! Batman, Fight! 1973\n",
+ "150621 Batman 1989\n",
+ "156239 Alyas Batman en Robin 1991\n",
+ "156755 Batman Returns 1992\n",
+ "63366 Batman: Mask of the Phantasm 1993\n",
+ "30236 Batman Forever 1995\n",
+ "46333 Batman & Robin 1997\n",
+ "208220 Batman Begins 2005"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"batman_df.sort_values(by=['year'], ascending=True).iloc[:15]"
]
@@ -579,55 +902,182 @@
},
{
"cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### How many movies were made in the year 2017?"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "len(movies[movies.year == 2017])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Section I - Q2 : How many movies were made in the year 2015?"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- },
- {
- "cell_type": "markdown",
+ "execution_count": 13,
"metadata": {},
- "source": [
- "### Section I - Q3 : How many movies were made from 2000 till 2018?\n",
- "- You can chain multiple conditions using OR (`|`) as well as AND (`&`) depending on the condition"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " title year\n",
+ "143147 Harry Potter and the Deathly Hallows: Part 2 2011\n",
+ "152831 Harry Potter and the Deathly Hallows: Part 1 2010\n",
+ "109213 Harry Potter and the Half-Blood Prince 2009\n",
+ "50581 Harry Potter and the Order of the Phoenix 2007\n",
+ "187926 Harry Potter and the Goblet of Fire 2005\n",
+ "61957 Harry Potter and the Prisoner of Azkaban 2004\n",
+ "82791 Harry Potter and the Chamber of Secrets 2002\n",
+ "223087 Harry Potter and the Sorcerer's Stone 2001"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harry_potter_df = movies[movies.title.str.contains('Harry Potter', case=False)].sort_values(by='year', ascending=False)\n",
+ "harry_potter_df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
"metadata": {},
- "outputs": [],
- "source": []
+ "source": [
+ "### How many movies were made in the year 2017?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "11474"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(movies[movies.year == 2017])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Section I - Q2 : How many movies were made in the year 2015?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "8702"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(movies[movies['year'] == 2015])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Section I - Q3 : How many movies were made from 2000 till 2018?\n",
+ "- You can chain multiple conditions using OR (`|`) as well as AND (`&`) depending on the condition"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Each comparison needs its own parentheses: & binds more tightly than >= and <=.\n",
+ "len(movies[(movies['year'] >= 2000) & (movies['year'] <= 2018)])"
+ ]
},
{
"cell_type": "markdown",
@@ -638,10 +1088,23 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 17,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "20"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(movies[movies['title'] == 'Hamlet'])"
+ ]
},
{
"cell_type": "markdown",
@@ -654,10 +1117,94 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 18,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " title year\n",
+ "55639 Hamlet 2000\n",
+ "1931 Hamlet 2009\n",
+ "227953 Hamlet 2011\n",
+ "178290 Hamlet 2014\n",
+ "186137 Hamlet 2015\n",
+ "191940 Hamlet 2016\n",
+ "244747 Hamlet 2017"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "hamlet_df = movies[(movies['title'] == 'Hamlet') & (movies['year'] >= 2000)].sort_values(by='year')\n",
+ "hamlet_df"
+ ]
},
{
"cell_type": "markdown",
@@ -670,10 +1217,23 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 19,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "27"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(cast[(cast['title'] == 'Inception') & (cast['n'].isna())])"
+ ]
},
{
"cell_type": "markdown",
@@ -685,10 +1245,23 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 20,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "51"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(cast[(cast['title'] == 'Inception') & cast['n'].notna()])"
+ ]
},
{
"cell_type": "markdown",
@@ -701,37 +1274,422 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 21,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " title year name type character n\n",
+ "590576 Inception 2010 Leonardo DiCaprio actor Cobb 1.0\n",
+ "859993 Inception 2010 Joseph Gordon-Levitt actor Arthur 2.0\n",
+ "3387147 Inception 2010 Ellen Page actress Ariadne 3.0\n",
+ "940923 Inception 2010 Tom Hardy actor Eames 4.0\n",
+ "2406531 Inception 2010 Ken Watanabe actor Saito 5.0\n",
+ "1876301 Inception 2010 Dileep Rao actor Yusuf 6.0\n",
+ "1615709 Inception 2010 Cillian Murphy actor Robert Fischer 7.0\n",
+ "183937 Inception 2010 Tom Berenger actor Browning 8.0\n",
+ "2765969 Inception 2010 Marion Cotillard actress Mal 9.0\n",
+ "1826027 Inception 2010 Pete Postlethwaite actor Maurice Fischer 10.0"
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "topten_inception = cast[(cast['title'] == 'Inception')].sort_values(by='n').iloc[:10]\n",
+ "topten_inception"
+ ]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "### Section I - Q9:\n",
- "\n",
- "(A) List all movies where there was a character 'Albus Dumbledore' \n",
- "\n",
- "(B) Now modify the above to show only the actors who played the character 'Albus Dumbledore'\n",
- "- For Part (B) remember the same actor might play the same role in multiple movies"
+ "### Section I - Q9:\n",
+ "\n",
+ "(A) List all movies where there was a character 'Albus Dumbledore' \n",
+ "\n",
+ "(B) Now modify the above to show only the actors who played the character 'Albus Dumbledore'\n",
+ "- For Part (B) remember the same actor might play the same role in multiple movies"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " title \n",
+ " year \n",
+ " name \n",
+ " type \n",
+ " character \n",
+ " n \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 704984 \n",
+ " Epic Movie \n",
+ " 2007 \n",
+ " Dane Farwell \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " 17.0 \n",
+ " \n",
+ " \n",
+ " 792421 \n",
+ " Harry Potter and the Goblet of Fire \n",
+ " 2005 \n",
+ " Michael Gambon \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " 37.0 \n",
+ " \n",
+ " \n",
+ " 792423 \n",
+ " Harry Potter and the Order of the Phoenix \n",
+ " 2007 \n",
+ " Michael Gambon \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " 36.0 \n",
+ " \n",
+ " \n",
+ " 792424 \n",
+ " Harry Potter and the Prisoner of Azkaban \n",
+ " 2004 \n",
+ " Michael Gambon \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " 27.0 \n",
+ " \n",
+ " \n",
+ " 947789 \n",
+ " Harry Potter and the Chamber of Secrets \n",
+ " 2002 \n",
+ " Richard Harris \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " 32.0 \n",
+ " \n",
+ " \n",
+ " 947790 \n",
+ " Harry Potter and the Sorcerer's Stone \n",
+ " 2001 \n",
+ " Richard Harris \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1685537 \n",
+ " Ultimate Hero Project \n",
+ " 2013 \n",
+ " George (X) O'Connor \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 2248085 \n",
+ " Potter \n",
+ " 2015 \n",
+ " Timothy Tedmanson \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " title year name \\\n",
+ "704984 Epic Movie 2007 Dane Farwell \n",
+ "792421 Harry Potter and the Goblet of Fire 2005 Michael Gambon \n",
+ "792423 Harry Potter and the Order of the Phoenix 2007 Michael Gambon \n",
+ "792424 Harry Potter and the Prisoner of Azkaban 2004 Michael Gambon \n",
+ "947789 Harry Potter and the Chamber of Secrets 2002 Richard Harris \n",
+ "947790 Harry Potter and the Sorcerer's Stone 2001 Richard Harris \n",
+ "1685537 Ultimate Hero Project 2013 George (X) O'Connor \n",
+ "2248085 Potter 2015 Timothy Tedmanson \n",
+ "\n",
+ " type character n \n",
+ "704984 actor Albus Dumbledore 17.0 \n",
+ "792421 actor Albus Dumbledore 37.0 \n",
+ "792423 actor Albus Dumbledore 36.0 \n",
+ "792424 actor Albus Dumbledore 27.0 \n",
+ "947789 actor Albus Dumbledore 32.0 \n",
+ "947790 actor Albus Dumbledore 1.0 \n",
+ "1685537 actor Albus Dumbledore NaN \n",
+ "2248085 actor Albus Dumbledore NaN "
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "dumbledore = cast[cast['character'] == 'Albus Dumbledore']\n",
+ "dumbledore"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " title \n",
+ " year \n",
+ " name \n",
+ " type \n",
+ " character \n",
+ " n \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 792421 \n",
+ " Harry Potter and the Goblet of Fire \n",
+ " 2005 \n",
+ " Michael Gambon \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " 37.0 \n",
+ " \n",
+ " \n",
+ " 792423 \n",
+ " Harry Potter and the Order of the Phoenix \n",
+ " 2007 \n",
+ " Michael Gambon \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " 36.0 \n",
+ " \n",
+ " \n",
+ " 792424 \n",
+ " Harry Potter and the Prisoner of Azkaban \n",
+ " 2004 \n",
+ " Michael Gambon \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " 27.0 \n",
+ " \n",
+ " \n",
+ " 947789 \n",
+ " Harry Potter and the Chamber of Secrets \n",
+ " 2002 \n",
+ " Richard Harris \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " 32.0 \n",
+ " \n",
+ " \n",
+ " 947790 \n",
+ " Harry Potter and the Sorcerer's Stone \n",
+ " 2001 \n",
+ " Richard Harris \n",
+ " actor \n",
+ " Albus Dumbledore \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " title year name \\\n",
+ "792421 Harry Potter and the Goblet of Fire 2005 Michael Gambon \n",
+ "792423 Harry Potter and the Order of the Phoenix 2007 Michael Gambon \n",
+ "792424 Harry Potter and the Prisoner of Azkaban 2004 Michael Gambon \n",
+ "947789 Harry Potter and the Chamber of Secrets 2002 Richard Harris \n",
+ "947790 Harry Potter and the Sorcerer's Stone 2001 Richard Harris \n",
+ "\n",
+ " type character n \n",
+ "792421 actor Albus Dumbledore 37.0 \n",
+ "792423 actor Albus Dumbledore 36.0 \n",
+ "792424 actor Albus Dumbledore 27.0 \n",
+ "947789 actor Albus Dumbledore 32.0 \n",
+ "947790 actor Albus Dumbledore 1.0 "
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "hp_dumbledore = dumbledore[dumbledore['title'].str.contains('Harry Potter', case=False)]\n",
+ "hp_dumbledore"
]
},
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- },
{
"cell_type": "markdown",
"metadata": {},
@@ -745,17 +1703,243 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 24,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "62"
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(cast[cast['name'] == 'Keanu Reeves'])"
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 25,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " title \n",
+ " year \n",
+ " name \n",
+ " type \n",
+ " character \n",
+ " n \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 1892390 \n",
+ " The Matrix \n",
+ " 1999 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Neo \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892397 \n",
+ " The Replacements \n",
+ " 2000 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Shane Falco \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892358 \n",
+ " Hard Ball \n",
+ " 2001 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Conor O'Neill \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892383 \n",
+ " Sweet November \n",
+ " 2001 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Nelson Moss \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892348 \n",
+ " Constantine \n",
+ " 2005 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " John Constantine \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892388 \n",
+ " The Lake House \n",
+ " 2006 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Alex Wyler \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892382 \n",
+ " Street Kings \n",
+ " 2008 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Detective Tom Ludlow \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892385 \n",
+ " The Day the Earth Stood Still \n",
+ " 2008 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Klaatu \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892359 \n",
+ " Henry's Crime \n",
+ " 2010 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Henry Torne \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892342 \n",
+ " 47 Ronin \n",
+ " 2013 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Kai \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892361 \n",
+ " John Wick \n",
+ " 2014 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " John Wick \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892366 \n",
+ " Knock Knock \n",
+ " 2015 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Evan \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892399 \n",
+ " The Whole Truth \n",
+ " 2016 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Ramsey \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892362 \n",
+ " John Wick: Chapter 2 \n",
+ " 2017 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " John Wick \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 1892378 \n",
+ " Siberia \n",
+ " 2018 \n",
+ " Keanu Reeves \n",
+ " actor \n",
+ " Lucas Hill \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " title year name type \\\n",
+ "1892390 The Matrix 1999 Keanu Reeves actor \n",
+ "1892397 The Replacements 2000 Keanu Reeves actor \n",
+ "1892358 Hard Ball 2001 Keanu Reeves actor \n",
+ "1892383 Sweet November 2001 Keanu Reeves actor \n",
+ "1892348 Constantine 2005 Keanu Reeves actor \n",
+ "1892388 The Lake House 2006 Keanu Reeves actor \n",
+ "1892382 Street Kings 2008 Keanu Reeves actor \n",
+ "1892385 The Day the Earth Stood Still 2008 Keanu Reeves actor \n",
+ "1892359 Henry's Crime 2010 Keanu Reeves actor \n",
+ "1892342 47 Ronin 2013 Keanu Reeves actor \n",
+ "1892361 John Wick 2014 Keanu Reeves actor \n",
+ "1892366 Knock Knock 2015 Keanu Reeves actor \n",
+ "1892399 The Whole Truth 2016 Keanu Reeves actor \n",
+ "1892362 John Wick: Chapter 2 2017 Keanu Reeves actor \n",
+ "1892378 Siberia 2018 Keanu Reeves actor \n",
+ "\n",
+ " character n \n",
+ "1892390 Neo 1.0 \n",
+ "1892397 Shane Falco 1.0 \n",
+ "1892358 Conor O'Neill 1.0 \n",
+ "1892383 Nelson Moss 1.0 \n",
+ "1892348 John Constantine 1.0 \n",
+ "1892388 Alex Wyler 1.0 \n",
+ "1892382 Detective Tom Ludlow 1.0 \n",
+ "1892385 Klaatu 1.0 \n",
+ "1892359 Henry Torne 1.0 \n",
+ "1892342 Kai 1.0 \n",
+ "1892361 John Wick 1.0 \n",
+ "1892366 Evan 1.0 \n",
+ "1892399 Ramsey 1.0 \n",
+ "1892362 John Wick 1.0 \n",
+ "1892378 Lucas Hill 1.0 "
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "keanu_reeves = cast[(cast['name'] == 'Keanu Reeves') & (cast['year'] >= 1999) & (cast['n'] == 1.0)].sort_values(by='year')\n",
+ "keanu_reeves"
+ ]
},
{
"cell_type": "markdown",
@@ -770,17 +1954,45 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 26,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "234635"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "fifties = cast[(cast['year'] >= 1950) & (cast['year'] <= 1960)]\n",
+ "len(fifties[(fifties['type'] == 'actress') | (fifties['type'] == 'actor')])"
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 27,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "1452413"
+ ]
+ },
+ "execution_count": 27,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tenyears = cast[(cast['year'] >= 2007) & (cast['year'] <= 2017)]\n",
+ "len(tenyears[(tenyears['type'] == 'actress') | (tenyears['type'] == 'actor')])"
+ ]
},
{
"cell_type": "markdown",
@@ -797,24 +2009,64 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 28,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "153233"
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "millenium = cast[cast['year'] >= 2000]\n",
+ "len(millenium[millenium['n'] == 1.0])"
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 29,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "2174370"
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(millenium[(millenium['n'].notnull()) & (millenium['n'] != 1.0)])"
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 30,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "1458573"
+ ]
+ },
+ "execution_count": 30,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(millenium[millenium['n'].isna()])"
+ ]
},
{
"cell_type": "markdown",
@@ -832,9 +2084,30 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 31,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Hamlet 20\n",
+ "Carmen 17\n",
+ "Macbeth 16\n",
+ "Maya 12\n",
+ "Temptation 12\n",
+ "The Outsider 12\n",
+ "Freedom 11\n",
+ "The Three Musketeers 11\n",
+ "Honeymoon 11\n",
+ "Othello 11\n",
+ "Name: title, dtype: int64"
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"top_ten = movies.title.value_counts()[:10]\n",
"top_ten"
@@ -849,9 +2122,30 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 32,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
"source": [
"top_ten.plot(kind='barh')"
]
@@ -865,10 +2159,28 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 33,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "year\n",
+ "2009 6125\n",
+ "2008 5151\n",
+ "2007 4467\n",
+ "Name: year, dtype: int64"
+ ]
+ },
+ "execution_count": 33,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "top3 = movies[(movies['year'] >= 2000) & (movies['year'] <= 2009)].groupby('year')['year'].count().sort_values(ascending=False).iloc[:3]\n",
+ "top3"
+ ]
},
{
"cell_type": "markdown",
@@ -881,10 +2193,34 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 60,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import math\n",
+ "decade = []\n",
+ "for year in movies['year']:\n",
+ " decade.append(math.floor(year/10) * 10)\n",
+ "movies['decade'] = decade\n",
+ "decade_df = pd.DataFrame(movies.groupby('decade')['title'].count())\n",
+ "plt.barh(y=decade_df.index, width=decade_df['title'])\n",
+ "plt.xlabel('Number of Films')\n",
+ "plt.ylabel('Decade')\n",
+ "plt.ylim(1880, 2030)\n",
+ "plt.title('Number of films released per decade')\n",
+ "plt.show()"
+ ]
},
{
"cell_type": "markdown",
@@ -901,24 +2237,96 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 35,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "character\n",
+ "Himself 20746\n",
+ "Dancer 12477\n",
+ "Extra 11948\n",
+ "Reporter 8434\n",
+ "Student 7773\n",
+ "Doctor 7669\n",
+ "Party Guest 7245\n",
+ "Policeman 7029\n",
+ "Nurse 6999\n",
+ "Bartender 6802\n",
+ "Name: character, dtype: int64"
+ ]
+ },
+ "execution_count": 35,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cast.groupby('character')['character'].count().sort_values(ascending=False).iloc[:10]"
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 36,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "character name name \n",
+ "Herself Queen Elizabeth II Queen Elizabeth II 12\n",
+ " Joyce Brothers Joyce Brothers 9\n",
+ " Luisa Horga Luisa Horga 9\n",
+ " Mar?a Luisa (V) Mart?n Mar?a Luisa (V) Mart?n 9\n",
+ " Hillary Clinton Hillary Clinton 8\n",
+ " Margaret Thatcher Margaret Thatcher 8\n",
+ " In?s J. Southern In?s J. Southern 6\n",
+ " Marta Berrocal Marta Berrocal 6\n",
+ " Oprah Winfrey Oprah Winfrey 6\n",
+ " Marilyn Monroe Marilyn Monroe 6\n",
+ "Name: name, dtype: int64"
+ ]
+ },
+ "execution_count": 36,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cast[cast['character'] == 'Herself'].groupby(['character', 'name'])['name'].value_counts().sort_values(ascending=False).iloc[:10]"
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 37,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "character name name \n",
+ "Himself Adolf Hitler Adolf Hitler 99\n",
+ " Richard Nixon Richard Nixon 44\n",
+ " Ronald Reagan Ronald Reagan 41\n",
+ " John F. Kennedy John F. Kennedy 37\n",
+ " George W. Bush George W. Bush 25\n",
+ " Winston Churchill Winston Churchill 24\n",
+ " Martin Luther King Martin Luther King 23\n",
+ " Bill Clinton Bill Clinton 22\n",
+ " Ron Jeremy Ron Jeremy 22\n",
+ " Franklin D. Roosevelt Franklin D. Roosevelt 21\n",
+ "Name: name, dtype: int64"
+ ]
+ },
+ "execution_count": 37,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cast[cast['character'] == 'Himself'].groupby(['character', 'name'])['name'].value_counts().sort_values(ascending=False).iloc[:10]"
+ ]
},
{
"cell_type": "markdown",
@@ -935,17 +2343,65 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 38,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "character\n",
+ "Zombie 6264\n",
+ "Zombie Horde 206\n",
+ "Zombie - Protestor - Victim 78\n",
+ "Zombie Extra 70\n",
+ "Zombie Dancer 43\n",
+ "Zombie Girl 36\n",
+ "Zombie #1 36\n",
+ "Zombie #2 31\n",
+ "Zombie Vampire 25\n",
+ "Zombie Victim 22\n",
+ "Name: character, dtype: int64"
+ ]
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cast[cast['character'].str.startswith('Zombie')].groupby('character')['character'].count().sort_values(ascending=False).iloc[:10]"
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 39,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "character\n",
+ "Policeman 7029\n",
+ "Police Officer 4808\n",
+ "Police Inspector 742\n",
+ "Police Sergeant 674\n",
+ "Police officer 539\n",
+ "Police 456\n",
+ "Policewoman 415\n",
+ "Police Chief 410\n",
+ "Police Captain 387\n",
+ "Police Commissioner 337\n",
+ "Name: character, dtype: int64"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cast[cast['character'].str.startswith('Police')].groupby('character')['character'].count().sort_values(ascending=False).iloc[:10]"
+ ]
},
{
"cell_type": "markdown",
@@ -956,10 +2412,53 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 40,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "year\n",
+ "1985 1\n",
+ "1986 3\n",
+ "1988 4\n",
+ "1989 2\n",
+ "1990 2\n",
+ "1991 3\n",
+ "1992 1\n",
+ "1993 4\n",
+ "1994 1\n",
+ "1995 2\n",
+ "1996 2\n",
+ "1997 2\n",
+ "1999 3\n",
+ "2000 3\n",
+ "2001 2\n",
+ "2003 3\n",
+ "2005 3\n",
+ "2006 2\n",
+ "2008 2\n",
+ "2009 1\n",
+ "2010 1\n",
+ "2012 1\n",
+ "2013 2\n",
+ "2014 1\n",
+ "2015 1\n",
+ "2016 5\n",
+ "2017 3\n",
+ "2018 1\n",
+ "2019 1\n",
+ "Name: year, dtype: int64"
+ ]
+ },
+ "execution_count": 40,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cast[cast['name'] == 'Keanu Reeves'].groupby('year')['year'].count()"
+ ]
},
{
"cell_type": "markdown",
@@ -970,10 +2469,28 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 41,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "keanu_reeves = cast[cast['name'] == 'Keanu Reeves']\n",
+ "plt.scatter(x=keanu_reeves['year'], y=keanu_reeves['n'])\n",
+ "plt.xlabel('Year')\n",
+ "plt.ylabel('N')\n",
+ "plt.title('Keanu Reeves cast positions over his career')\n",
+ "plt.show()"
+ ]
},
{
"cell_type": "markdown",
@@ -984,10 +2501,28 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 69,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "hamlet_df = pd.DataFrame(movies[movies['title'].str.contains('Hamlet', case=False)].groupby('decade')['title'].count())\n",
+ "plt.barh(y=hamlet_df.index, width=hamlet_df['title'])\n",
+ "plt.title('Hamlet films made in each decade')\n",
+ "plt.xlabel('Number of Films')\n",
+ "plt.ylabel('Decade')\n",
+ "plt.show()"
+ ]
},
{
"cell_type": "markdown",
@@ -1004,17 +2539,43 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 71,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "11823"
+ ]
+ },
+ "execution_count": 71,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(cast[(cast['year'] >= 1960) & (cast['year'] <= 1969) & (cast['n'] == 1.0) & (cast['type'].isin(['actor', 'actress']))])"
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 72,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "26344"
+ ]
+ },
+ "execution_count": 72,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(cast[(cast['year'] >= 2000) & (cast['year'] <= 2009) & (cast['n'] == 1.0) & (cast['type'].isin(['actor', 'actress']))])"
+ ]
},
{
"cell_type": "markdown",
@@ -1025,25 +2586,220 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 82,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " character \n",
+ " \n",
+ " \n",
+ " title \n",
+ " year \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " The Muppet Movie \n",
+ " 1979 \n",
+ " 8 \n",
+ " \n",
+ " \n",
+ " An American Werewolf in London \n",
+ " 1981 \n",
+ " 2 \n",
+ " \n",
+ " \n",
+ " The Great Muppet Caper \n",
+ " 1981 \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ " The Dark Crystal \n",
+ " 1982 \n",
+ " 2 \n",
+ " \n",
+ " \n",
+ " The Muppets Take Manhattan \n",
+ " 1984 \n",
+ " 7 \n",
+ " \n",
+ " \n",
+ " Follow That Bird \n",
+ " 1985 \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ " The Muppet Christmas Carol \n",
+ " 1992 \n",
+ " 7 \n",
+ " \n",
+ " \n",
+ " Muppet Treasure Island \n",
+ " 1996 \n",
+ " 4 \n",
+ " \n",
+ " \n",
+ " Muppets from Space \n",
+ " 1999 \n",
+ " 4 \n",
+ " \n",
+ " \n",
+ " The Adventures of Elmo in Grouchland \n",
+ " 1999 \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " character\n",
+ "title year \n",
+ "The Muppet Movie 1979 8\n",
+ "An American Werewolf in London 1981 2\n",
+ "The Great Muppet Caper 1981 6\n",
+ "The Dark Crystal 1982 2\n",
+ "The Muppets Take Manhattan 1984 7\n",
+ "Follow That Bird 1985 3\n",
+ "The Muppet Christmas Carol 1992 7\n",
+ "Muppet Treasure Island 1996 4\n",
+ "Muppets from Space 1999 4\n",
+ "The Adventures of Elmo in Grouchland 1999 3"
+ ]
+ },
+ "execution_count": 82,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "frank_oz = cast[cast['name'] == 'Frank Oz']\n",
+ "frank_oz_more_than_one = pd.DataFrame(frank_oz.groupby(['title', 'year'])['character'].count())\n",
+ "frank_oz_more_than_one[frank_oz_more_than_one['character'] > 1].sort_index(axis='index', level='year')"
+ ]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "### Section II - Q10: List each of the characters that Frank Oz has portrayed at least twice"
+ "### Section II - Q10: List each of the characters that Frank Oz has portrayed at least twice"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 85,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " title \n",
+ " \n",
+ " \n",
+ " character \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " Animal \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ " Bert \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ " Cookie Monster \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " Fozzie Bear \n",
+ " 4 \n",
+ " \n",
+ " \n",
+ " Grover \n",
+ " 2 \n",
+ " \n",
+ " \n",
+ " Miss Piggy \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ " Sam the Eagle \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " Yoda \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " title\n",
+ "character \n",
+ "Animal 6\n",
+ "Bert 3\n",
+ "Cookie Monster 5\n",
+ "Fozzie Bear 4\n",
+ "Grover 2\n",
+ "Miss Piggy 6\n",
+ "Sam the Eagle 5\n",
+ "Yoda 6"
+ ]
+ },
+ "execution_count": 85,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "frank_oz_twice = pd.DataFrame(frank_oz.groupby('character')['title'].count())\n",
+ "frank_oz_twice[frank_oz_twice['title'] >= 2]"
]
},
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- },
{
"cell_type": "markdown",
"metadata": {},
@@ -1063,9 +2819,30 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 43,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 43,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
"source": [
"christmas = release_dates[(release_dates.title.str.contains('Christmas')) & (release_dates.country == 'USA')]\n",
"christmas.date.dt.month.value_counts().sort_index().plot(kind='bar')"
@@ -1083,10 +2860,28 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 86,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "summer = release_dates[(release_dates['title'].str.contains('Summer', case=False)) & (release_dates['country'] == 'USA')]\n",
+ "plt.hist(x=summer['date'].dt.month)\n",
+ "plt.title('Frequency by month of movies released in the USA with \"Summer\" in the title')\n",
+ "plt.ylabel('Frequency')\n",
+ "plt.xlabel('Month Number')\n",
+ "plt.show()"
+ ]
},
{
"cell_type": "markdown",
@@ -1100,10 +2895,28 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 87,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "action = release_dates[(release_dates['title'].str.contains('Action', case=False)) & (release_dates['country'] == 'USA')]\n",
+ "plt.hist(x=action['date'].dt.isocalendar().week)\n",
+ "plt.title('Frequency by week of movies released in the USA with \"Action\" in the title')\n",
+ "plt.ylabel('Frequency')\n",
+ "plt.xlabel('Week Number')\n",
+ "plt.show()"
+ ]
},
{
"cell_type": "markdown",
@@ -1115,11 +2928,291 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 98,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " title \n",
+ " name \n",
+ " n \n",
+ " country \n",
+ " date \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 17 \n",
+ " Speed \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1922-10-22 \n",
+ " \n",
+ " \n",
+ " 18 \n",
+ " Speed \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1936-05-08 \n",
+ " \n",
+ " \n",
+ " 21 \n",
+ " Sweet November \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1968-02-08 \n",
+ " \n",
+ " \n",
+ " 27 \n",
+ " The Night Before \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1988-04-15 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " Bill & Ted's Excellent Adventure \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1989-02-17 \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " Bill & Ted's Bogus Journey \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1991-07-19 \n",
+ " \n",
+ " \n",
+ " 14 \n",
+ " Little Buddha \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1994-05-25 \n",
+ " \n",
+ " \n",
+ " 19 \n",
+ " Speed \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1994-06-10 \n",
+ " \n",
+ " \n",
+ " 11 \n",
+ " Johnny Mnemonic \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1995-05-26 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " A Walk in the Clouds \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1995-08-11 \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " Chain Reaction \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1996-08-02 \n",
+ " \n",
+ " \n",
+ " 6 \n",
+ " Feeling Minnesota \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1996-09-13 \n",
+ " \n",
+ " \n",
+ " 24 \n",
+ " The Devil's Advocate \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1997-10-17 \n",
+ " \n",
+ " \n",
+ " 26 \n",
+ " The Matrix \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 1999-03-31 \n",
+ " \n",
+ " \n",
+ " 28 \n",
+ " The Replacements \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2000-08-11 \n",
+ " \n",
+ " \n",
+ " 22 \n",
+ " Sweet November \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2001-02-16 \n",
+ " \n",
+ " \n",
+ " 7 \n",
+ " Hard Ball \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2001-09-14 \n",
+ " \n",
+ " \n",
+ " 5 \n",
+ " Constantine \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2005-02-18 \n",
+ " \n",
+ " \n",
+ " 25 \n",
+ " The Lake House \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2006-06-16 \n",
+ " \n",
+ " \n",
+ " 20 \n",
+ " Street Kings \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2008-04-11 \n",
+ " \n",
+ " \n",
+ " 23 \n",
+ " The Day the Earth Stood Still \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2008-12-12 \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 47 Ronin \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2013-12-25 \n",
+ " \n",
+ " \n",
+ " 9 \n",
+ " John Wick \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2014-10-24 \n",
+ " \n",
+ " \n",
+ " 12 \n",
+ " Knock Knock \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2015-10-09 \n",
+ " \n",
+ " \n",
+ " 10 \n",
+ " John Wick: Chapter 2 \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2017-02-10 \n",
+ " \n",
+ " \n",
+ " 13 \n",
+ " Knock Knock \n",
+ " Keanu Reeves \n",
+ " 1.0 \n",
+ " USA \n",
+ " 2017-10-06 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " title name n country date\n",
+ "17 Speed Keanu Reeves 1.0 USA 1922-10-22\n",
+ "18 Speed Keanu Reeves 1.0 USA 1936-05-08\n",
+ "21 Sweet November Keanu Reeves 1.0 USA 1968-02-08\n",
+ "27 The Night Before Keanu Reeves 1.0 USA 1988-04-15\n",
+ "3 Bill & Ted's Excellent Adventure Keanu Reeves 1.0 USA 1989-02-17\n",
+ "2 Bill & Ted's Bogus Journey Keanu Reeves 1.0 USA 1991-07-19\n",
+ "14 Little Buddha Keanu Reeves 1.0 USA 1994-05-25\n",
+ "19 Speed Keanu Reeves 1.0 USA 1994-06-10\n",
+ "11 Johnny Mnemonic Keanu Reeves 1.0 USA 1995-05-26\n",
+ "1 A Walk in the Clouds Keanu Reeves 1.0 USA 1995-08-11\n",
+ "4 Chain Reaction Keanu Reeves 1.0 USA 1996-08-02\n",
+ "6 Feeling Minnesota Keanu Reeves 1.0 USA 1996-09-13\n",
+ "24 The Devil's Advocate Keanu Reeves 1.0 USA 1997-10-17\n",
+ "26 The Matrix Keanu Reeves 1.0 USA 1999-03-31\n",
+ "28 The Replacements Keanu Reeves 1.0 USA 2000-08-11\n",
+ "22 Sweet November Keanu Reeves 1.0 USA 2001-02-16\n",
+ "7 Hard Ball Keanu Reeves 1.0 USA 2001-09-14\n",
+ "5 Constantine Keanu Reeves 1.0 USA 2005-02-18\n",
+ "25 The Lake House Keanu Reeves 1.0 USA 2006-06-16\n",
+ "20 Street Kings Keanu Reeves 1.0 USA 2008-04-11\n",
+ "23 The Day the Earth Stood Still Keanu Reeves 1.0 USA 2008-12-12\n",
+ "0 47 Ronin Keanu Reeves 1.0 USA 2013-12-25\n",
+ "9 John Wick Keanu Reeves 1.0 USA 2014-10-24\n",
+ "12 Knock Knock Keanu Reeves 1.0 USA 2015-10-09\n",
+ "10 John Wick: Chapter 2 Keanu Reeves 1.0 USA 2017-02-10\n",
+ "13 Knock Knock Keanu Reeves 1.0 USA 2017-10-06"
+ ]
+ },
+ "execution_count": 98,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
- " "
+ "keanu_lead = cast[(cast['name'] == 'Keanu Reeves') & (cast['n'] == 1.0)]\n",
+ "usa = release_dates[release_dates['country'] == 'USA']\n",
+ "keanu_merge = pd.merge(keanu_lead, usa, how='left', on='title')\n",
+ "keanu_merge_usa = keanu_merge[keanu_merge['country'] == 'USA']\n",
+ "keanu_merge_usa[['title', 'name', 'n', 'country', 'date']].sort_values(by='date')"
]
},
{
@@ -1131,10 +3224,44 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 110,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/var/folders/b6/4qc_2zbx4bg37ybn_70yv7xc0000gn/T/ipykernel_27695/3150025360.py:4: SettingWithCopyWarning: \n",
+ "A value is trying to be set on a copy of a slice from a DataFrame.\n",
+ "Try using .loc[row_indexer,col_indexer] = value instead\n",
+ "\n",
+ "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
+ " keanu_movies_usa['month'] = keanu_movies_usa['date'].dt.month\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "keanu = cast[cast['name'] == 'Keanu Reeves']\n",
+ "keanu_movies = pd.merge(keanu, release_dates, how='left', on='title')\n",
+ "keanu_movies_usa = keanu_movies[keanu_movies['country'] == 'USA'].copy()\n",
+ "keanu_movies_usa['month'] = keanu_movies_usa['date'].dt.month\n",
+ "keanu_month_df = pd.DataFrame(keanu_movies_usa.groupby('month')['title'].count())\n",
+ "plt.bar(x=keanu_month_df.index, height=keanu_month_df['title'])\n",
+ "plt.title('Number of Keanu Reeves movies released in the USA by month')\n",
+ "plt.xlabel('Month Number')\n",
+ "plt.ylabel('Number of Films')\n",
+ "plt.show()"
+ ]
},
{
"cell_type": "markdown",
@@ -1143,6 +3270,35 @@
"### Section III - Q5: Make a bar plot showing the years in which movies with Ian McKellen tend to be released in the USA?"
]
},
+ {
+ "cell_type": "code",
+ "execution_count": 122,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "ian_df = cast[cast['name'] == 'Ian McKellen']\n",
+ "ian_merge = pd.merge(ian_df, release_dates, how='left', on='title')\n",
+ "ian_usa = ian_merge[ian_merge['country'] == 'USA']\n",
+ "ian_year_df = pd.DataFrame(ian_usa.groupby('year_y')['title'].count())\n",
+ "plt.bar(x=ian_year_df.index, height=ian_year_df['title'])\n",
+ "plt.title('Years when Ian McKellen movies are released in the USA')\n",
+ "plt.xlabel('Years')\n",
+ "plt.ylabel('Number of Films')\n",
+ "plt.show()"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
@@ -1153,7 +3309,7 @@
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3",
+ "display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1167,9 +3323,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.6-final"
+ "version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
-}
\ No newline at end of file
+}
diff --git a/mec-5.4.4-json-data-wrangling-mini-project/Mini_Project_Wrangling_Json_Exercise.ipynb b/mec-5.4.4-json-data-wrangling-mini-project/Mini_Project_Wrangling_Json_Exercise.ipynb
index a8bfea9e2..5ccdabb42 100755
--- a/mec-5.4.4-json-data-wrangling-mini-project/Mini_Project_Wrangling_Json_Exercise.ipynb
+++ b/mec-5.4.4-json-data-wrangling-mini-project/Mini_Project_Wrangling_Json_Exercise.ipynb
@@ -80,9 +80,7 @@
{
"cell_type": "code",
"execution_count": 7,
- "metadata": {
- "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"data": {
@@ -148,9 +146,7 @@
{
"cell_type": "code",
"execution_count": 8,
- "metadata": {
- "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"data": {
@@ -246,9 +242,7 @@
{
"cell_type": "code",
"execution_count": 9,
- "metadata": {
- "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"data": {
@@ -433,9 +427,7 @@
{
"cell_type": "code",
"execution_count": 10,
- "metadata": {
- "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"data": {
@@ -586,35 +578,743 @@
"3. In 2. above you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in."
]
},
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " sector supplementprojectflg \\\n",
+ "0 [{'Name': 'Primary education'}, {'Name': 'Seco... N \n",
+ "1 [{'Name': 'Public administration- Other social... N \n",
+ "2 [{'Name': 'Rural and Inter-Urban Roads and Hig... Y \n",
+ "3 [{'Name': 'Other social services'}] N \n",
+ "4 [{'Name': 'General industry and trade sector'}... N \n",
+ "\n",
+ " projectfinancialtype prodline \\\n",
+ "0 IDA PE \n",
+ "1 OTHER RE \n",
+ "2 IDA PE \n",
+ "3 OTHER RE \n",
+ "4 IDA PE \n",
+ "\n",
+ " mjtheme idacommamt \\\n",
+ "0 [Human development] 130000000 \n",
+ "1 [Economic management, Social protection and ri... 0 \n",
+ "2 [Trade and integration, Public sector governan... 6060000 \n",
+ "3 [Social dev/gender/inclusion, Social dev/gende... 0 \n",
+ "4 [Trade and integration, Financial and private ... 13100000 \n",
+ "\n",
+ " impagency \\\n",
+ "0 MINISTRY OF EDUCATION \n",
+ "1 MINISTRY OF FINANCE \n",
+ "2 MINISTRY OF TRANSPORT AND COMMUNICATIONS \n",
+ "3 LABOR INTENSIVE PUBLIC WORKS PROJECT PMU \n",
+ "4 MINISTRY OF TRADE AND INDUSTRY \n",
+ "\n",
+ " project_name mjthemecode \\\n",
+ "0 Ethiopia General Education Quality Improvement... 8,11 \n",
+ "1 TN: DTF Social Protection Reforms Support 1,6 \n",
+ "2 Tuvalu Aviation Investment Project - Additiona... 5,2,11,6 \n",
+ "3 Gov't and Civil Society Organization Partnership 7,7 \n",
+ "4 Second Private Sector Competitiveness and Econ... 5,4 \n",
+ "\n",
+ " closingdate ... \\\n",
+ "0 2018-07-07T00:00:00Z ... \n",
+ "1 NaN ... \n",
+ "2 NaN ... \n",
+ "3 NaN ... \n",
+ "4 2019-04-30T00:00:00Z ... \n",
+ "\n",
+ " majorsector_percent board_approval_month \\\n",
+ "0 [{'Percent': 46, 'Name': 'Education'}, {'Perce... November \n",
+ "1 [{'Percent': 70, 'Name': 'Public Administratio... November \n",
+ "2 [{'Percent': 100, 'Name': 'Transportation'}] November \n",
+ "3 [{'Percent': 100, 'Name': 'Health and other so... October \n",
+ "4 [{'Percent': 50, 'Name': 'Industry and trade'}... October \n",
+ "\n",
+ " theme_namecode \\\n",
+ "0 [{'code': '65', 'name': 'Education for all'}] \n",
+ "1 [{'code': '24', 'name': 'Other economic manage... \n",
+ "2 [{'code': '47', 'name': 'Regional integration'... \n",
+ "3 [{'code': '57', 'name': 'Participation and civ... \n",
+ "4 [{'code': '45', 'name': 'Export development an... \n",
+ "\n",
+ " countryname \\\n",
+ "0 Federal Democratic Republic of Ethiopia \n",
+ "1 Republic of Tunisia \n",
+ "2 Tuvalu \n",
+ "3 Republic of Yemen \n",
+ "4 Kingdom of Lesotho \n",
+ "\n",
+ " url source \\\n",
+ "0 http://www.worldbank.org/projects/P129828/ethi... IBRD \n",
+ "1 http://www.worldbank.org/projects/P144674?lang=en IBRD \n",
+ "2 http://www.worldbank.org/projects/P145310?lang=en IBRD \n",
+ "3 http://www.worldbank.org/projects/P144665?lang=en IBRD \n",
+ "4 http://www.worldbank.org/projects/P144933/seco... IBRD \n",
+ "\n",
+ " projectstatusdisplay ibrdcommamt \\\n",
+ "0 Active 0 \n",
+ "1 Active 0 \n",
+ "2 Active 0 \n",
+ "3 Active 0 \n",
+ "4 Active 0 \n",
+ "\n",
+ " sector_namecode \\\n",
+ "0 [{'code': 'EP', 'name': 'Primary education'}, ... \n",
+ "1 [{'code': 'BS', 'name': 'Public administration... \n",
+ "2 [{'code': 'TI', 'name': 'Rural and Inter-Urban... \n",
+ "3 [{'code': 'JB', 'name': 'Other social services'}] \n",
+ "4 [{'code': 'YZ', 'name': 'General industry and ... \n",
+ "\n",
+ " _id \n",
+ "0 {'$oid': '52b213b38594d8a2be17c780'} \n",
+ "1 {'$oid': '52b213b38594d8a2be17c781'} \n",
+ "2 {'$oid': '52b213b38594d8a2be17c782'} \n",
+ "3 {'$oid': '52b213b38594d8a2be17c783'} \n",
+ "4 {'$oid': '52b213b38594d8a2be17c784'} \n",
+ "\n",
+ "[5 rows x 50 columns]"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "import json\n",
+ "from pandas import json_normalize\n",
+ "\n",
+ "data = pd.read_json('data/world_bank_projects.json')\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'sector': [{'Name': 'Primary education'},\n",
+ " {'Name': 'Secondary education'},\n",
+ " {'Name': 'Public administration- Other social services'},\n",
+ " {'Name': 'Tertiary education'}],\n",
+ " 'supplementprojectflg': 'N',\n",
+ " 'projectfinancialtype': 'IDA',\n",
+ " 'prodline': 'PE',\n",
+ " 'mjtheme': ['Human development'],\n",
+ " 'idacommamt': 130000000,\n",
+ " 'impagency': 'MINISTRY OF EDUCATION',\n",
+ " 'project_name': 'Ethiopia General Education Quality Improvement Project II',\n",
+ " 'mjthemecode': '8,11',\n",
+ " 'closingdate': '2018-07-07T00:00:00Z',\n",
+ " 'totalcommamt': 130000000,\n",
+ " 'id': 'P129828',\n",
+ " 'mjsector_namecode': [{'code': 'EX', 'name': 'Education'},\n",
+ " {'code': 'EX', 'name': 'Education'},\n",
+ " {'code': 'BX', 'name': 'Public Administration, Law, and Justice'},\n",
+ " {'code': 'EX', 'name': 'Education'}],\n",
+ " 'docty': 'Project Information Document,Indigenous Peoples Plan,Project Information Document',\n",
+ " 'sector1': {'Percent': 46, 'Name': 'Primary education'},\n",
+ " 'lendinginstr': 'Investment Project Financing',\n",
+ " 'countrycode': 'ET',\n",
+ " 'sector2': {'Percent': 26, 'Name': 'Secondary education'},\n",
+ " 'totalamt': 130000000,\n",
+ " 'mjtheme_namecode': [{'code': '8', 'name': 'Human development'},\n",
+ " {'code': '11', 'name': ''}],\n",
+ " 'boardapprovaldate': '2013-11-12T00:00:00Z',\n",
+ " 'countryshortname': 'Ethiopia',\n",
+ " 'sector4': {'Percent': 12, 'Name': 'Tertiary education'},\n",
+ " 'prodlinetext': 'IBRD/IDA',\n",
+ " 'productlinetype': 'L',\n",
+ " 'regionname': 'Africa',\n",
+ " 'status': 'Active',\n",
+ " 'country_namecode': 'Federal Democratic Republic of Ethiopia!$!ET',\n",
+ " 'envassesmentcategorycode': 'C',\n",
+ " 'project_abstract': {'cdata': 'The development objective of the Second Phase of General Education Quality Improvement Project for Ethiopia is to improve learning conditions in primary and secondary schools and strengthen institutions at different levels of educational administration. The project has six components. The first component is curriculum, textbooks, assessment, examinations, and inspection. This component will support improvement of learning conditions in grades KG-12 by providing increased access to teaching and learning materials and through improvements to the curriculum by assessing the strengths and weaknesses of the current curriculum. This component has following four sub-components: (i) curriculum reform and implementation; (ii) teaching and learning materials; (iii) assessment and examinations; and (iv) inspection. The second component is teacher development program (TDP). This component will support improvements in learning conditions in both primary and secondary schools by advancing the quality of teaching in general education through: (a) enhancing the training of pre-service teachers in teacher education institutions; and (b) improving the quality of in-service teacher training. This component has following three sub-components: (i) pre-service teacher training; (ii) in-service teacher training; and (iii) licensing and relicensing of teachers and school leaders. The third component is school improvement plan. This component will support the strengthening of school planning in order to improve learning outcomes, and to partly fund the school improvement plans through school grants. It has following two sub-components: (i) school improvement plan; and (ii) school grants. The fourth component is management and capacity building, including education management information systems (EMIS). This component will support management and capacity building aspect of the project. This component has following three sub-components: (i) capacity building for education planning and management; (ii) capacity building for school planning and management; and (iii) EMIS. The fifth component is improving the quality of learning and teaching in secondary schools and universities through the use of information and communications technology (ICT). It has following five sub-components: (i) national policy and institution for ICT in general education; (ii) national ICT infrastructure improvement plan for general education; (iii) develop an integrated monitoring, evaluation, and learning system specifically for the ICT component; (iv) teacher professional development in the use of ICT; and (v) provision of limited number of e-Braille display readers with the possibility to scale up to all secondary education schools based on the successful implementation and usage of the readers. The sixth component is program coordination, monitoring and evaluation, and communication. It will support institutional strengthening by developing capacities in all aspects of program coordination, monitoring and evaluation; a new sub-component on communications will support information sharing for better management and accountability. It has following three sub-components: (i) program coordination; (ii) monitoring and evaluation (M and E); and (iii) communication.'},\n",
+ " 'approvalfy': 1999,\n",
+ " 'projectdocs': [{'DocDate': '28-AUG-2013',\n",
+ " 'EntityID': '090224b081e545fb_1_0',\n",
+ " 'DocURL': 'http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=090224b081e545fb_1_0',\n",
+ " 'DocType': 'PID',\n",
+ " 'DocTypeDesc': 'Project Information Document (PID), Vol.'},\n",
+ " {'DocDate': '01-JUL-2013',\n",
+ " 'EntityID': '000442464_20130920111729',\n",
+ " 'DocURL': 'http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000442464_20130920111729',\n",
+ " 'DocType': 'IP',\n",
+ " 'DocTypeDesc': 'Indigenous Peoples Plan (IP), Vol.1 of 1'},\n",
+ " {'DocDate': '22-NOV-2012',\n",
+ " 'EntityID': '090224b0817b19e2_1_0',\n",
+ " 'DocURL': 'http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=090224b0817b19e2_1_0',\n",
+ " 'DocType': 'PID',\n",
+ " 'DocTypeDesc': 'Project Information Document (PID), Vol.'}],\n",
+ " 'lendprojectcost': 550000000,\n",
+ " 'lendinginstrtype': 'IN',\n",
+ " 'theme1': {'Percent': 100, 'Name': 'Education for all'},\n",
+ " 'grantamt': 0,\n",
+ " 'themecode': '65',\n",
+ " 'borrower': 'FEDERAL DEMOCRATIC REPUBLIC OF ETHIOPIA',\n",
+ " 'sectorcode': 'ET,BS,ES,EP',\n",
+ " 'sector3': {'Percent': 16,\n",
+ " 'Name': 'Public administration- Other social services'},\n",
+ " 'majorsector_percent': [{'Percent': 46, 'Name': 'Education'},\n",
+ " {'Percent': 26, 'Name': 'Education'},\n",
+ " {'Percent': 16, 'Name': 'Public Administration, Law, and Justice'},\n",
+ " {'Percent': 12, 'Name': 'Education'}],\n",
+ " 'board_approval_month': 'November',\n",
+ " 'theme_namecode': [{'code': '65', 'name': 'Education for all'}],\n",
+ " 'countryname': 'Federal Democratic Republic of Ethiopia',\n",
+ " 'url': 'http://www.worldbank.org/projects/P129828/ethiopia-general-education-quality-improvement-project-ii?lang=en',\n",
+ " 'source': 'IBRD',\n",
+ " 'projectstatusdisplay': 'Active',\n",
+ " 'ibrdcommamt': 0,\n",
+ " 'sector_namecode': [{'code': 'EP', 'name': 'Primary education'},\n",
+ " {'code': 'ES', 'name': 'Secondary education'},\n",
+ " {'code': 'BS', 'name': 'Public administration- Other social services'},\n",
+ " {'code': 'ET', 'name': 'Tertiary education'}],\n",
+ " '_id': {'$oid': '52b213b38594d8a2be17c780'}}"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "with open('data/world_bank_projects.json') as f:\n",
+ "    raw_data = json.load(f)\n",
+ "raw_data[0]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Task 1: Find the top 10 countries with the most projects"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "countryname\n",
+ "People's Republic of China 19\n",
+ "Republic of Indonesia 19\n",
+ "Socialist Republic of Vietnam 17\n",
+ "Republic of India 16\n",
+ "Republic of Yemen 13\n",
+ "People's Republic of Bangladesh 12\n",
+ "Nepal 12\n",
+ "Kingdom of Morocco 12\n",
+ "Republic of Mozambique 11\n",
+ "Africa 11\n",
+ "Name: project_name, dtype: int64"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "#Top 10 countries with most projects\n",
+ "data.groupby('countryname')['project_name'].count().sort_values(ascending=False).head(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Task 2: Find the top 10 major project themes (using column 'mjtheme_namecode')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " code name\n",
+ "0 8 Human development\n",
+ "1 11 \n",
+ "2 1 Economic management\n",
+ "3 6 Social protection and risk management\n",
+ "4 5 Trade and integration\n",
+ "... ... ...\n",
+ "1494 10 Rural development\n",
+ "1495 9 Urban development\n",
+ "1496 8 Human development\n",
+ "1497 5 Trade and integration\n",
+ "1498 4 Financial and private sector development\n",
+ "\n",
+ "[1499 rows x 2 columns]"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "mjtheme = pd.json_normalize(raw_data, 'mjtheme_namecode')\n",
+ "mjtheme"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'8': 'Human development',\n",
+ " '1': 'Economic management',\n",
+ " '6': 'Social protection and risk management',\n",
+ " '5': 'Trade and integration',\n",
+ " '2': 'Public sector governance',\n",
+ " '11': 'Environment and natural resources management',\n",
+ " '7': 'Social dev/gender/inclusion',\n",
+ " '4': 'Financial and private sector development',\n",
+ " '10': 'Rural development',\n",
+ " '9': 'Urban development',\n",
+ " '3': 'Rule of law'}"
+ ]
+ },
+ "execution_count": 42,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "#Creating dictionary mapping each theme code to its name (rows with an empty name are skipped)\n",
+ "mjtheme_subset = mjtheme[mjtheme['name'] != '']\n",
+ "project_dict = dict(zip(mjtheme_subset['code'], mjtheme_subset['name']))\n",
+ "project_dict"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 Human development\n",
+ "1 \n",
+ "2 Economic management\n",
+ "3 Social protection and risk management\n",
+ "4 Trade and integration\n",
+ " ... \n",
+ "1494 Rural development\n",
+ "1495 Urban development\n",
+ "1496 Human development\n",
+ "1497 Trade and integration\n",
+ "1498 Financial and private sector development\n",
+ "Name: name, Length: 1499, dtype: object"
+ ]
+ },
+ "execution_count": 46,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "mjtheme['name']"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Task 3: Some entries in Task 2 have only the code, with the name missing. Create a dataframe with the missing names filled in."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 Human development\n",
+ "1 Environment and natural resources management\n",
+ "2 Economic management\n",
+ "3 Social protection and risk management\n",
+ "4 Trade and integration\n",
+ " ... \n",
+ "1494 Rural development\n",
+ "1495 Urban development\n",
+ "1496 Human development\n",
+ "1497 Trade and integration\n",
+ "1498 Financial and private sector development\n",
+ "Name: name, Length: 1499, dtype: object"
+ ]
+ },
+ "execution_count": 47,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "#adding values back in with reference to dictionary of values\n",
+ "missing = mjtheme['name'] == ''\n",
+ "mjtheme.loc[missing, 'name'] = mjtheme.loc[missing, 'code'].map(project_dict)\n",
+ "mjtheme['name']"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "False"
+ ]
+ },
+ "execution_count": 49,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "(mjtheme['name'] == '').any()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 52,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Environment and natural resources management 250\n",
+ "Rural development 216\n",
+ "Human development 210\n",
+ "Public sector governance 199\n",
+ "Social protection and risk management 168\n",
+ "Financial and private sector development 146\n",
+ "Social dev/gender/inclusion 130\n",
+ "Trade and integration 77\n",
+ "Urban development 50\n",
+ "Economic management 38\n",
+ "Name: code, dtype: int64"
+ ]
+ },
+ "execution_count": 52,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "#Top 10 major project themes\n",
+ "top_projects = mjtheme.groupby('name')['code'].count().sort_values(ascending=False).head(10)\n",
+ "top_projects"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
- "metadata": {
- "collapsed": true
- },
+ "metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
- "display_name": "Python 2",
+ "display_name": "Python 3 (ipykernel)",
"language": "python",
- "name": "python2"
+ "name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
- "version": 2
+ "version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
- "pygments_lexer": "ipython2",
- "version": "2.7.9"
+ "pygments_lexer": "ipython3",
+ "version": "3.10.6"
}
},
"nbformat": 4,
- "nbformat_minor": 0
+ "nbformat_minor": 1
}