Data and scripts for the proposed sample design for the Commodity Flow Survey (CFS) are stored in this repository. The folders and their contents are listed below.
The main data sources used for generating the sample data are stored in this folder.
- `FAF441_2016.csv` is the 2016 FAF data acquired from https://www.bts.gov/faf
- `Commodity_Naics.csv` is the mapping between the SCTG codes used in the FAF 2016 dataset and the NAICS codes used in the CBP dataset.
- `CFS12.csv` is the list of counties in the 2012 CFS areas from https://www.census.gov/programs-surveys/cfs/about.html
- `cbp16co.7z` is the 2016 County Business Patterns (CBP) complete county file acquired from https://www.census.gov/data/datasets/2016/econ/cbp/2016-cbp.html. Use 7-Zip to extract the compressed archive. The extracted text file has over 2 million rows and is over 170 MB in size, which will crash many text editors, including Notepad.
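Because the extracted CBP file is too large for most text editors, one way to inspect it is to read only its first few lines. The following is a minimal sketch, not part of this repository: the function name `preview_lines` and the `latin-1` encoding are assumptions chosen for illustration, and the name of the extracted file must be supplied by the user.

```python
from itertools import islice


def preview_lines(path, n=5, encoding="latin-1"):
    """Return the first n lines of a text file without reading the rest.

    The file is read lazily, so memory use stays small even when the
    file has millions of rows. The encoding is an assumption; change it
    if the file turns out to use a different one.
    """
    with open(path, "r", encoding=encoding, newline="") as handle:
        # islice stops after n lines, so the remainder is never loaded.
        return [line.rstrip("\r\n") for line in islice(handle, n)]


# Example call (replace the placeholder with the extracted file's path):
# for row in preview_lines("path/to/extracted_cbp_file", n=3):
#     print(row)
```

The header row returned this way is usually enough to confirm the column names before loading the file into a database.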
SQL scripts used to create tables and analyze the raw data are stored in this folder. We used PostgreSQL, a free, open-source database management system (DBMS). The queries and functions can be run on PostgreSQL 9.6 or later. Running them on other SQL-compatible DBMSs such as MySQL/MariaDB or MS SQL Server may require minor modifications.
- `SQL_Scripts.sql` includes the scripts for creating tables and all queries developed for cleaning and aggregating the data. The comments in this file provide a high-level explanation of each step. We used Common Table Expressions (CTEs) to merge multiple related queries into one step.
- `Generate_est.sql` includes a function written in PostgreSQL's procedural language (PL/pgSQL) that generates a sampling frame with user-defined parameters based on the CBP and FAF datasets.
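To illustrate how a CTE chains a cleaning step and an aggregation step in one statement, here is a small self-contained sketch. It is not taken from `SQL_Scripts.sql`: the table `cbp`, its columns (`county`, `naics`, `est`), and the cleaning rule are hypothetical, and Python's built-in `sqlite3` module stands in for PostgreSQL so the example runs without a database server.

```python
import sqlite3

# Two chained CTEs: "cleaned" drops unusable rows, "county_totals"
# aggregates what is left. Table and column names are hypothetical.
QUERY = """
WITH cleaned AS (
    SELECT county, naics, est
    FROM cbp
    WHERE naics <> '' AND est > 0
),
county_totals AS (
    SELECT county, SUM(est) AS total_est
    FROM cleaned
    GROUP BY county
)
SELECT county, total_est
FROM county_totals
ORDER BY county;
"""


def county_totals(rows):
    """Load (county, naics, est) rows into an in-memory table and
    return the cleaned per-county establishment totals."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE cbp (county TEXT, naics TEXT, est INTEGER)")
        conn.executemany("INSERT INTO cbp VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Each CTE can be read as a named intermediate result, which keeps a multi-step query in one statement without creating temporary tables.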
Includes the final output of the scripts in the `SQL` folder applied to the data in `Raw_Data`.
- `fafcbp.csv` is the combined FAF and CBP datasets in CSV format. It is the FAF data disaggregated by county and NAICS code based on CBP data. This data is needed by the `generate_est` function presented in the `SQL` folder.
- `100K_Frame_newCFS.csv` is a set of 100,000 establishments generated with the `generate_est` function.
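The idea of disaggregating a zone-level FAF value to counties can be sketched as a proportional allocation. This is only an illustration under stated assumptions: it assumes each county's share is proportional to a single weight such as its CBP establishment count, which may differ from the weighting actually used to build `fafcbp.csv`, and the function name `disaggregate` is hypothetical.

```python
def disaggregate(zone_total, county_weights):
    """Split a zone-level total across counties in proportion to weights.

    zone_total: a value reported for a whole zone (for example, tons).
    county_weights: dict mapping county id -> non-negative weight
        (assumed here to be an establishment count).
    Returns a dict mapping county id -> allocated share of zone_total.
    """
    total_weight = sum(county_weights.values())
    if total_weight <= 0:
        raise ValueError("county weights must sum to a positive number")
    return {
        county: zone_total * weight / total_weight
        for county, weight in county_weights.items()
    }


# Hypothetical example: 100 units split across two counties with
# 3 and 1 establishments respectively.
shares = disaggregate(100.0, {"A": 3, "B": 1})
```

By construction the allocated values sum back to the zone total, so the disaggregated table stays consistent with the original FAF figures.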
Includes the R scripts and functions used in the document.