Chapter 2 Metro Data Warehouse Structure
2.1 Location and Access
The Brookings Metro data warehouse is a private repository on the Brookings Institution GitHub; all datasets and source code are accessible to authorized users only.
There is also a read-only copy on the program’s V: drive, synced whenever the data warehouse is updated. The local copy is accessible from all desktops and research servers connected to this drive. Any statistical programming package (including R, SAS, and Stata) or Excel on these machines can access the warehouse by navigating to V:/_metro_data_warehouse.
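As an illustration of reading from the local copy, here is a minimal Python sketch (Python is not among the packages named above, and the same idea applies in R, SAS, or Stata). The root path and the file crosswalks/cbsa.csv come from this chapter. The helper names `warehouse_path` and `read_warehouse_csv` are hypothetical, and the UTF-8 encoding is an assumption about how the CSV files were saved.

```python
import csv
from pathlib import Path

# Root of the read-only copy, as given in the text.
WAREHOUSE_ROOT = "V:/_metro_data_warehouse"


def warehouse_path(*parts, root=WAREHOUSE_ROOT):
    """Build a path inside the warehouse from folder and file names."""
    return Path(root).joinpath(*parts)


def read_warehouse_csv(*parts, root=WAREHOUSE_ROOT):
    """Read a CSV file from the warehouse into a list of dicts (one per row).

    Assumption: files are UTF-8 encoded; "utf-8-sig" also tolerates a
    byte-order mark, which Excel sometimes writes.
    """
    path = warehouse_path(*parts, root=root)
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

On a machine connected to the V: drive, `read_warehouse_csv("crosswalks", "cbsa.csv")` would return the rows of the CBSA crosswalk. The `root` argument can point at any other copy of the warehouse, for example a clone of the GitHub repository.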
2.2 Structure
The data warehouse currently houses four types of datasets: crosswalks, processed datasets from Metro researchers, raw datasets acquired from public agencies or private vendors, and spatial datasets including shapefiles and base maps. The current folder structure is listed below:
## -- crosswalks
## |__cbsa.csv
## |__county_cbsa_st.csv
## |__NCESSCH2FIPS.rda
## |__place2county.csv
## |__puma2county.csv
## |__st.csv
## |__superseded
## |__tract2place.csv
## |__zcta2county.csv
## -- data_final
## |__access across america
## |__acs5_2017
## |__acs5_2018
## |__biz_rd
## |__broadband
## |__digitalization
## |__export_monitor
## |__housing_price
## |__inc5000
## |__job_density
## |__low_wage_worker
## |__metro_monitor_2019
## |__metro_monitor_2020
## |__millenials diversity
## |__oecd_patent
## |__old_industrial_cities
## |__opportunity industries
## |__out_of_work
## |__patent_complexity
## |__school proficiency
## |__univ_licensing
## |__univ_rd
## |__univ_rd_detail
## |__uspto
## |__vc
## -- data_raw
## |__2017 ACS
## |__2018 ACS
## |__BEA GDP 2001 - 2018
## |__CBP EMP
## |__chetty opportunity
## |__Emsi
## |__LEHD LODES - Shortcut.lnk
## |__oxford_global_metros
## -- data_spatial
## |__basemaps
## |__README.pdf
## |__README.pptx
## |__shapefiles
## |__Thumbs.db
## -- desktop.ini
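The listing above can be regenerated for any copy of the warehouse with a short script. The sketch below is one way to do it in Python. The pairing of top-level folders with dataset types is inferred from the folder names, and the function name `list_warehouse` is hypothetical. Thumbs.db and desktop.ini are Windows housekeeping files, so the sketch leaves them out.

```python
from pathlib import Path

# Top-level folders from the listing, paired with the dataset types named
# in the text. The pairing is inferred from the folder names.
DATASET_TYPES = {
    "crosswalks": "crosswalks",
    "data_final": "processed datasets from Metro researchers",
    "data_raw": "raw datasets from public agencies or private vendors",
    "data_spatial": "spatial datasets (shapefiles and base maps)",
}

# Windows housekeeping files that appear in the listing but hold no data.
SYSTEM_FILES = {"Thumbs.db", "desktop.ini"}


def list_warehouse(root):
    """Return {top-level folder: sorted entry names} for the known folders."""
    root = Path(root)
    listing = {}
    for folder in DATASET_TYPES:
        folder_path = root / folder
        if not folder_path.is_dir():
            continue  # folder is missing from this copy; skip it
        listing[folder] = sorted(
            entry.name
            for entry in folder_path.iterdir()
            if entry.name not in SYSTEM_FILES
        )
    return listing
```

Calling `list_warehouse("V:/_metro_data_warehouse")` on a connected machine returns one entry per top-level folder, which can be compared against the listing in this section to spot new or removed datasets.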
2.3 Maintenance and Update
Crosswalks for geographies whose definitions are regularly revised, including CBSAs, congressional districts, and Public Use Microdata Areas (PUMAs), are updated immediately after each new release.
Designations that rely on regularly updated data, such as the “top 100 largest metro areas”, are also updated immediately after the data on which they rely are released.
Older vintages of these definitions are preserved within the warehouse, labeled as “DEPRECIATED”. Further documentation on the sources and vintages of the data underlying these designations will be available in the code that is used to generate them, including in human-readable comments.
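When scanning the warehouse programmatically, it can help to skip older vintages. The Python sketch below shows one possible filter. It assumes the label appears somewhere in the file or folder name, which the text does not specify, and it also treats the `superseded` folder seen under crosswalks as an archive of older vintages, which is likewise an assumption. The function names are hypothetical.

```python
from pathlib import PurePath

# Label as spelled in the text. Where exactly it is applied (file name,
# folder name, or elsewhere) is an assumption of this sketch.
DEPRECATION_LABEL = "DEPRECIATED"

# Folder that appears under crosswalks; assumed here to hold older vintages.
ARCHIVE_FOLDERS = {"superseded"}


def is_current(path, label=DEPRECATION_LABEL, archive_folders=ARCHIVE_FOLDERS):
    """Return False if any part of the path carries the label or is an archive folder."""
    parts = PurePath(path).parts
    if any(label.lower() in part.lower() for part in parts):
        return False
    return not any(part.lower() in archive_folders for part in parts)


def current_only(paths):
    """Keep only the paths that is_current() accepts, preserving order."""
    return [p for p in paths if is_current(p)]
```

The matching is case-insensitive so that variations in capitalization do not let an old vintage through. If the label is applied differently in practice, only the two constants at the top need to change.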
The datasets in the Metro Data Warehouse are made available “as is”. They are either obtained from open-source data repositories or provided to us by the authors of research publications. We do our best to ensure that the data in the warehouse are complete, accurate, and useful. However, because the processing required to make the data useful is complex, we cannot be held liable for omissions or inaccuracies.