Data sets

Use this data wisely to practice your new skills and discover amazing insights!

| Sector / Type | Source | Description / Link |
| --- | --- | --- |
| Agriculture | R package by Kevin Wright | This package contains datasets from published papers and books relating to agriculture, including field crops, tree crops, animal studies, and a few others. |
| Biology | The National Center for Biotechnology Information | GEO is a public functional genomics data repository supporting MIAME-compliant data submissions. Array- and sequence-based data are accepted. Tools are provided to help users query and download experiments and curated gene expression profiles. |
| Biology | The National Institutes of Health Microbiome Project | A collection of data about the human genome |
| Biology | Broad Institute | A collection of cancer genomics data |
| Biology | Interdisciplinary Computing and Complex BioSystems (ICOS) research group | A database of protein structures. The ICOS PSP benchmarks repository contains an adjustable real-world family of benchmarks suitable for testing the scalability of classification/regression methods. When we test a machine learning method we usually choose a test suite containing datasets with a broad set of characteristics, as we are interested in knowing how the learning method reacts to a variety of scenarios. The PSP field provides us with a whole family of real-world classification/regression problems that can be adjusted almost arbitrarily in terms of number of variables, number of classes, class balance, etc. Thus, these datasets are an ideal benchmark suite for data mining methods. |
| Consumer retail | Best Buy | Provides access to Best Buy's product data |
| Consumer retail | Walmart | The Walmart Open API provides access to its extensive product catalog |
| Crime | Montgomery County, MD | Traffic citations in Montgomery County, MD |
| Crime | City of Cambridge, MA | City of Cambridge, MA crime data |
| Crime | University of Maryland | A global terrorism database of attacks and their perpetrators |
| Data set aggregators & APIs | Kansas City, MO | Kansas City public data |
| Data set aggregators & APIs | University of California at Irvine | An aggregation of data sets that can be used to test and practice machine learning algorithms |
| Data set aggregators & APIs | City of Cambridge, MA | Open data provided by the city of Cambridge, MA |
| Data set aggregators & APIs |  | Various survey data |
| Data set aggregators & APIs | NYC open data | A collection of datasets about the population and economy of New York City |
| Data set aggregators & APIs | The United Kingdom Data Service | The UK Data Service provides access to over 6,000 digital data collections for research and teaching purposes covering an extensive range of key economic and social data, both quantitative and qualitative, and spanning many disciplines and themes. |
| Data set aggregators & APIs | New York Times | A list of APIs provided by the New York Times about a range of subjects including articles, blogs, and political data |
| Data set aggregators & APIs | Canada's Open Government portal | Here you can explore how the Government of Canada is working with the national and international open government community to create greater transparency and accountability, increase citizen engagement, and drive innovation and economic opportunities through Open Data, Open Information, and Open Dialogue. |
| Data set aggregators & APIs | Wiki | Open government initiatives archive |
| Data set aggregators & APIs | Simply Stats | List of cities with open data |
| Data set aggregators & APIs | City of London, U.K. | Various statistics about the population and economy of London, U.K. |
| Data set aggregators & APIs | Statistics New Zealand Tatauranga Aotearoa | Statistics collected by the New Zealand government |
| Data set aggregators & APIs | NYC Open Data | Hundreds of data sets containing information about New York City |
| Data set aggregators & APIs | DataPortals | A list of open data portals around the world |
| Data set aggregators & APIs | U.S. Open Data | The home of the U.S. Government's open data. Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more. |
| Data set aggregators & APIs | United Kingdom Open Data | An archive of data made available by the British government |
| Data set aggregators & APIs | France Open Data | An archive of data made available by the French government |
| Data set aggregators & APIs | The City of San Francisco | An archive of data about San Francisco |
| Data set aggregators & APIs | U.S. Federal Government Agencies | An archive of data from various U.S. government agencies |
| Data set aggregators & APIs | Causality Workbench | A broad aggregation of data sets intended to test machine learning skills and algorithms |
| Data set aggregators & APIs | Kaggle | Data sets from the world's largest data science and machine learning competition organizer |
| Data set aggregators & APIs | Machine Learning Data Set Repository | A broad range of data sets intended to be mined and examined with machine learning techniques |
| Data set aggregators & APIs | Reddit | Data repositories posted on Reddit |
| Data set aggregators & APIs | R | A collection of datasets that come with R |
| Data set aggregators & APIs | Google | Google's directory of public data sets on a broad range of topics |
| Data set aggregators & APIs | Stats4Stem | A collection of datasets that come with R |
| Data set aggregators & APIs | The Washington Post | A collection of data sets on demographics, health, safety, real estate, sports, education, and government & politics assembled by The Washington Post |
| Data set aggregators & APIs | Data Market | This is a collection of time series data on a broad variety of topics |
| Data set aggregators & APIs | Augmented Intel | Searchable list of public data mining data sets |
| Data set aggregators & APIs | ProgrammableWeb | A collection of APIs across a broad range of sectors |
| Data set aggregators & APIs | Federal Emergency Management Agency (FEMA) | An aggregation of FEMA data sets about housing, public assistance, hazard mitigation, etc. |
| Data set aggregators & APIs | Yahoo! | PlaceSpotter is a web service that identifies places mentioned in text, disambiguates those places, and returns unique identifiers (WOEIDs) for each. This also includes information about how many times the place was found in the text, and where in the text it was found. |
| Data set aggregators & APIs | University College London | A database of web searches and click-throughs aggregated by University College London |
| Economics & Demographics | CIA World Factbook | The World Factbook, produced for US policymakers and coordinated throughout the US Intelligence Community, marshals facts on every country, dependency, and geographic entity in the world. The Factbook provides information on the history, people, government, economy, energy, geography, communications, transportation, military, and transnational issues for 267 world entities. |
| Economics & Demographics | Knoema | An aggregation of data sets on over 1,000 about the population and economic development of numerous countries |
| Economics & Demographics | United Nations | A collection of 34 databases with over 64 million records on economic and demographic trends across the globe |
| Economics & Demographics | City Data | A collection of profiles of all cities in the U.S. |
| Economics & Demographics | CKAN | An aggregation of open data sites from around the world including federal government and city government data |
| Economics & Demographics | The World Bank | An aggregation of demographic and economic data sets from around the world |
| Economics & Demographics | Gapminder | A collection of demographic and economic data from around the world as well as dynamic visualizations of these data sets |
| Economics & Demographics | The Organisation for Economic Co-operation and Development (OECD) | An archive of census and economic data from around the world collected by the OECD |
| Economics & Finance | International Monetary Fund (IMF) | Various data sets provided by the International Monetary Fund |
| Economics & Finance | The Federal Reserve Board | A wide variety of economic data provided by The Federal Reserve Board of the United States |
| Economics & Finance | University of Maryland | Several thousand economic time series, produced by a number of U.S. Government agencies and distributed in a variety of formats and media, can be found here. Data has been put into a standard, highly efficient, easy-to-use form for personal computers and made publicly available through this site. These series include national income and product accounts (NIPA), labor statistics, price indices, current business indicators, and industrial production. |
| Economics & Finance | Chicago Board Options Exchange | Data on trading of financial options |
| Economics & Finance | Federal Reserve Bank of St. Louis | Economic research data compiled by the Federal Reserve Bank of St. Louis |
| Economics & Finance | NASDAQ | Stock market and financial data provided by NASDAQ |
| Economics & Finance | Yahoo! Finance | Stock market and financial data provided by Yahoo! |
| Economics & Finance | Google Finance | Stock market and financial data provided by Google |
| Economics & Finance | Australian Bureau of Statistics | Census and economics data collected by the Australian Bureau of Statistics |
| Economics & Finance | KIVA | A database of microfinance loans extended to small businesses around the world |
| Entertainment | Columbia University | Feature data and metadata for 1 million songs |
| Entertainment | Freebase | A community-curated database of well-known people, places, and things |
| Environment | Climatic Research Unit (University of East Anglia) | A collection of data about weather patterns throughout the world |
| Environment | Arizona State University | A collection of geospatial data that is well suited for geographic analysis |
| Government & Politics | Archive-It | The leading web archiving service for collecting and accessing cultural heritage on the web |
| Healthcare | Medicare | Medicare claims data |
| Healthcare | Medicare | Medicare provider utilization and payment data |
| Healthcare | Centers for Disease Control and Prevention (CDC) | Data sets about the health of the U.S. population |
| Healthcare | Medicare | The National Health Expenditure Accounts (NHEA) are the official estimates of total health care spending in the United States. Dating back to 1960, the NHEA measures annual U.S. expenditures for health care goods and services, public health activities, government administration, the net cost of health insurance, and investment related to health care. The data are presented by type of service, sources of funding, and type of sponsor. |
| Healthcare | Centers for Disease Control and Prevention (CDC) | US CDC Public Health datasets |
| Healthcare | The Food and Drug Administration | OpenFDA provides APIs for a number of high-value structured datasets, including adverse events, drug product labeling, and recall enforcement reports. |
| Healthcare | The Office on Women's Health | The system provides state- and county-level data for all 50 states, the District of Columbia, and US territories and possessions. Data are available by gender, race and ethnicity and come from a variety of national and state sources. The system is organized into eleven main categories, including demographics, mortality, natality, reproductive health, violence, prevention, disease and mental health. Within each main category, there are numerous subcategories. |
| Healthcare | AARP | The AARP Public Policy Institute analyzes and publishes a wide range of state-specific data related to Americans 50+. |
| Networks data | Princeton University | The International Networks Archive collects extensive current and historical data in numerous areas. All of the data is public and available for free download. |
| Networks data | Carnegie Mellon University | Network data provided by |
| Networks data | The Koblenz Network Collection (KONECT) | KONECT contains over a hundred network datasets of various types, including directed, undirected, bipartite, weighted, unweighted, signed and rating networks. The networks of KONECT are collected from many diverse areas such as social networks, hyperlink networks, authorship networks, physical networks, interaction networks and communication networks. |
| Networks data | Carnegie Mellon University | This is the collection of Enron e-mails released after one of the largest frauds in U.S. history was publicly revealed. This data can be used for text mining and network analysis. |
| Networks data | Stanford University | This is one of the largest publicly available collections of network data provided by Stanford University for research purposes |
| Networks data | Wikipedia | Wikipedia allows its entire database to be downloaded. One file that is available for download is a list of all page-to-page links. This might therefore be an excellent intermediate-sized data set to try out techniques such as PageRank. |
| Networks data | Tore Opsahl's blog | A database of social and transportation network data |
| Politics | New York Times | With the Campaign Finance API, you can retrieve data from United States Federal Election Commission filings. is the publicly accessible, searchable website mandated by the Federal Funding Accountability and Transparency Act of 2006 to give the American public access to information on how their tax dollars are spent. |
| Politics | Google | For any U.S. residential address, you can look up who represents that address at each elected level of government. During supported elections, you can also look up polling places, early vote locations, candidate data, and other election official information. |
| Politics | Federal Election Commission | U.S. campaign finance reports and data |
| Public opinion | General Social Survey | The GSS contains a standard 'core' of demographic, behavioral, and attitudinal questions, plus topics of special interest. Many of the core questions have remained unchanged since 1972 to facilitate time-trend studies as well as replication of earlier findings. The GSS takes the pulse of America, and is a unique and valuable resource. It has tracked the opinions of Americans over the last four decades. |
| Public opinion | University of California, Los Angeles | This is a collection of survey data of the American public |
| Space | NASA | NSSDCA archives more than 230 TB of digital data from about 550 mostly-NASA space science spacecraft, of which the most important 7 TB are electronically accessible. |
|  |  | NBA play-by-play data |
| Transportation | Capital Bikeshare | A collection of data about the Washington D.C. bikeshare program |
| Transportation | CarQuery | CarQuery API is an easy-to-use, JSON-based API for retrieving detailed car information, including year, make, model, trim, and specifications. |
| Transportation | Massachusetts Institute of Technology | A compilation of airline data from airplane types to operating data |
| Transportation | United States Department of Transportation | A database of transportation statistics for the United States. Some data sets can be used to analyze the transportation network in the U.S. |
| Web traffic | Google Trends | A compilation of data about what people are Googling around the world |
| Data set aggregators & APIs | R packages | Type this code into RStudio to list all the data sets included with the packages you have installed: `data(package = .packages(all.available = TRUE))` |
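
The R one-liner for listing installed data sets can be taken a step further. The sketch below is illustrative only: it assumes a standard R installation where the base `datasets` package (and its `mtcars` data set) is available, and the variable names `all_sets`, `catalog`, `counts`, and `env` are placeholders chosen for this example.

```r
# List the data sets bundled with every installed package.
# data() returns an object of class "packageIQR"; its $results component
# is a character matrix with the columns Package, LibPath, Item, and Title.
all_sets <- data(package = .packages(all.available = TRUE))
catalog <- as.data.frame(all_sets$results, stringsAsFactors = FALSE)

# Count data sets per package, largest first, and show the top few.
counts <- sort(table(catalog$Package), decreasing = TRUE)
print(head(counts))

# Load one data set by name into its own environment and inspect it.
# Assumption: "mtcars" ships with the base "datasets" package.
env <- new.env()
data("mtcars", package = "datasets", envir = env)
str(env$mtcars)
```

Turning the listing into a data frame makes it easy to filter by package or search titles, and loading into a separate environment keeps the global workspace uncluttered while you explore.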