Dataset Search: Google’s Gift to the Scientific Community

Datasets from government sources and organizations like NASA and ProPublica, hosted on digital libraries or publishers’ sites, will now be discoverable through Dataset Search

In what appears to be an attempt to encourage more scientists, researchers and journalists to use Google, the company today launched a new search engine called Dataset Search. Emphasizing the importance of data today, Google says it is launching Dataset Search so that scientists, students and journalists looking for data repositories to write and publish a story can easily find datasets through the new webpage.


Millions of datasets from government sources and organizations like NASA and ProPublica, hosted on digital libraries or publishers’ sites, will now be discoverable through Dataset Search. This is similar to how Google Scholar, the company’s popular search engine for academic studies and reports, works. Google has, however, developed guidelines for dataset providers so that Google and other search engines can better understand the content of their pages.

“Google's approach to dataset discovery makes use of schema.org and other metadata standards that can be added to pages that describe datasets. The purpose of this markup is to improve discovery of datasets from fields such as life sciences, social sciences, machine learning, civic and government data, and more,” says a Google blog post. Examples of what qualifies as a dataset include a table or comma-separated values (CSV) file containing data, images capturing data, a collection of tables, and files relating to machine learning, such as trained parameters or neural network structure definitions.

The guidelines also cover the information providers should supply about each dataset: who created it, when it was published, how the data was collected, and the terms for using it. Google is asking dataset providers, whether small or large organisations, to adopt this common standard so that all datasets become part of a robust ecosystem.
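As an illustration of what such a description might look like, here is a minimal Python sketch that builds a schema.org Dataset record as JSON-LD, the kind of markup a provider could embed in a dataset's web page. The field choices mirror the items listed above (creator, publication date, collection method, usage terms). The helper name build_dataset_jsonld and all sample values are hypothetical, and the exact set of properties Google expects is an assumption here, not something the article specifies.

```python
import json


def build_dataset_jsonld(name, description, creator, date_published,
                         method, license_url):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration; it is not part of any Google API.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the dataset
        "creator": {"@type": "Organization", "name": creator},
        # When it was published (ISO 8601 date)
        "datePublished": date_published,
        # How the data was collected
        "measurementTechnique": method,
        # Terms for using the data
        "license": license_url,
    }
    return json.dumps(record, indent=2)


# Sample values below are invented for this example.
markup = build_dataset_jsonld(
    name="Example City Air Quality Readings",
    description="Hourly particulate readings from a hypothetical sensor network.",
    creator="Example Environmental Agency",
    date_published="2018-09-05",
    method="Fixed-site optical particle counters",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(markup)
```

A provider would typically place the resulting JSON inside a script tag of type application/ld+json on the page that describes the dataset, so that crawlers can read it alongside the human-readable content.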

Google will then collect and use this information to work out where different versions of the same dataset might be located, and to find publications that describe or discuss the dataset. Dataset Search is available in multiple languages, and support for additional languages will continue to be added, says Google.
