Big Science and Big Data

I’d like to introduce the topic of “big science.” This is especially important as appropriations committees in Congress debate budgets for NASA and NSF in the US (see my previous post), and related debates occurred a couple of months ago in Europe over the budget of the European Space Agency (ESA).

“Big science” usually refers to large international collaborations on projects with big budgets and long time spans. According to Harry Collins in Gravity’s Shadow (2004),

small science is usually a private activity that can be rewarding to the scientists even when it does not bring immediate success. In contrast, big-spending science is usually a public activity for which orderly and timely success is the priority for the many parties involved and watching.

He goes on to point out that in a project like the Laser Interferometer Gravitational-Wave Observatory (LIGO), it’s possible to change from small science to big but it means a relative loss of autonomy and status for most of the scientists who live through the transition. Kevles & Hood (1992) distinguish between “‘centralized’ big science, such as the Manhattan Project and the Apollo program; ‘federal’ big science, which collects and organizes data from dispersed sites; and ‘mixed’ big science, which offers a big, centrally organized facility for the use of dispersed teams.”

In addition to LIGO, there are many other big science projects, such as the Large Hadron Collider (LHC, which discovered the Higgs boson), the International Thermonuclear Experimental Reactor (ITER), and in astronomy and astrophysics, the James Webb Space Telescope (JWST, the successor to Hubble), the Large Synoptic Survey Telescope (LSST, pictured below), and the Wide-Field InfraRed Survey Telescope (WFIRST).

[Image: the LSST dome at night]

Note that some big science projects are primarily supported by government funding while others receive significant funding from industry or philanthropists. LSST and LIGO are supported by the NSF, JWST and WFIRST are supported by NASA, and LHC is supported by CERN, but all of these are international. The fusion reactor ITER (see diagram below), the subject of a recent detailed New Yorker article, has experienced many delays, has gone over its many-billion-dollar budget, and has had management problems as well. While budget and scheduling problems are common for big science projects, ITER is in a situation in which it needs to produce results in the near future and avoid additional delays. (The US is committing about 9% of ITER’s total cost, but its current contribution is lower than last year’s and its future contributions may be reevaluated at later stages of the project.)

[Diagram: ITER in-cryostat overview]

As scientists, we try to balance small-, mid-, and large-size projects. The large ones are larger than before, require decades of planning and large budgets, and often consist of collaborations with hundreds of people from many different countries. It’s important to be aware that relatively small- and mid-scale projects (such as TESS and IBEX in astronomy) are very important too for research, innovation, education, and outreach, and as they usually involve fewer risks, they can provide at least as much “bang for the buck” (in the parlance of our times).

In the context of “big science” projects these days, the concepts of “big data” and “data-driven science” are certainly relevant. Many people argue that we are now in an era of big data, in which we’re obtaining collections of datasets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications. Since the volume, velocity, and variety of data are rapidly increasing, it is increasingly important to develop and apply appropriate data mining techniques, machine learning, scalable algorithms, analytics, and other kinds of statistical tools, which often require more computational power than traditional data analyses. (For better or for worse, “big data” is also an important concept in the National Security Agency and related organizations, in government-funded research, and in commercial analyses of consumer behavior.)

In astronomy, this is relevant to LSST and other projects mentioned above. Once LSST begins collecting data, every night for ten years it will gather roughly as much data as the entire Sloan Digital Sky Survey, which was until recently the biggest survey of its kind, and it will obtain about 800 measurements each for about 20 billion sources. We will need new ways to store and analyze these vast datasets. This also highlights the importance of “astrostatistics” (including my own work) and of “citizen science” (which we introduced in a previous post), such as the Galaxy Zoo project. IT companies are becoming increasingly involved in citizen science as well, and the practice of citizen science itself is evolving with new technologies, datasets, and organizations.
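
To put the LSST numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. The two counts (about 800 measurements for each of about 20 billion sources) come from the figures quoted above; the 8 bytes per measurement is purely my own illustrative assumption, not an LSST specification, and the real catalogs will store far more than one number per measurement.

```python
# Figures quoted in the post above.
MEASUREMENTS_PER_SOURCE = 800
SOURCES = 20_000_000_000  # about 20 billion

# ASSUMPTION (not from the post or from LSST): each measurement is stored
# as a single 8-byte number, with no metadata, errors, or indexing overhead.
BYTES_PER_MEASUREMENT = 8


def total_measurements(per_source=MEASUREMENTS_PER_SOURCE, sources=SOURCES):
    """Total number of individual measurements over the whole survey."""
    return per_source * sources


def raw_size_terabytes(n_measurements, bytes_each=BYTES_PER_MEASUREMENT):
    """Raw storage in terabytes (1 TB = 10**12 bytes) under the assumption above."""
    return n_measurements * bytes_each / 1e12


if __name__ == "__main__":
    n = total_measurements()
    print(f"{n:.1e} measurements in total")
    print(f"about {raw_size_terabytes(n):.0f} TB under the 8-byte assumption")
```

Even this deliberately minimal estimate comes to roughly 16 trillion measurements, which is why standard desktop tools are not going to be enough.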

I’ll end by making a point that was argued in a recent article in Science magazine: we should avoid “big data hubris,” the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.
