Big Science and Big Data

I’d like to introduce the topic of “big science.” This is especially important as appropriations committees in Congress debate budgets for NASA and NSF in the US (see my previous post) and as related debates occurred a couple of months ago in Europe over the budget of the European Space Agency (ESA).

“Big science” usually refers to large international collaborations on projects with big budgets and long time spans. According to Harry Collins in Gravity’s Shadow (2004),

small science is usually a private activity that can be rewarding to the scientists even when it does not bring immediate success. In contrast, big-spending science is usually a public activity for which orderly and timely success is the priority for the many parties involved and watching.

He goes on to point out that in a project like the Laser Interferometer Gravitational-Wave Observatory (LIGO), it’s possible to change from small science to big but it means a relative loss of autonomy and status for most of the scientists who live through the transition. Kevles & Hood (1992) distinguish between “‘centralized’ big science, such as the Manhattan Project and the Apollo program; ‘federal’ big science, which collects and organizes data from dispersed sites; and ‘mixed’ big science, which offers a big, centrally organized facility for the use of dispersed teams.”

In addition to LIGO, there are many other big science projects, such as the Large Hadron Collider (LHC, which discovered the Higgs boson), the International Thermonuclear Experimental Reactor (ITER), and in astronomy and astrophysics, the James Webb Space Telescope (JWST, the successor to Hubble), the Large Synoptic Survey Telescope (LSST, pictured below), and the Wide-Field InfraRed Survey Telescope (WFIRST).

[Image: the LSST dome at night]

Note that some big science projects are primarily supported by government funding while others receive significant funding from industry or philanthropists. LSST and LIGO are supported by the NSF, JWST and WFIRST are supported by NASA, and LHC is supported by CERN, but all of these are international. The fusion reactor ITER (see diagram below), which was the subject of a recent detailed New Yorker article, has experienced many delays, has gone over its many-billion-dollar budget, and has had management problems as well. While budget and scheduling problems are common for big science projects, ITER is in a situation in which it needs to produce results in the near future and avoid additional delays. (The US is committing about 9% of ITER’s total cost, but its current contribution is lower than last year’s and its future contributions may be reevaluated at later stages of the project.)

[Image: ITER in-cryostat overview diagram]

As scientists, we try to balance small-, mid-, and large-size projects. The large ones are larger than before, require decades of planning and large budgets, and often consist of collaborations with hundreds of people from many different countries. It’s important to be aware that relatively small- and mid-scale projects (such as TESS and IBEX in astronomy) are very important too for research, innovation, education, and outreach, and as they usually involve fewer risks, they can provide at least as much “bang for the buck” (in the parlance of our times).

In the context of “big science” projects these days, the concepts of “big data” and “data-driven science” are certainly relevant. Many people argue that we are now in an era of big data, in which we’re obtaining collections of datasets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications. Since the volume, velocity, and variety of data are rapidly increasing, it is increasingly important to develop and apply appropriate data mining techniques, machine learning, scalable algorithms, analytics, and other kinds of statistical tools, which often require more computational power than traditional data analyses. (For better or for worse, “big data” is also an important concept in the National Security Agency and related organizations, in government-funded research, and in commercial analyses of consumer behavior.)

In astronomy, this is relevant to LSST and other projects mentioned above. When LSST begins collecting data, each night for ten years it will obtain roughly as much data as was obtained by the entire Sloan Digital Sky Survey, which was until recently the biggest survey of its kind, and it will obtain about 800 measurements each for about 20 billion sources. We will need new ways to store and analyze these vast datasets. This also highlights the importance of “astrostatistics” (including my own work) and of “citizen science” (which we introduced in a previous post) such as the Galaxy Zoo project. IT companies are becoming increasingly involved in citizen science as well, and the practice of citizen science itself is evolving with new technologies, datasets, and organizations.
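
To get a rough feel for the scale of the numbers quoted above, here is a minimal back-of-the-envelope sketch in Python. The two input figures (about 800 measurements each for about 20 billion sources) come from the text; the 8 bytes per stored value is purely an assumption for illustration, and the estimate covers only a bare table of measurement values, not images or any other data products.

```python
# Figures quoted in the post (both approximate).
measurements_per_source = 800
number_of_sources = 20_000_000_000  # about 20 billion

# Total number of individual measurements over the whole survey.
total_measurements = measurements_per_source * number_of_sources

# Assumption for illustration only: each measurement is stored as one
# 8-byte value. Real survey data products will differ.
ASSUMED_BYTES_PER_MEASUREMENT = 8
total_terabytes = total_measurements * ASSUMED_BYTES_PER_MEASUREMENT / 1e12

print(f"Total measurements: {total_measurements:.1e}")
print(f"Storage at the assumed 8 bytes each: {total_terabytes:.0f} TB")
```

The product works out to about 1.6 × 10^13 individual measurements, which gives some sense of why a spreadsheet or a single desktop database will not be enough.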

I’ll end by making a point that was argued in a recent article in Science magazine: we should avoid “big data hubris,” the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.

Paradigm Shifts?

In addition to physics and astronomy, I used to study philosophy of science and sociology. In my opinion, many scientists could learn a few things from sociologists and philosophers of science, to help them to better understand and consider how scientific processes work, what influences them and potentially biases scientific results, and how science advances through their and others’ work. In addition, I think that people who aren’t professional scientists (who we often simply call “the public”) could better understand what we are learning and gaining from science and how scientific results are obtained. I’ll just write a few ideas here and we can discuss these issues further later, but my main point is this: science is an excellent tool that sometimes produces important results and helps us learn about the universe, our planet, and ourselves, but it can be a messy and nonlinear process, and scientists are human–they sometimes make mistakes and may be stubborn about abandoning a falsified theory or interpretation. The cleanly and clearly described scientific results in textbooks and newspaper articles are misleading in a way, as they sometimes make us forget the long, arduous, and contentious process through which those results were achieved. To quote from Carl Sagan (in Cosmos), who inspired the subtitle of this blog (the “pale blue dot” reference),

[Science] is not perfect. It can be misused. It is only a tool. But it is by far the best tool we have, self-correcting, ongoing, applicable to everything. It has two rules. First: there are no sacred truths; all assumptions must be critically examined; arguments from authority are worthless. Second: whatever is inconsistent with the facts must be discarded or revised.

As you may know, the title of this post refers to Thomas Kuhn (in his book, The Structure of Scientific Revolutions). “Normal science” (the way science is usually done) proceeds gradually and is based on paradigms, which are collections of diverse elements that tell scientists what experiments to perform, which observations to make, how to modify their theories, how to make choices between competing theories and hypotheses, etc. We need a paradigm to demarcate what is science and to distinguish it from pseudo-science. Scientific revolutions are paradigm shifts, which are relatively sudden and unstructured events, and which often occur because of a crisis brought about by the accumulation of anomalies under the prevailing paradigm. Moreover, they usually cannot be decided by rational debate; paradigm acceptance via revolution is essentially a sociological phenomenon and is a matter of persuasion and conversion (according to Kuhn). In any case, it’s true that some scientific debates, especially involving rival paradigms, are less than civil and rational and can look something like this:

[Image: Calvin arguing]

I’d like to make the point that, at conferences and in grant proposals, scientists (including me) pretend that we are developing research that is not only cutting edge but is also groundbreaking and Earth-shattering; some go so far as to claim that they are producing revolutionary (or paradigm-shifting) research. Nonetheless, scientific revolutions are actually extremely rare. Science usually advances at a very gradual pace and with many ups and downs. (There are other reasons to act as though our science is revolutionary, however, since this helps to gain media attention and perform outreach to the public, and it helps policy-makers to justify investments in basic research in science.) When a scientist or group of scientists does obtain a critically important result, it is usually the case that others have already produced similar results, though perhaps with less precision. Credit often goes to a single person who packaged and advertised their results well. For example, many scientists are behind the “Higgs boson” discovery, and though American scientists received the Nobel Prize for detecting anisotropies in the cosmic microwave background with the COBE satellite, Soviets actually made an earlier detection with the RELIKT-1 experiment.

[Image: Einstein and Bohr]

Let’s briefly focus on the example of quantum mechanics, in which there were intense debates in the 1920s about (what appeared to be) “observationally equivalent” interpretations, which in a nutshell were either probabilistic ones or deterministic and realist ones. My favorite professor at Notre Dame, James T. Cushing, wrote a provocative book on the subject with the subtitle, “Historical Contingency and the Copenhagen Hegemony”. The debates occurred between Niels Bohr’s camp (with Heisenberg, Pauli, and others, who were primarily based in Copenhagen and Göttingen) and Albert Einstein’s camp (with Schrödinger and de Broglie). Bohr’s younger followers were trying to make bold claims about QM and to make names for themselves, and one could argue that they misconstrued Einstein’s views. Einstein had essentially lost by the 1930s, when the nail in the coffin was von Neumann’s so-called impossibility proof of “hidden variables” theories–a proof that was shown to be flawed thirty years later. In any case, Cushing argues that in decisions about accepting or dismissing scientific theories, sometimes social conditions or historical coincidences can play a role. Mara Beller also wrote an interesting book about this (Quantum Dialogue: The Making of a Revolution), and she finds that in order to understand the consolidation of the Copenhagen interpretation, we need to account for the dynamics of the Bohr et al. vs. Einstein et al. struggle. (In addition to Cushing and Beller, another book by Arthur Fine, called The Shaky Game, is also a useful reference.) I should also point out that Bohr used the rhetoric of “inevitability,” which implied that there was no plausible alternative to the Copenhagen paradigm. If you can convince people that your view is already being adopted by the establishment, then the battle has already been won.

More recently, we have had other scientific debates about rival paradigms. In astrophysics, for example, there is the debate over the existence of dark matter (DM) versus modified Newtonian dynamics (MOND); DM is more widely accepted, though its nature–whether it is “cold” or “warm” and to what extent it is self-interacting–is still up for debate. Debates in biology, medicine, and economics are often even more contentious, partly because they have policy implications and can conflict with religious views.

Other relevant issues include the “theory-ladenness of observation,” the argument that everything one observes is interpreted through a prior understanding (and assumption) of other theories and concepts, and the “underdetermination of theory by data.” The concept of underdetermination dates back to Pierre Duhem and W. V. Quine, and it refers to the argument that given a body of evidence, more than one theory may be consistent with it. A corollary is that when a theory is confronted with recalcitrant evidence, the theory is not necessarily falsified; instead, it can be reconciled with the evidence by making suitable adjustments to its hypotheses and assumptions. It is nonetheless the case that some theories are clearly better than others. According to Larry Laudan, we should not overemphasize the role of sociological factors over logic and the scientific method.

In any case, all of this has practical implications for scientists as well as for science journalists and for people who popularize science. We should be careful to be aware of, examine, and test our implicit assumptions; we should examine and quantify all of our systematic uncertainties; and we should allow for plenty of investigation of alternative explanations and theories. In observations, we also should be careful about selection effects, incompleteness, and biases. Finally, we should remember that scientists are human and sometimes make mistakes. Scientists are trying to explore and gain knowledge about what’s really happening in the universe, but sometimes other interests (funding, employment, reputation, personalities, conflicts of interest, etc.) play important roles. We must watch out for herding effects and confirmation bias, where we converge and end up agreeing on the incorrect answer. (Historical examples include the optical or electromagnetic ether; the crystalline spheres of medieval astronomy; the humoral theory of medicine; ‘catastrophist’ geology; etc.) Paradigm shifts are rare, but when we do make such a shift, let’s be sure that what we’re transitioning to is actually our currently best paradigm.

[For more on philosophy of science, this anthology is a useful reference, and in particular, I recommend reading work by Imre Lakatos, Paul Feyerabend, Helen Longino, Nancy Cartwright, Bas van Fraassen, Mary Hesse, and David Bloor, who I didn’t have the space to write about here. In addition, others (Ian Hacking, Allan Franklin, Andrew Pickering, Peter Galison) have written about these issues in scientific observations and experimentation. For more on the sociology of science, this webpage seems to contain useful references.]

Citizen Science: a tool for education and outreach

I’ll write about a different kind of topic today. “Citizen science” is a relatively new term though the activity itself is not so new. One definition of citizen science is “the systematic collection and analysis of data; development of technology; testing of natural phenomena; and the dissemination of these activities by researchers on a primarily avocational basis.” It involves public participation and engagement in scientific research in a way that educates the participants, makes the research more democratic, and makes it possible to perform tasks that a small number of researchers could not accomplish alone. Volunteers simply need access to a computer (or smartphone) and an internet connection to become involved and assist scientific research.

[Image: example of a face-on spiral galaxy]

Citizen science was popularized a few years ago by Galaxy Zoo, which involved visually classifying hundreds of thousands of galaxies into spirals, ellipticals, mergers, and finer classifications using the classification tree below. (I am a member of the Galaxy Zoo collaboration and have published a few papers with them.) As a result of “crowdsourcing” the work of more than 100,000 volunteers around the world, new scientific research can be done that was not previously possible with such large datasets, including studies of the handedness of spiral galaxies, analyses of the environmental dependence of barred galaxies, and the identification of rare objects such as a quasar light echo that was dubbed “Hanny’s Voorwerp”. Other citizen science projects include mapping the moon, mapping air pollution, counting birds with birdwatchers, classifying a variety of insects, and many other projects.

[Image: Galaxy Zoo classification tree (Willett et al. 2013, Fig. 1)]

Citizen scientists have many motivations, but it appears that the primary one is the desire to make a contribution to scientific research (see this paper). In the process, by bringing together professional scientists and members of the general public and facilitating interactions between them, citizen science projects are important for outreach purposes, not just for research. In addition, by encouraging people to see a variety of images or photographs and to learn about how the research is done, citizen science is useful for education as well. Many valuable educational tools have been produced (such as by the Zooniverse projects). Citizen science projects are popular and proliferating because they give people at home or in the classroom the opportunity to become actively involved in science. They have other advantages too, including raising awareness and stimulating interest in particular issues. Citizen science is continuing to evolve, and in the era of “big data” and social media, it has much potential and room for improvement.

Scientific Integrity

In this blog post, let’s discuss scientific integrity–specifically, efforts to keep scientific research as independent as possible from political, corporate, or other influence. Such influences are important for a variety of policies including energy policy (especially related to climate change), health and drugs, food and nutrition, education, etc., when particular companies or organizations have a financial or other stake in the outcome. For example, fossil fuel companies support the “denial industry,” claiming that the science of global warming is inconclusive; agribusinesses promote genetically modified crops; and drug companies promote antidepressant and ADHD drugs, while funding scientific research that often supports their campaigns.

Science informs political officials and agencies when they’re designing regulations for air and water pollution, when determining whether a particular drug is safe and efficacious, when assessing whether particular foods or products are safe for consumers, etc. In my opinion, science can rarely be completely “objective” and “unbiased”; scientists are humans, after all, and they have their own motivations and considerations that can affect their work. The important thing, however, is to reduce political and commercial influence as much as possible so that scientists can do their research and then present their results as clearly and accurately as possible.

In all fields of science, scientists to some extent are affected by funding constraints and grant agencies. These constraints can affect exactly what is studied, how it is researched, and how the results are presented in the media and to the public. Nonetheless, scientific research is particularly important–and susceptible to more outside influences–when it is related to public policy, including the topics above. In addition, politically-related work in the social sciences, especially economics, can be contentious as well.

In the US under the Bush administration, many felt that scientists were under attack. For example, a “revolving door” appeared to be in place when former lobbyists and spokespeople for industries later worked at agencies having the task of regulating their former industries; in particular cases, they appeared to write or advocate for policy shifts that benefited these industries. In 2004, the Union of Concerned Scientists (UCS) released a report, “Scientific Integrity in Policymaking: An Investigation into the Bush Administration’s Misuse of Science”, claiming that the White House censors and suppresses reports by its own scientists, stacks advisory committees, and disbands government panels. There later appeared to be political influence on the Food and Drug Administration (FDA), on researchers working on embryonic stem cells, on sex education (because of arguments about the effectiveness of abstinence-based programs), and on the teaching of biological evolution.

Although the Obama administration appears to have more respect for science and scientists (see this 2013 UCS report), the politicization of some scientific work continues. The assessment of the social and environmental impact of the Keystone XL pipeline may be such an example. The final environmental impact statement, which was released by the State Department yesterday, appears to endorse the pipeline, but the interpretation is unclear (see this coverage in the Wall Street Journal and Scientific American blog).

In any case, these contentious situations will be easier when government agencies have explicit policies for scientific integrity and when the affiliations and employment histories of officials are transparent. It’s also important to keep in mind that the struggle for independent and transparent science never ends. Scientists should always try to be as clear as possible about their views or beliefs when they are relevant to their work (see this NYT blog for useful advice), and results and data should be made publicly available whenever possible.

Some thoughts on “work-life balance”

Since my partner and I are about to go on vacation and I’m therefore about to go on a break from work, this will be my last blog post until mid-January. I figured that this might be a good occasion to talk a bit about what some people call the “work-life balance.” I’ll try to make my comments general, but note that my perspective is that of a man, a scientist, and an academic in the US, which may be very different from others’ perspectives. One major difference of jobs in academia is that they tend to have more flexible schedules but less security than other jobs. (For more discussion of these issues, I suggest looking at the Women in Astronomy blog and the American Astronomical Society Committee on the Status of Women.)

I think the main point I want to make here is that work-life balance issues and issues of equality and diversity are closely related, and issues of fair working conditions and job security are related as well but are discussed less often in this context.

One thing is clear: both women and men want to “have it all”, though what “all” refers to is different for different people. In addition, there has been much debate and discussion recently in news media, such as these articles in The Guardian and The Atlantic, about the fact that men also want a balance between work and life, which often refers to men taking a larger role than before at home with their families. It’s interesting that this is considered noteworthy, but it’s good that changes toward equality are happening even if they’re a bit late.

When both men and women seek balances between work and life, this also should result in more equal career and employment opportunities for women and therefore more women in leadership positions than there have been in the past. For example, when both men and women take parental leave, it is less likely to hurt them in terms of their long-term career advancement. It is increasingly becoming understood and expected by co-workers and employers that both men and women take leave, though some employers (and universities) have better policies for this than others. Many countries require paid paternity leave, but the US is not one of them.

I also want to point out that discussions of these issues often seem to focus on people with children, though of course people without kids want work-life balance too. Work and careers are important for many people, but some people only notice a work-life tension when they have kids, partly because kids take a lot of time but also because some people’s lives are primarily focused on their work. This isn’t really a criticism (after all, many great scientists and artists have been passionately focused only on their work), but it’s worth noting that “workaholic” attitudes are common and are especially prevalent in the US, to some people’s detriment. In addition, when there is a lot of competition for jobs and job security is hard to find, there is more pressure to work harder and longer hours at the expense of other important things. In any case, every person has different goals and priorities, and jobs and employer policies should be flexible enough to accommodate that. A work-life balance is important for one’s mental and physical health and happiness and for the health of families and communities, though of course different people will have different ways of attempting to achieve such a balance.

Finally, to lighten things up, let’s end with an Onion article.