What’s the Big Data?

The first time I heard about big data was on April 1st, April Fools’ Day, while I was drinking coffee and reading the SIAM newspaper. I was drawn to a report about a meeting of the SIAM Committee on Science Policy (CSP), held December 3 and 4 in Washington, DC. A few themes emerged from the wide-ranging discussions generated by the agenda and slate of visitors. “Big data,” the overarching theme, was closely connected to two others: programs of most of the U.S. agencies that fund science and the increasingly interdisciplinary nature of research on important problems.

As CSP member Fred Roberts commented, “a lot of agencies don’t understand ‘big data’–do not, in fact, even know what it is.” Of interest to the CSP, he continued, are the “huge opportunities and roles for mathematics with data,” both in the use of existing methods to analyze data and in the development of new methods.

So what’s big data?

According to Wikipedia’s definition, big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to “spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions.”

So big data does exist in real life and, honestly, it is everywhere. At the recent SIAM 2013 Annual Meeting, I found that a lot of the topics touched on “big data”. For instance, how can you deal with a system of ODEs with 2,000 variables? (One available approach is a reduction method.) And in dynamics, chaos, and nonlinear systems, mathematicians can never avoid meeting the “elephant”.

However, what kind of big data is worth studying? In my view, Roberts’s points about big data are comprehensive:

■ With respect to a definition of “big data,” you have a big data question if you have so much data that you don’t know what to save and, in some cases, need to make that decision instantaneously. This occurs especially in certain disciplines, e.g., astrophysics.

■ It is often necessary to determine the normal state of a system in order to be able to quickly detect departures. An example is the smart grid: Operators now get data every 2–4 seconds; with new phasor technology, updates might come 10 times/second; a human won’t be able to detect the state of the grid without algorithms.

■ Data now comes from a variety of sources–sensors, audio, video, among many others–and a variety of media. How do you make sense of data coming from the many different sources?

■ How do you store, query, and search data when there’s so much of it?

■ How can you trust the data you have? How do you define “trust”? Social media data is an example–can Twitter and Facebook data be considered accurate?

■ You would like to make inferences and hypotheses from large amounts of data. How do you do that?

■ The problem is not just the size of the data set, but also its complexity.

■ Large data problems now come from many disciplines. Examples are NEON (National Ecological Observatory Network), a project of the National Science Foundation, and GBIF (Global Biodiversity Information Facility), an international effort to digitize all information about all living species (estimated number: between 2 and 10 million). These projects are striking for both the size and the complexity of the data sets. The intelligence community has been dealing with big data questions for some time; the Department of Homeland Security tries to do so. The financial sector grapples with huge amounts of data.
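
Roberts’s point about determining the normal state of a system in order to detect departures quickly can be illustrated with a small sketch. This is only one simple way to do it, not a method described at the meeting: it keeps a running mean and standard deviation of the incoming readings and flags any reading that falls too far from that baseline. The names `RunningBaseline` and `find_departures`, the threshold of 4 standard deviations, the 30-sample warm-up, and the synthetic 60 Hz signal are all assumptions made for illustration.

```python
import math


class RunningBaseline:
    """Tracks a running mean and variance of a stream (Welford's online update)."""

    def __init__(self, threshold=4.0, warmup=30):
        self.n = 0            # number of readings absorbed into the baseline
        self.mean = 0.0       # running mean of absorbed readings
        self.m2 = 0.0         # running sum of squared deviations from the mean
        self.threshold = threshold  # how many standard deviations count as a departure
        self.warmup = warmup        # readings to absorb before flagging anything

    def is_departure(self, x):
        """Return True if x is far from the baseline; otherwise absorb x and return False."""
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                return True   # flagged readings are not added to the baseline
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return False


def find_departures(readings, threshold=4.0, warmup=30):
    """Return the indices of readings that depart from the running baseline."""
    baseline = RunningBaseline(threshold, warmup)
    return [i for i, x in enumerate(readings) if baseline.is_departure(x)]


# Synthetic example: a signal hovering around 60.0 with one sudden dip.
readings = [60.0 + 0.01 * math.sin(i) for i in range(200)]
readings[150] = 59.5
print(find_departures(readings))
```

Each reading is processed once and only three numbers are stored, so a check like this can keep up with updates arriving many times per second. A real monitoring system would need a far richer model of what “normal” means.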

Just about every U.S. federal agency that funds science currently supports at least one major national initiative on data. Announcing a $200 million R&D initiative in big data in March 2012, the White House described the program as a way to enhance “our ability to extract knowledge and insights from large and complex collections of digital data.”

If you are looking for funding or fellowships, maybe you could start thinking about big data. I have a friend who got a grant from the NSF to support his work on a big data problem.

About Shijie Gu
