The NSF’s Cybertools Program

Posted: August 1, 2008 in 2005, Articles

Social and behavioral research to gain from better computing

Cornell researchers are developing tools to explore the 40-billion-page Internet Archive.

October 18, 2005

The National Science Foundation (NSF) has announced that the first awards in its Next-Generation Cyberinfrastructure Tools program will go to research teams at Cornell University and the University of Chicago. The program is an initiative designed to extend the boundaries of social and behavioral research and to lead to fundamental advances in cyberinfrastructure.

The cyberinfrastructure tools initiative will serve two purposes. First, it will help social and behavioral scientists advance their research through the use of “cyberinfrastructure”: the vast new webs of computers, networks and data resources that are becoming increasingly important to science as a whole and to the activities of NSF in particular. Second, the scientists’ efforts will guide the development of future computational tools that will advance cyberinfrastructure itself.

“Over the past 15 years, the social and behavioral sciences have arguably been most transformed by new cyberinfrastructure,” says David Lightfoot, head of NSF’s Directorate for Social, Behavioral & Economic Sciences. “Many areas have undergone dramatic changes in the kind of research that has become possible. The goal of this initiative is to explore further innovations that will continue to drive new research possibilities in scale and detail. This will be an entree to a larger feast of exploration for the directorate.”

The two awards will amount to about $2 million each and will last for approximately two years.

  • The Cornell project, headed by sociologist Michael Macy, will attempt to create a novel laboratory for social-science research based on the vast Internet Archive. The archive’s 40 billion pages are snapshots of the Web that have been captured and stored every 2 months for nearly 10 years, covering everything from corporate web pages to newsgroups and blogs. The archive is thus a remarkably rich and detailed record of societal events and dynamics over that period. The challenge is to access that record and make sense of it. To meet that challenge, the Cornell team plans to build an intelligent front end for searching the archive, an effort that will require cutting-edge research in natural language processing and machine learning algorithms, as well as next-generation technology for privacy preservation. These front-end cyberinfrastructure tools, running on Cornell’s NSF-funded Petabyte Data Storage facility, represent an entirely new scale and methodology for social science research. The Cornell group will develop, test, and refine its search tools through one specific research topic: the diffusion of innovation. But the researchers foresee many other uses for this ability to study social life in cyberspace, ranging from pure research to practical applications for business and government. Notable examples include identifying market trends, tracking the rise and fall of demand, and following the spread of consumer opinion. Likewise, community watchdog groups would be able to track the spread of “hate sites,” and government agencies would be able to trace past and current uses of the Web for organizing and coordinating terrorist attacks.
  • The Chicago project, headed by psychologist Bennett Bertenthal, will develop tools for collecting and analyzing human behavioral data on an unprecedented scale and level of sophistication. In its “SuperLab,” the multidisciplinary team of researchers will be able to track human behavior in both individual and group settings while collecting exquisitely detailed data on the participants in real time. These data, in turn, will help the Chicago group address research questions that far exceed the capacity of any laboratory today. For example, how is social behavior correlated with the participants’ neural activity? How is it connected with their movements, postures, gestures, facial expressions and speech, or for that matter with their state of development, environmental context and cultural norms? Central to the Chicago group’s effort will be the creation of a distributed data warehouse known as the Social Informatics Data Grid (SID Grid), a piece of cyberinfrastructure that will encourage data sharing and accelerate the development of standards for collecting and coding physiological and behavioral data. The SID Grid will be deployed as part of the larger TeraGrid, a suite of grid computing resources available to the scientific community at large. The Chicago group will develop, test, and refine its data collection and analysis tools through research in three areas of inquiry: multimodal communication, the neurobiology of social behavior, and cognitive and social neuroscience. But the researchers foresee many other uses for the tools they will create. Most notably, their efforts will contribute to research on how human behavior can be automatically extracted, and even interpreted, from media such as audio and video recordings. Such research may open the way toward mining the truly vast amounts of data on human behavior that are recorded every day.


Media Contacts
M. Mitchell Waldrop, NSF (703) 292-7752

Program Contacts
Christopher Kello, NSF (703) 292-8732
