Computational Social Science Institute

University of Massachusetts Amherst

Lecture series generously sponsored by Yahoo!

Videos of some CSSI seminars are available here

Spring 2011

Data Visualization at the New York Times [video]
Friday, February 11, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 151

Abstract:The New York Times graphics desk researches and creates charts, diagrams and maps for the paper and the Web site. We'll talk about what we do and why we do it, focusing on what we know (distributions are more interesting than averages, sliders are awesome) and what I'm still hoping to learn (how to turn uncertainty into a strength, how to resolve the conflict between interactivity and storytelling).

Bio:Amanda joined the Times as a graphics editor in 2005, after getting a master's degree in statistics at the University of Washington.

Matt Salganik, Princeton University

Wiki surveys: Open, Adaptive, and Quantifiable Social Data Collection
Friday, February 18, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 151

Abstract:Research about attitudes and opinions is central to social science and relies on two common methodological approaches: surveys and interviews. While surveys allow researchers to quantify large amounts of information quickly and at a reasonable cost, they are routinely criticized for being "top-down" and rigid. In contrast, interviews allow unanticipated information to "bubble up" directly from respondents, but are slow, expensive, and hard to quantify. This tension between openness and quantifiability is at the heart of the debate about quantitative and qualitative approaches to social research. Advances in computing technology now enable a hybrid approach, wiki surveys, which combines the quantifiability of a survey with the openness of an interview. We draw on principles undergirding successful information aggregation projects, such as Wikipedia and the Linux operating system, to propose several general criteria that wiki surveys should satisfy. We then present results from, a free and open source website that we created which allows groups all over the world to deploy wiki surveys. To date, over 750 wiki surveys have been created, and they have collected over 25,000 ideas and 1.5 million votes. We describe some of the methodological challenges involved in collecting and analyzing this type of data, and present case studies of wiki surveys created by the New York City Mayor's Office and the Organization for Economic Cooperation and Development (OECD). The paper concludes with a discussion of limitations and how some of these limitations might be overcome with additional research. (Joint work with Karen Levy.)

Bio:Matthew Salganik is an Assistant Professor in the Department of Sociology at Princeton University. His interests include social networks, quantitative methods, and web-based social research. One main area of his research has focused on developing network-based statistical methods for studying populations most at risk for HIV/AIDS. A second main area of work has been using the World Wide Web to collect and analyze social data in innovative ways. Salganik's research has been published in journals such as Science, PNAS, Sociological Methodology, and Journal of the American Statistical Association. His papers have won the Outstanding Article Award from the Mathematical Sociology Section of the American Sociological Association and the Outstanding Statistical Application Award from the American Statistical Association. Popular accounts of his work have appeared in the New York Times, Wall Street Journal, Economist, and New Yorker. Salganik's research is funded by the National Science Foundation, National Institutes of Health, Joint United Nations Program for HIV/AIDS (UNAIDS), and Google.

Statistical Methods for Combining Survey and Population-Level Data [video]
Friday, February 25, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 151

Abstract:In many situations information from a sample of individuals can be supplemented by information from population level data on the relationship of the explanatory variable with the dependent variables. Sources of population level data include a census, vital events registration systems and other governmental administrative record systems. They contain too few variables, however, to estimate demographically interesting models. Thus in a typical situation the estimation is done by using sample survey data alone, and the information from complete enumeration procedures is ignored. Sample survey data, however, are subjected to sampling error and bias due to non-response, whereas population level data are comparatively free of sampling error and typically less biased from the effects of non-response. In this talk we will review statistical methods for the incorporation of population level information and show that it can lead to statistically more accurate estimates and better inference. Population level information can be incorporated via constraints on functions of the model parameters. In general the constraints are non-linear, making the task of maximum likelihood estimation more difficult. We present an alternative approach exploiting the notion of an empirical likelihood. We give an application to demographic hazard modeling by combining panel survey data with birth registration data to estimate annual birth probabilities by parity. This is joint work with Sanjay Chaudhuri (National University of Singapore) and Michael S. Rendall (RAND Corporation).

Bio:Mark S. Handcock is Professor of Statistics and Director of the Center for Social Statistics at the University of California - Los Angeles. He received his B.Sc. from the University of Western Australia and his Ph.D. from the University of Chicago. Dr. Handcock’s research involves methodological development, and is based largely on motivation from questions in the social sciences, demography and epidemiology. He has published extensively on network models and inference as well as network sampling methods. He is a Fellow of the American Statistical Association.

James Kitts, Columbia University

Group Processes and Social Network Dynamics
Friday, March 4, 2011 • 12:30PM–2PM • Lunch provided
Lederle Graduate Research Tower, Room 1634

Abstract:I will begin by drawing on some of my research on utopian communes, illustrating some phenomena that have interested group process scholars for decades. I will then show how such patterns can emerge at the group level as unintended byproducts of the most elementary processes of social interaction in networks. The rest of the talk will focus on an empirical study of these micro-level processes. In 2004-2007, I collaborated with a team of computer scientists working to develop methods for using wearable sensors to record social interactions, and to then derive social network data automatically from audio recordings. We implemented these methods in a study of two student cohorts as they joined a graduate program at a large US university. In this presentation, I will analyze five forms of social interaction for the same students over the academic year. I will show that comparing how these five relations change over time sheds new light on the mechanisms underlying the evolution of networks within social groups.

Bio:James Kitts is an Assistant Professor in the Graduate School of Business at Columbia University, having previously held positions in Sociology at Dartmouth College and the University of Washington. He is broadly interested in the dynamics of cooperation and competition among organizations and among their members. He has studied the collective implications of communication biases in interaction networks, the dynamics of polarization, factionalism, and extremism in social influence networks, and the demography and ecology of radical social movement organizations. His work has recently appeared in American Sociological Review, Social Forces, Demography, and Social Psychology Quarterly.

Rob Franzese, University of Michigan

Modeling History Dependence in Network-Behavior Coevolution
Friday, March 11, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 151

Abstract:Spatial interdependence - the dependence of outcomes in some units on those in others - is substantively and theoretically ubiquitous and central across the social sciences. Spatial association is also omnipresent empirically. However, spatial association may arise from three importantly distinct processes: common exposure of actors to exogenous external and internal stimuli, interdependence of outcomes/behaviors across actors (contagion), and/or the putative outcomes may affect the dimensions along which the clustering occurs (selection). Accurate inference about any of these processes generally requires an empirical strategy that addresses all three well. From a spatial-econometric perspective, this suggests spatiotemporal empirical models with exogenous covariates (common exposure) and spatial lags (contagion), with the spatial weights being endogenous (selection). From a longitudinal network-analytic perspective, the same three processes are identified as potential sources of network effects and network formation. From that perspective, actors' self-selection into networks (by, e.g., behavioral homophily) and actors' behavior that is contagious through those network connections likewise demands theoretical and empirical models in which networks and behavior coevolve over time. This paper begins building such models by, on the theoretical side, extending a Markov type-interaction model to allow endogenous tie-formation, and, on the empirical side, merging a simple spatial-lag logit model of contagious behavior with a simple p-star logit model of network formation, building this synthetic discrete-time empirical model from the theoretical base of the modified Markov type-interaction model. One interesting consequence of network-behavior coevolution - identically: endogenous patterns of spatial interdependence - emphasized here is how it can produce history-dependent political dynamics, including equilibrium phat and path dependence (Page 2006). 
The paper explores these implications, and then concludes with a demonstration of the strategy applied to alliance formation and conflict behavior among the great powers in the first half of the twentieth century.

Bio:Robert (Rob) J. Franzese, Jr. (Ph.D. Government 1996 and A.M. Economics 1995, Harvard University) is Professor and Associate Chair of Political Science at the University of Michigan. He specializes in comparative democratic politics and political economy and in empirical methodology for the social sciences. He is (co)author or coeditor of numerous books, articles, chapters, and other publications spanning comparative and international politics and political economy and empirical methodology, on topics including the political economy of monetary and fiscal policy, empirical-model specification, approaches to empirical evaluation of social-science theory, spatial-econometric models of interdependence, multilevel (hierarchical) models, and network-behavior coevolution and path dependence. His work has appeared in journals situated in many disciplines including political science, statistics, economics, and environmental policy. His current research interests and recent publications surround spatial-econometric models of interdependence and nonlinear empirical models of complexly context-conditional outcomes. He was a principal investigator of the original NSF-funded EITM (Empirical Implications of Theoretical Models) training program for empirical-model specification and analysis in political science; he has hosted and directed that program twice and was lead lecturer for program modules twice and guest lecturer twice. He has designed and taught numerous seminars and mini-courses in time-series-cross-section data analysis, in spatial econometrics, in empirical-model specification, and other specialized topics at a range of academic and professional institutions across the globe, including at the ICPSR and the Essex Summer Schools, at Academia Sinica (Taipei), at Pompeu Fabra (Barcelona), and at JAWAC (Dahlgren, Virginia).

Detecting Structural Biases Across Networks: Bayesian Meta-Analysis of Social Network Data Using Reference Quantiles [video]
Friday, March 25, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 151

Abstract:Many basic questions in the social network literature center on the distribution of aggregate structural properties within and across populations of networks. Such questions are of increasing relevance given the growing availability of network data suitable for meta-analytic studies, as well as the rise of study designs that involve the collection of data on multiple networks drawn from a larger population. In this talk, I present a family of techniques that combines a classic approach to the identification of structural biases in network data (the use of conditional uniform graph quantiles) with strategies drawn from non-parametric Bayesian analysis. The methods described here employ quantile information to allow for principled inference regarding the distribution of structural biases within (and comparison across) populations of networks, given data sampled at the network level. I illustrate the use of these techniques by application to the study of interpersonal power in urban communes and centralization in emergency communication networks. Some computational issues relating to posterior simulation will also be discussed.

Bio:Carter T. Butts is currently an associate professor in the Department of Sociology and Institute for Mathematical Behavioral Sciences at the University of California, Irvine. His research involves the application of mathematical and computational techniques to theoretical and methodological problems within the areas of social network analysis, mathematical sociology, quantitative methodology, and human judgment and decision making. Currently, his work focuses on: the structure of spatially embedded large-scale interpersonal networks; models for informant accuracy, network inference, and graph comparison; representation and modeling of intertemporal relational data; and measurement and modeling of online social networks. Dr. Butts also studies social phenomena related to emergency situations, and is involved in research that seeks to combine social science and information technology to improve group and organizational responses to disasters and other adverse events.

Using the Web to do Social Science [video]
Friday, April 1, 2011 • 12:30PM–2PM • Lunch provided at 11:45
Computer Science Building, Room 151

Abstract:Social science is often concerned with the emergence of collective behavior out of the interactions of large numbers of individuals, but in this regard it has long suffered from a severe measurement problem: namely, that interactions between people are hard to observe, especially at scale, over time, and at the same time as observing behavior. In this talk, I will argue that the technological revolution of the Internet is beginning to lift this constraint. To illustrate, I will describe several examples of internet-based research that would have been impractical to perform until recently, and that shed light on some longstanding sociological questions. Although internet-based research still faces serious methodological and procedural obstacles, I propose that the ability to study truly "social" dynamics at individual-level resolution will have dramatic consequences for social science.

Bio:Duncan Watts is a principal research scientist at Yahoo! Research, where he directs the Human Social Dynamics group. He is also an adjunct senior research fellow at Columbia University, and an external faculty member of the Santa Fe Institute and Nuffield College, Oxford. His research on social networks and collective dynamics has appeared in a wide range of journals, from Nature, Science, and Physical Review Letters to the American Journal of Sociology. He is also the author of Six Degrees: The Science of a Connected Age (W.W. Norton, 2003) and Small Worlds: The Dynamics of Networks between Order and Randomness (Princeton University Press, 1999). He holds a B.Sc. in Physics from the University of New South Wales, and Ph.D. in Theoretical and Applied Mechanics from Cornell University.

Estimating latent processes on a graph from indirect measurements [video]
Friday, April 8, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 151

Abstract:Structured measurements and populations/samples with interfering units are ubiquitous in science and have become a focal point for discussion in the past few years. Formal statistical models for the analysis of this type of data have emerged as a major topic of interest in diverse areas of study. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online social networking websites such as Facebook and LinkedIn, and a host of more specialized professional networking communities, has intensified interest in the study of graphs, structured measurements and interference. In this talk, I will review a few ideas and open areas of research that are central to this burgeoning literature, placing emphasis on the statistical and data analysis perspectives. I will then focus on the problem of making inference on latent processes on a graph, with an application to estimating point-to-point traffic volumes in a communication network from indirect measurements. Inference in this setting requires solving a sequence of ill-posed inverse problems, y(t) = A x(t). We develop a multilevel state-space model for mixing time series and an efficient approach to inference; a simple model is used to calibrate regularization parameters that lead to efficient inference in the multilevel state-space model. Our two-stage approach suggests an efficient inference strategy for multilevel models of multivariate time series.

Bio:Edoardo Airoldi is an assistant professor in the department of statistics at Harvard University. In December 2006, he received a Ph.D. from Carnegie Mellon, working on statistical machine learning and the analysis of complex systems with Stephen Fienberg and Kathleen Carley. His dissertation introduced statistical and computational elements of graph theory that support data analysis of complex systems and their evolution. Until December 2008, he was a postdoctoral fellow in the Lewis-Sigler Institute for Integrative Genomics of Princeton University working with Olga Troyanskaya, David Botstein, and James Broach, where he developed mechanistic models to gain computational insights into aspects of molecular and cellular biology that are not directly observable with experimental probes. Since that time, he has been working closely with biologists in the areas of cellular differentiation, cellular development and cancer.

Causal Graphical Models for Computational Social Science [video]
Friday, April 15, 2011 • 12:30PM–2PM • Lunch provided
Lederle Graduate Research Tower, Room 1634

Abstract:Over the past 25 years, a surprising and impressive body of work has accumulated on algorithms for inferring causal dependence from observational data. This work has shown how patterns of observed statistical dependence can constrain the space of possible joint causal models, often to the point of uniquely identifying specific causal dependencies. Most of this work has been limited to very simple types of data, typically independent and identically distributed instances. Recently, my students and I have begun to extend this work to the analysis of more complex models whose causal dependencies also have spatial and temporal extent. This work draws on graphical models, social network analysis, statistical relational learning, and quasi-experimental design. I will describe some of this work, and provide an extended historical example of how it could significantly alter the practice of computational social science.

Bio:David Jensen is Associate Professor of Computer Science and Director of the Knowledge Discovery Laboratory at the University of Massachusetts Amherst. His current research focuses on causal discovery in relational data, computational social network analysis, fraud detection, and privacy. He serves on the Executive Committee of the ACM Special Interest Group on Knowledge Discovery and Data Mining and on the program committees of the International Conference on Machine Learning and the International Conference on Knowledge Discovery and Data Mining. He is an associate editor of the ACM Transactions on Knowledge Discovery from Data. He serves on DARPA's Information Science and Technology (ISAT) Group. He recently served on a National Research Council panel assessing the research program of the National Institute of Justice. From 1991 to 1995, he served as an analyst with the Office of Technology Assessment, an agency of the United States Congress. He received his doctorate from Washington University in St. Louis in 1992.

How we think together [video]
Friday, April 22, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 151

Abstract:How do we think together? In particular, how do groups of individuals solve problems, and how do our patterns of communication affect collective problem solving? This presentation summarizes a series of laboratory and computational experiments in which communication patterns are manipulated. We find that increased communication reduces how much a group experiments and the probability that the group will find the solution. However, increased communication does increase the probability that the group retains the optimal solution and the confidence subjects have that they have found the right answer. We also examine the impact of exogenously imposed diversity in networks. Does segregation of types of actors increase or decrease group performance? We define two different types of diversity: one based on ability, and the other based on solution. We find that segregation improves performance for solution-based diversity, and decreases performance for ability-based diversity.

Bio:David Lazer is an Associate Professor in Northeastern University’s Department of Political Science and the College of Computer and Information Science and Director of the Program on Networked Governance at Harvard. His work focuses on the nexus of policy networks, computational social science, and collaborative intelligence. He is a reviewing editor for Science, and his research has been published in such journals as Science, Proceedings of the National Academy of Sciences, the American Political Science Review, and the Administrative Science Quarterly.

Representational Style: What Legislators Say and Why it Matters [video]
Friday, April 29, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 151

Abstract:Legislators recognize that communication is an important component of representation, investing substantial time and resources to present and explain their work to constituents. But the political science literature neglects communication. A large literature defines representation through roll call votes, while other studies focus on legislators' work in Washington. But representatives' home styles, how they present and explain their work to constituents, are regularly ignored or dismissed as unimportant cheap talk. I use new comprehensive, systematic, and verifiable measures of home style to demonstrate its far-reaching consequences for representation. In the district, home styles are the primary connection between constituent and representative, allowing legislators to define the representation they provide constituents. In Washington, representatives anticipate presenting their work to constituents. As a result, home styles are much more than cheap talk: they are credible and valuable. They are systematically related to what legislators do in Washington and how senators vote on controversial legislation, they affect how bureaucrats cultivate support for their program, and they provide a credible indicator of legislators' broader representational styles. Home styles also matter for representation because they affect the quality of representation citizens receive, which I demonstrate with an examination of manipulation in home styles. Because analyzing home styles across all legislators using standard methods and data is infeasible, I introduce a new Bayesian statistical model for political texts and an original collection of over 64,000 Senate press releases to measure home style.

Bio:Justin Grimmer is an Assistant Professor in the Department of Political Science at Stanford University. His research examines how political representation occurs in America and its consequences for policy making. In addition to his work measuring representational style, he is involved in projects measuring partisan vitriol in American politics, the systematic determination of close elections, and manipulation of the public. Justin received his Ph.D. from Harvard University's Government Department in 2010.

Massive Scale Experiment in Social Influence and Political Mobilization
Friday, May 6, 2011 • 12:30PM–2PM • Lunch provided
Lederle Graduate Research Tower, Room 1634

Bio:James Fowler is a Professor in the School of Medicine and the Division of Social Sciences at the University of California, San Diego. He was recently named a Fellow of the John Simon Guggenheim Foundation and one of Foreign Policy's Top 100 Global Thinkers. James's work lies at the intersection of the natural and social sciences. His primary areas of research are social networks, behavioral economics, evolutionary game theory, political participation, cooperation, and genopolitics (the study of the genetic basis of political behavior).