Computational Social Science Institute

University of Massachusetts Amherst

Lecture series generously sponsored by Yahoo!

Videos of some CSSI seminars are available here

Fall 2012

On the Causes of Effects [video]
Friday, September 21, 2012 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: While much of science is concerned with the effects of causes, relying upon evidence accumulated from randomized controlled experiments and observational studies, the problem of inferring the causes of effects arises in many practical policy and legal contexts. Discussions of the two concepts, "the effects of causes" and "the causes of effects," go far back in the philosophical literature but remain murky. The statistical literature is only of limited help here as well, focusing largely on the traditional problem of the "effects of causes." Through a series of examples, I review the two concepts, how they are related, and how they differ. I discuss the challenges for statisticians, who should be worried about both problems.

Bio: Stephen E. Fienberg is Maurice Falk University Professor of Statistics and Social Science at Carnegie Mellon University, and co-director of the Living Analytics Research Centre (jointly operated by Carnegie Mellon and Singapore Management University), with appointments in the Department of Statistics, the Machine Learning Department, the Heinz College, and CyLab. He joined the faculty of Carnegie Mellon in 1980. He received his Ph.D. in Statistics from Harvard University in 1968 and has served on the faculties of the University of Chicago, University of Minnesota, and York University. Fienberg’s research includes the development of statistical methods, especially tools for the analysis of categorical data, networks, and privacy protection, from both likelihood and Bayesian perspectives. His current research also includes aspects of the history of statistics, statistics and the law, methodology for census-taking, and the use of algebraic and polyhedral geometry in statistical methodology and theory. He is a member of the U.S. National Academy of Sciences (elected 1999) and a fellow of the Royal Society of Canada, the American Academy of Arts and Sciences, and the American Academy of Political and Social Science.

A Framework for Social Data Analysis of Text
Thursday, October 4, 2012 (joint with MLFL) • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: What can text analysis tell us about society? Enormous corpora of news, historical documents, books, and social media encode ideas, beliefs, and culture. While manual content analysis is a useful and established social science method, interest in automated text analysis has exploded in recent years, since it scales to massive data sets, and can assist in discovering patterns and themes. I will present some case studies of using social media text analysis as a measurement instrument for social phenomena: sentiment analysis as a correlate of public opinion polls, geographic lexical variation as data for sociolinguistics, and characterization of Chinese online censorship. These examples, and other related work, suggest that "text-as-data" analysis techniques have wide variation in their computational/statistical complexity and amount of domain knowledge. Many methods, from word statistics to sentiment lexicons to document classifiers to topic models, can be unified as "weighted lexicon" corpus analysis tools across these spectrums, supporting both exploratory and confirmatory text data analysis. Finally, depending on time and audience interest, I could briefly present (1) generative Bayesian models for frame learning, or (2) syntactic analysis of Twitter text (part-of-speech tagging and word clustering).

Bio: Brendan O'Connor is a Ph.D. student in Carnegie Mellon University's Machine Learning Department, advised by Noah Smith. He is interested in machine learning and natural language processing, especially when informed by or applied to the social sciences. He has interned on the Facebook Data Science team, and worked on crowdsourcing at Crowdflower / Dolores Labs and on "semantic" search at Powerset. His undergraduate degree was in Symbolic Systems.

Social networks and spatial configuration: How office layouts drive social interaction [video]
Friday, October 19, 2012 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: This paper analyzes the spatial dimensions of office layouts in diverse knowledge-intensive workplace environments based on the theoretical and methodological propositions of Space Syntax, and brings this together with the analysis of intra-organizational interaction networks. Physical distances between agents are modeled in different ways and used as explanatory variables in exponential random graph modeling. The paper shows that spatial configuration in offices can be considered an important but not sole rationale for tie formation. Furthermore, it is shown that spatial distance measures based on detailed configurational analysis outperform simple Euclidean distance metrics in predicting social ties.

Methods for the Measurement of Political Ideology [video]
Thursday, November 9, 2012 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: How can we measure a person's political ideology? Traditional methods in political science depend upon the use of voting records, but this restricts the application of these methods to voting members of political bodies and therefore provides researchers with no tools for measuring the ideology of the vast majority of citizens. The most famous of existing methods, the Ideal Points model, has been among the great successes of quantitative political science. Nevertheless, it has proven difficult to use Ideal Point methods to extract the nuanced and multi-faceted ideological stances that politicians in the United States are sometimes argued to possess. In my talk, I will suggest that we can begin to develop more easily deployed methods for measuring ideology by turning to novel data sources, including text and social network data. I will review work that has used floor speeches and press releases to develop a measure of ideology based only on word count data, as well as subsequent work that has used matrix factorization methods to estimate ideology from social network and text data from Twitter. I will also discuss ways in which the Ideal Points model and several related models can be situated in the broader context of approximate matrix factorizations.

Bio: John Myles White is a Ph.D. student in Princeton University's Psychology Department, where he studies behavioral decision theory and is advised by Jonathan Cohen and David Laibson. In addition to his work on decision-making, John has worked on developing novel methods for measuring political ideology using text and social networks. Beyond his time at Princeton, he has worked at Microsoft Research and is the author of two books on machine learning for a non-academic audience that were published by O'Reilly Media. His undergraduate degree was in pure mathematics.

Partition Decoupling for Roll Call Data [video]
Thursday, December 7, 2012 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: Data-driven measurements of ideology, centered around Poole and Rosenthal's family of NOMINATE models, seek to derive ideological categorizations and relations between individuals based on public roll call votes. While these methods have been very successful, they are constrained, in part, by their strict sets of assumptions. We approach this problem from a new angle, bringing to bear methods from machine learning to construct a data-driven model for ideology from roll call data that is geometric in nature. We adapt the methodology of the "Partition Decoupling Method," an unsupervised learning technique, to produce a multiscale geometric description of a weighted network associated to the roll call votes. The dominant factors in our analysis form a low (one or two) dimensional representation, with secondary factors adding higher dimensional features. In this way, our method supports and extends the work of both Poole-Rosenthal and Heckman-Snyder concerning dimensionality of the action space. When used as a predictive model, this geometric view significantly outperforms spatial models such as DW-NOMINATE and the Heckman-Snyder 6-factor model, in both raw accuracy and Aggregate Proportional Reduced Error (APRE).

Bio: Scott Pauls is an associate professor of mathematics at Dartmouth College, where he has taught since 2001. His work in applied mathematics focuses on the study of complex systems and the construction and analysis of network models. His recent work includes applications to economics, genetics, neuroscience, and political science.

Past CSSI Seminars:

Spring 2012

Fall 2011

Spring 2011