Measuring what political actors do in the world is at the core of empirical social science, but existing automated methods to extract actions and behavior from text are limited to specific types of events, inaccurate, or expensive to build. This paper introduces a method for automatically extracting political events from text that uses syntactic information provided by natural language processing tools and neural networks trained on a diverse set of hand-labeled text. The method treats event extraction as a "slot filling" task, identifying the words in text that report who is doing what to whom, where and when, as reported by whom. In contrast to previous methods, this method does not require hand-constructed dictionaries or pre-specified ontologies. To learn categories of extracted events, I introduce a new short text clustering algorithm that uses word embeddings to provide prior information. I illustrate the method by extracting one million events reported in State Department annual human rights reports and find that the types of abuses and specificity of reporting have changed over time.
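As a rough illustration of the "slot filling" framing described above, the sketch below groups labeled tokens into event slots. It is a minimal example under stated assumptions, not the paper's implementation: the slot names, the `fill_slots` helper, and the example sentence are hypothetical, and the per-token labels are written by hand here, whereas in the method they would be predicted by a trained model using syntactic features.

```python
from collections import defaultdict

# Hypothetical slot names mirroring "who is doing what to whom,
# where and when, as reported by whom". "O" marks tokens outside any slot.
SLOTS = {"ACTOR", "ACTION", "RECIPIENT", "LOCATION", "DATE", "SOURCE"}


def fill_slots(tokens, labels):
    """Group contiguous tokens that share a slot label into text spans."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    event = defaultdict(list)
    prev = None
    for token, label in zip(tokens, labels):
        if label in SLOTS:
            if label == prev:
                # Same slot as the previous token: extend the current span.
                event[label][-1] += " " + token
            else:
                # A new span for this slot begins here.
                event[label].append(token)
        prev = label
    return dict(event)


# Invented example sentence; the labels are hand-assigned for illustration only.
tokens = "Police arrested three journalists in Cairo on Monday , Reuters reported".split()
labels = ["ACTOR", "ACTION", "RECIPIENT", "RECIPIENT", "O",
          "LOCATION", "O", "DATE", "O", "SOURCE", "O"]

event = fill_slots(tokens, labels)
for slot, spans in event.items():
    print(slot, spans)
```

Each slot maps to a list of spans so that a sentence reporting several actors or recipients can be represented without a fixed ontology of event types.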
Andrew Halterman is a PhD candidate in political science at MIT. His dissertation focuses on developing new techniques in text analysis to better understand micro-level processes in international conflict and civil war. His projects include methods for extracting descriptions of political events from text, geolocating events in text, generating event data from Arabic text, and understanding the determinants of violence against civilians in Syria using new micro-level data. His work has been supported by the National Science Foundation, Fulbright, the U.S. Defense Department, and the U.S. Holocaust Memorial Museum. Andy holds a BA from Amherst College.