In the second installment of the DSI Squared Unsolved Problems Seminar Series, Professor Ken Benoit, Director of LSE’s Data Science Institute, will present unsolved problems relating to his recent paper, ‘Scaling text with the class affinity model’.
When it comes to data science research and its impact, LSE’s strength in social science naturally complements Imperial’s strengths in science, technology, and medicine.
When your data science research hits an issue, what do you do? You should present your research problems at the DSI Squared Unsolved Problems Research Seminar to get helpful ideas or to find collaborators across disciplines who can help you break through the obstacles.
This series of seminars forms part of the DSI Squared collaboration between the LSE Data Science Institute and ICL Data Science Institute, which aims to foster innovation by bridging the social sciences with computer science and other STEM subjects.
Innovative researchers from both Institutes are invited to showcase their ideas in front of an expert audience of colleagues from both Institutes. These attendees offer their wide-ranging expertise and knowledge to crowdsource solutions to these stumbling blocks. For example, core data science experts may welcome contributions from those with knowledge of social science, and vice versa.
Speaker: Professor Ken Benoit
Location: Data Observatory, Imperial College London’s South Kensington Campus.
Date: 25 November 2022
Time: 12:30 – 13:30
Probabilistic methods for classifying text form a rich tradition in machine learning and natural language processing. For many important problems, however, class prediction is uninteresting because the class is known, and instead the focus shifts to estimating latent quantities related to the text, such as affect or ideology. We focus on one such problem of interest, estimating the ideological positions of 55 Irish legislators in the 1991 Dáil confidence vote, a challenge brought by opposition party leaders against the then-governing Fianna Fáil party in response to corruption scandals. In this application, we clearly observe support or opposition from the known positions of party leaders, but have only information from speeches from which to estimate the relative degree of support from other legislators. To solve this scaling problem and others like it, we develop a text modeling framework that allows actors to take latent positions on a “gray” spectrum between “black” and “white” polar opposites. We are able to validate results from this model by measuring the influences exhibited by individual words, and we are able to quantify the uncertainty in the scaling estimates by using a sentence-level block bootstrap. Applying our method to the Dáil debate, we are able to scale the legislators between extreme pro-government and pro-opposition in a way that reveals nuances in their speeches not captured by their votes or party affiliations.
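For readers unfamiliar with the sentence-level block bootstrap mentioned in the abstract, the general idea can be sketched as follows: treat each sentence of a speech as a block, resample sentences with replacement, and recompute the position estimate on each resample to get an interval. This is a minimal illustration under stated assumptions, not the paper's implementation. The function names, the cue-word sets, and `toy_score` are all hypothetical, and `toy_score` is a simple placeholder statistic standing in for the class affinity model, which is not reproduced here.

```python
import random


def toy_score(sentences, gov_words, opp_words):
    """Placeholder statistic (NOT the class affinity model): the share of
    cue words in the speech that are pro-government, a value in [0, 1]."""
    gov = opp = 0
    for sentence in sentences:
        for token in sentence.lower().split():
            if token in gov_words:
                gov += 1
            elif token in opp_words:
                opp += 1
    total = gov + opp
    # With no cue words at all, fall back to the midpoint of the scale.
    return 0.5 if total == 0 else gov / total


def sentence_bootstrap(sentences, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for `statistic`, resampling whole sentences
    (the blocks) with replacement so within-sentence structure is kept."""
    rng = random.Random(seed)
    n = len(sentences)
    estimates = []
    for _ in range(n_boot):
        resample = [sentences[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

A call might look like `sentence_bootstrap(speech, lambda s: toy_score(s, gov_words, opp_words))`, where `speech` is a list of sentence strings. Resampling sentences rather than individual words keeps the words of each sentence together, which is the motivation for using blocks.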