The PAN-Metrics initiative, a series of conferences and seminars dedicated to presenting and discussing research in social science measurement, has been running since its launch in April 2024.
Currently, the Institute of Philosophy and Sociology of the Polish Academy of Sciences (IFiS PAN) hosts regular PAN-Metrics seminars, where invited guests present their research findings.
To date, two events in this new series have taken place:
On November 28, 2024, Dr. Maciej Pankiewicz from the University of Pennsylvania delivered a presentation titled "Innovative Applications of Large Language Models in Research: A Focus on Education."

Abstract:
In this presentation, I will explore the transformative role of Large Language Models (LLMs) in educational research. I'll begin by discussing LLM techniques used in my research, such as prompt engineering, few-shot learning, Retrieval-Augmented Generation (RAG), and embeddings, highlighting how these methods enhance data analysis and interpretation. I'll introduce examples using diverse data sources, including video recordings, discussion forums, and programming datasets, to demonstrate the flexibility and breadth of LLM applications. Through projects like JeepyTA, I'll show how LLMs may be used to shape learning processes, enabling personalized learning experiences and opening new avenues for data-driven educational studies.
Presentation slides are available here.
On January 28, 2025, the PAN-Metrics seminar featured Professor Miklós Sebők from the Hungarian Research Network (HUN-REN), who delivered a presentation titled "Leveraging Open Large Language Models for Multilingual Policy Topic Classification: The Babel Machine Approach."

Abstract:
The article presents an open-source and freely available natural language processing system for comparative policy studies. The CAP Babel Machine allows for the automated classification of input files based on the 21 major policy topics of the codebook of the Comparative Agendas Project (CAP). By using multilingual XLM-RoBERTa large language models, the pipeline can produce state-of-the-art outputs for selected pairs of languages and domains (such as media or parliamentary speech). For 24 cases out of 41, the weighted macro F1 of our language-domain models surpassed 0.75 (and, for 6 language-domain pairs, 0.90). Besides macro F1, for most major topic categories, the distribution of micro F1 scores is also centered around 0.75. These results show that the CAP Babel Machine is a viable alternative to human coding in terms of validity, at lower cost and with higher reliability. The proposed research design also has significant potential for scaling in terms of leveraging new models, covering new languages, and adding new datasets for fine-tuning. Based on our tests on manifesto and sentiment data, we argue that model-pipeline frameworks such as the Babel Machine can, over time, potentially replace double-blind human coding for a multitude of comparative classification problems.
The slides are available here.
The next seminar in the PAN-Metrics series is scheduled for March 2025.