
Adam Poliak and Joon Luther '25 Build System for Public Health Insights

September 15, 2025 Melissa Scott
Joon Luther '25 Presents Poster at 2025 Conference
Joon Luther '25 presents poster at International AAAI Conference on Web and Social Media in Copenhagen, Denmark.

At the College, a spark of encouragement from a faculty member can give a student the confidence to pursue unique opportunities. In spring 2025, Assistant Professor of Computer Science Adam Poliak invited Joon Luther '25, a computer science and mathematics double major in his computer science senior seminar, to collaborate on a Social Media Mining for Health/Health Real-World Data 2025 Shared Task.

"She asked great questions during the class that demonstrated her innate curiosity. I came across this shared task and asked Joon if she was interested in working together. Over the course of two months, Joon learned a lot and outperformed expectations," says Poliak.

An excellent introduction for undergraduates to machine learning and natural language processing research, shared tasks are academic competitions in which faculty and student researchers are challenged to build machine learning systems for real-world problems. The Social Media Mining for Health shared tasks focus on drawing public health insights from social media data.

"Official health data from the CDC takes a lot of time and resources to be collected and released, but noisier health data is widely available on social media via individual users' posts on Reddit, Instagram, Twitter, Facebook, and other social media platforms," explains Luther, who took a job as a software engineer at Goldman Sachs after graduating. 

Poliak and Luther decided to tackle Task 6, which focused on the shingles vaccine and asked researchers to develop a system for classifying Reddit posts as "positive" if they mention personal shingles vaccine side effects or "negative" if they do not.

Working together from March through May, Poliak and Luther trained their system on a labeled dataset of more than 3,000 Reddit posts, each marked as either "positive" or "negative." Instead of relying on a single model to do the job, they trained several different models and then combined them, a method known as "stacking," to try to boost overall accuracy.

"The work I did with Professor Poliak shows that a small performance improvement is possible when stacking RoBERTa with a non-neural model, like logistic regression or Naive Bayes," Luther explains. The project combined RoBERTa, a neural language model known to perform well at sorting text, with other models.
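For readers curious what stacking looks like in practice, here is a minimal Python sketch of the general technique, not the team's actual system. A from-scratch multinomial Naive Bayes produces out-of-fold probabilities, and a small logistic regression "meta-learner" learns how to weight those alongside RoBERTa's scores. RoBERTa itself is not reproduced: `roberta_probs` is a hypothetical list of held-out positive-class probabilities assumed to come from a fine-tuned RoBERTa model. Every function name, hyperparameter, and toy post here is invented for illustration and is not drawn from the shared-task data.

```python
import math
from collections import Counter


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing; label 1 = side effect, 0 = not."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.log_prior = {}
        for c in (0, 1):
            docs = [t for t, y in zip(texts, labels) if y == c]
            self.log_prior[c] = math.log((len(docs) + 1) / (len(texts) + 2))
            for t in docs:
                self.counts[c].update(t.lower().split())
        self.vocab_size = len(set(self.counts[0]) | set(self.counts[1]))
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        return self

    def prob_positive(self, text):
        score = {}
        for c in (0, 1):
            denom = self.totals[c] + self.vocab_size
            score[c] = self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom) for w in text.lower().split()
            )
        return sigmoid(score[1] - score[0])


def out_of_fold_nb(texts, labels, k=3):
    """Base-model probability for each training post from a model that never saw that post."""
    preds = [0.0] * len(texts)
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        nb = NaiveBayes().fit([texts[i] for i in train], [labels[i] for i in train])
        for i in range(fold, len(texts), k):
            preds[i] = nb.prob_positive(texts[i])
    return preds


def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(features[0]), 0.0, len(features)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def train_stack(texts, labels, roberta_probs, k=3):
    """roberta_probs: HYPOTHETICAL held-out P(positive) scores from a fine-tuned RoBERTa."""
    nb_oof = out_of_fold_nb(texts, labels, k)
    w, b = fit_logistic([[r, p] for r, p in zip(roberta_probs, nb_oof)], labels)
    nb_full = NaiveBayes().fit(texts, labels)  # refit on all data for inference

    def predict(text, roberta_prob):
        x = [roberta_prob, nb_full.prob_positive(text)]
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

    return predict


# Invented toy posts (1 = mentions a personal side effect, 0 = does not).
posts = [
    "my arm was sore and swollen after the shingles shot",
    "where can i get the shingles vaccine near me",
    "second shingrix dose gave me a fever and sore arm",
    "is the shingles vaccine covered by insurance",
    "felt tired with a sore arm the day after my shot",
    "does anyone know where to get shingrix",
]
labels = [1, 0, 1, 0, 1, 0]
roberta_probs = [0.9, 0.2, 0.8, 0.3, 0.7, 0.1]  # made-up stand-in scores

predict = train_stack(posts, labels, roberta_probs)
pos_score = predict("sore arm and a fever after my shingles shot", 0.85)
neg_score = predict("where can i get the shingles vaccine", 0.15)
print(round(pos_score, 3), round(neg_score, 3))
```

The out-of-fold step is the key design choice in this sketch: if the meta-learner were trained on predictions from a Naive Bayes model that had already seen the same posts, it would learn to over-trust that model. In a real pipeline, the RoBERTa scores used to train the meta-learner would need to be held out in the same way.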

In June, Luther presented their work as a poster at the International AAAI Conference on Web and Social Media in Copenhagen, Denmark.

"This project was a great opportunity for me to explore a Natural Language Processing application that benefits public health understanding. I enjoyed hearing from data science and psychology experts about their research and the future of health-related social media mining," reflects Luther. 
