Reasoning with LLMs (UdS; Summer 2025)

📚 NLP Seminar – Summer 2025

Course Description

Large Language Models (LLMs) have become powerful tools capable of performing a wide range of human-like tasks, from translation to complex problem-solving. However, reasoning, that is, the ability to logically infer, plan, and generalize, remains a fundamental challenge that distinguishes them from human intelligence. While techniques like Chain-of-Thought prompting and specialized training methods have significantly improved LLM reasoning, it remains an open question whether these models can truly replicate human-like reasoning, and how best to approach it. In this seminar, we will explore recent advances in reasoning with LLMs, including techniques for improving their performance, analyses of how these models reason, and discussions of how to move beyond current approaches.
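
To make the first of these techniques concrete, here is a minimal sketch of Chain-of-Thought prompting in Python, using the arithmetic example from the original Chain-of-Thought paper. The query_model function is a hypothetical placeholder for whichever LLM API you use; the point is only the difference in how the prompts are constructed.

    # Minimal sketch of Chain-of-Thought (CoT) prompting.
    # `query_model` is a hypothetical placeholder for a real LLM API call.

    def query_model(prompt: str) -> str:
        """Send `prompt` to an LLM and return its completion (stub)."""
        raise NotImplementedError("Plug in your LLM API of choice here.")

    # Direct prompting: ask for the answer immediately.
    direct_prompt = (
        "Q: The cafeteria had 23 apples. If they used 20 to make lunch "
        "and bought 6 more, how many apples do they have?\n"
        "A:"
    )

    # CoT prompting: prepend a worked example whose answer spells out the
    # intermediate steps, nudging the model to reason step by step as well.
    cot_prompt = (
        "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
        "Q: The cafeteria had 23 apples. If they used 20 to make lunch "
        "and bought 6 more, how many apples do they have?\n"
        "A:"
    )

    # With the CoT prompt, the model is expected to produce intermediate
    # steps ("23 - 20 = 3, 3 + 6 = 9") before stating the final answer (9).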

Prerequisites

Students should have a solid background in NLP and machine learning. Prior familiarity with language models, particularly transformer-based architectures like GPT-2, and common usage paradigms such as in-context learning and supervised fine-tuning will be assumed.

Registration

Please send an email to ykyao@lst.uni-saarland.de indicating your top three preferences among the papers you are willing to present. The papers should be selected from the Reading column in the Schedule section below. Please use the subject [LLM reasoning seminar] for your email, and preferably send it before April 21.

If you’d like to propose an alternative reading that fits the week’s topic, reach out early. Substitutions are welcome as long as they align with the topic and we have time to read them.

Information

Instructor: Yuekun Yao

Course Format & Expectations

This seminar is built around group discussion of recent research. Each week, we’ll focus on one or more papers related to the topic of the week. While you're not expected to read every paper in depth, you should be familiar with the main ideas and come prepared to listen actively, ask questions, and contribute to the conversation.

Each student will take the lead for one session during the semester. As session leader, you’ll guide our discussion of the assigned paper and help the group unpack its contributions, assumptions, and implications. Your responsibilities include:

Preparing a presentation that introduces the paper’s motivation, methods, and findings
Guiding the discussion and keeping it focused
Preparing questions or discussion points to get the conversation going

All participants are also expected to engage actively in the discussion. You should come to each session with some interesting questions or comments to contribute.

Evaluation

For students taking the seminar for 4 credits:

Presentation: 60%
Participation in discussion: 40%

For students taking the seminar for 7 credits:

Presentation: 30%
Participation in discussion: 20%
Term paper: 50%

Term paper

There are two options for the term paper: a survey paper or a replication study. If you are not sure about your topic, feel free to reach out and ask.

Note that a survey paper does not mean simply summarizing several existing papers. You are expected to include your own insights. For example, you can propose a taxonomy to organize relevant works, highlight research questions that are important but understudied, or suggest novel research ideas. Running experiments to support your points can be very useful, but it is not required. Including figures to provide overviews or illustrate methods is recommended to make your paper easier to follow.

For the replication option, you should reproduce the method or experiments of an existing paper and investigate new research questions that were not addressed in the original work. For example, you could extend the method, design new experiments, or adapt it to a different scenario or domain. As long as you ask interesting questions and make reasonable decisions in approaching them, the result will be a strong term paper. This means you are not expected to always produce positive results; negative results can also be insightful.

A typical structure for the paper may include (but is not limited to): an introduction motivating your topic, a background or related-work section, the main content (e.g., a taxonomy and discussion for a survey, or methods and experiments for a replication), and a conclusion summarizing your insights and open questions.

Format: The term paper should be at most 8 pages, with no minimum page limit. You may use the ACL, ICLR, or NeurIPS template. References do not count toward the page limit.

Slides: Here are the slides on course logistics.

Note: Students who wish to write a term paper must register for the 7-credit version of the course in the LSF system before the end of the lecture period (mid-July). The submission deadline for the term paper is September 30. Please make sure you register for the correct version of the course.

Schedule

Below is the schedule for the course. The readings are subject to change.

Topic | Subtopic | Reading | Optional | Date | Discussion Leader
--- | --- | --- | --- | --- | ---
Introduction | Course Logistics | | | 2025-04-15 | Yuekun
 | Overview | | | 2025-04-22 | Yuekun
Inference | chain-of-thought | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | 2025-04-29 | Alexander Wehner
 | rationale exploration | Self-Consistency Improves Chain of Thought Reasoning in Language Models | Tree of Thoughts: Deliberate Problem Solving with Large Language Models | 2025-05-06 | Zhenyu Feng
 | task decomposition | Least-to-Most Prompting Enables Complex Reasoning in Large Language Models | Compositional Semantic Parsing with Large Language Models | 2025-05-13 | Franka Beyer
 | | Chain-of-Thought Reasoning without Prompting | | | Eva Gavaller
 | verification | Training Verifiers to Solve Math Word Problems | | 2025-05-20 | Sree Harsha Sunaye
 | | Let’s Verify Step by Step | | 2025-05-27 | Ümit Altar Binici
Learning | SFT | Show Your Work: Scratchpads for Intermediate Computation with Language Models | | 2025-06-03 | Pranav Kushare
 | bootstrapping | STaR: Bootstrapping Reasoning With Reasoning | Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | 2025-06-10 | Yu Yamashita
 | reinforcement tuning | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Tulu 3: Pushing Frontiers in Open Language Model Post-Training (Section 6) | 2025-06-17 | Nadia Asmi
 | | s1: Simple test-time scaling | SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | 2025-06-24 | Aravind Krishnan
 | latent reasoning | Training Large Language Models to Reason in a Continuous Latent Space | | 2025-07-01 | Shane John Paul
Analysis | | Do Large Language Models Latently Perform Multi-Hop Reasoning? | | 2025-07-08 | Antonia
 | | Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | | 2025-07-15 | Fedor Sizov

Contact

Please contact Yuekun (ykyao@lst.uni-saarland.de) for any questions.

Office: Building C7 2, R. 2.04

Office hour: Wednesday 4-5pm