
Dongwei Jiang
LLM reasoning and self-improvement. Previously focused on speech
Applied Scientist at Amazon. Previously master's student at JHU
Research Interest
I am broadly interested in reasoning. In the realm of reasoning, I’ve worked on:
- Building a general-purpose verifier through rationale extraction from unlabelled data to provide process supervision during reasoning [1] (mentioned in Lilian Weng's blog)
- Investigating the effectiveness of CoT prompting across 100+ papers and 20 datasets, and finding that CoT mainly benefits math and symbolic reasoning tasks [2] (discussion with Jason Wei)
- Theorem proving and logical reasoning that uses the Lean theorem prover to help with the reasoning process [3]
- Decompositional entailment: formulating a consistent and theoretically grounded approach to annotating decompositional entailment datasets [4]
I’m also interested in the self-improvement capability of LLMs. If we begin with the “end” (superintelligence/AGI) in mind, relying on human input won’t get us there. We need to teach models to interact with the environment and self-improve.
- Understanding what prevents LLMs from self-improving effectively [5]
- Probing the limits of self-improvement even with high-quality feedback [6]
Recently, I’ve been looking into reinforcement learning and agents — I don’t really want to overextend myself, but these are fundamental areas that are too important to overlook.
More About Me
In my past life, I spent six years working in industry on speech processing and speech foundation models. Recently, my focus has shifted to foundation models. I completed my master’s degree at JHU, where I worked with Professors Daniel Khashabi and Benjamin Van Durme. I’ve also collaborated with Professor Shay Cohen from Edinburgh and Greg Durrett from UT Austin. Currently, I’m working as an Applied Scientist at Amazon, where I continue to pursue research in foundation models and related areas.
In my free time, I sometimes play Civ 6 or Hearthstone. I also play tennis and badminton, and I go bouldering every other day (well, more like every three or four days, but who’s counting?). I’ve noticed there’s something puzzle-like about all these activities, whether it’s planning civilizations, crafting the perfect deck, or figuring out a tricky climbing route, which probably explains why I enjoy them alongside my research work.
Selected Publications
- NAACL