Siow Meng Low
ML researcher — safe RL with feedback; exploring RL × LLMs
About
I’m an ML researcher based in Singapore. I recently completed a PhD in Computer Science at SMU and currently teach undergraduate CS as an adjunct lecturer while exploring research roles.
My work focuses on safe reinforcement learning from feedback, particularly when safety constraints are implicit, history-dependent, or only available through sparse trajectory-level labels. Lately I have also been exploring the RL × LLM interface for safety and alignment.
Research highlights
- TraCeS (under review; an earlier version appeared at the ICLR'25 Alignment workshop): turns rollout-level safety labels into per-step safety signals, enabling agents to learn safer behavior without hand-designed safety costs. (Intuition: it learns an internal "safety meter" from sparse pass/fail feedback.)
- Safe MDP planning via temporal patterns of unsafe trajectories (ICAPS 2023): learns temporal patterns of undesirable behavior and uses them to steer planning away from negative side effects.
- ILBO (AAAI 2022): introduces a sample-efficient iterative lower-bound optimization approach for learning deep reactive policies in continuous control and planning settings.
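To make the "safety meter" intuition behind TraCeS concrete, here is a minimal sketch. It is not the paper's method: the toy data rule, the linear per-step scorer, and all names are invented for illustration. The idea shown is that a per-step score can be trained so that its sum over a trajectory predicts a sparse trajectory-level pass/fail label, yielding a dense per-step safety signal as a by-product.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n trajectories of T steps, each step a d-dim feature vector.
# Hidden rule (never shown to the learner): each step carries risk equal to
# its first feature, and a rollout fails when cumulative risk exceeds 0.
n, T, d = 400, 10, 3
trajs = rng.normal(size=(n, T, d))
labels = (trajs[:, :, 0].sum(axis=1) < 0.0).astype(float)  # 1 = safe rollout

# Per-step linear safety score w·x + b; the trajectory's "pass logit" is the
# sum of step scores. Training sees only the trajectory-level label.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(500):
    logits = (trajs @ w + b).sum(axis=1)      # (n,) summed step scores
    p = 1.0 / (1.0 + np.exp(-logits))         # predicted pass probability
    err = p - labels                          # d(BCE loss)/d(logit)
    w -= lr * np.einsum("n,ntd->d", err, trajs) / n
    b -= lr * T * err.mean()                  # each step contributes one b

accuracy = ((p > 0.5) == (labels > 0.5)).mean()
# The per-step scores (trajs @ w + b) now act as a dense safety signal an
# agent could use for credit assignment, despite only sparse supervision.
```

Because the summed scores must explain the rollout label, the learned weight on the risky feature comes out negative: risky steps get low scores, which is exactly the per-step signal the trajectory-level labels never stated explicitly.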