Publications

Selected papers and preprints. For a complete list, see Google Scholar.

2026

TraCeS: Learning per-timestep constraint-violation credit from sparse trajectory-level labels

ICML 2026 · Siow Meng Low, Ze Gong, Akshat Kumar

Turns rollout-level safety labels into per-step safety signals, enabling agents to learn safer behavior without hand-designed safety costs.

Earlier version appeared at the ICLR 2025 Workshop on Bidirectional Human-AI Alignment (non-archival).

OpenReview · ICML poster
BibTeX
```
@inproceedings{low2026traces,
  title={TraCeS: Learning Per-Timestep Constraint-Violation Credit from Sparse Trajectory-Level Labels},
  author={Low, Siow Meng and Gong, Ze and Kumar, Akshat},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026},
  url={https://openreview.net/forum?id=lXnoKG4jFU}
}
```

2024

Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints

arXiv:2405.03005 · Siow Meng Low, Akshat Kumar

Learns non-Markovian safety constraints for RL, capturing temporal safety requirements beyond stepwise costs.

arXiv · PDF
BibTeX
```
@article{low2024safe,
  title={Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints},
  author={Low, Siow Meng and Kumar, Akshat},
  journal={arXiv preprint arXiv:2405.03005},
  year={2024}
}
```

2023

Safe MDP Planning by Learning Temporal Patterns of Undesirable Trajectories and Averting Negative Side Effects

ICAPS 2023 · Siow Meng Low, Akshat Kumar, Scott Sanner

Planning with learned temporal patterns of undesirable behavior to reduce negative side effects without requiring an explicit cost specification.

Publisher · PDF

BibTeX
```
@inproceedings{low2023safe,
  title={Safe MDP planning by learning temporal patterns of undesirable trajectories and averting negative side effects},
  author={Low, Siow Meng and Kumar, Akshat and Sanner, Scott},
  booktitle={Proceedings of the International Conference on Automated Planning and Scheduling},
  volume={33},
  pages={596--604},
  year={2023}
}
```

2022

Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs

AAAI 2022 · Siow Meng Low, Akshat Kumar, Scott Sanner

A sample-efficient optimisation approach for deep reactive policies in continuous MDP planning, using iterative lower-bound optimisation.

Publisher

BibTeX
```
@inproceedings{low2022sample,
  title={Sample-efficient iterative lower bound optimization of deep reactive policies for planning in continuous MDPs},
  author={Low, Siow Meng and Kumar, Akshat and Sanner, Scott},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={9},
  pages={9840--9848},
  year={2022}
}
```

2009

Prediction Based Energy-efficient Task Allocation for Delay-constrained Wireless Sensor Networks

IEEE SECON Workshops 2009 · Wendong Xiao, Siow Meng Low, Chen Khong Tham, Sajal Das

Prediction-based task allocation to reduce energy usage while meeting delay constraints in wireless sensor networks.

IEEE

BibTeX
```
@inproceedings{xiao2009prediction,
  title={Prediction based energy-efficient task allocation for delay-constrained wireless sensor networks},
  author={Xiao, Wendong and Low, Siow Meng and Tham, Chen Khong and Das, Sajal},
  booktitle={2009 6th IEEE Annual Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops},
  pages={1--3},
  year={2009},
  organization={IEEE}
}
```