Tairan He 「何泰然」

I am a first-year Ph.D. student at the Robotics Institute at Carnegie Mellon University, advised by Guanya Shi and Changliu Liu.

Previously, I received my Bachelor's degree in computer science from Shanghai Jiao Tong University, advised by Weinan Zhang. I also spent time at Microsoft Research Asia.

Email: tairanh [AT] andrew.cmu.edu

CV  /  Google Scholar  /  GitHub  /  Twitter  /  LinkedIn  /  Bilibili

Research Topics

As a robotics researcher, my goal is to challenge conventional notions of what robots can achieve. I strive to push the limits of robotics by conducting research at the intersection of machine learning and control theory. My focus is on developing intelligent robots that possess agility, stability, safety, and robustness.

To accomplish this, I explore ways to bridge the gap between learning and control, thereby creating a unified framework for robotics. Through my research, I aim to expand the capabilities of robots and transform how they can be used in a variety of fields. Ultimately, my goal is to change people's perceptions of robots and demonstrate the true potential of this exciting technology.

Publications (* equal contribution)
Safe Deep Policy Adaptation
Wenli Xiao*, Tairan He*, John Dolan, Guanya Shi
Under review,  2023 
Conference on Robot Learning (CoRL), 2023 Workshop:  Towards Reliable and Deployable Learning-Based Robotic Systems
ArXiv  /  Blog  /  Video
A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning. SafeDPA jointly learns adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes dynamics models with few-shot real-world data. A safety filter based on the Control Barrier Function (CBF) on top of the RL policy is introduced to ensure safety during real-world deployment. We provide theoretical safety guarantees of SafeDPA and show the robustness of SafeDPA against learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate great superiority of SafeDPA in both safety and task performance, over state-of-the-art baselines. Particularly, SafeDPA demonstrates notable generalizability, achieving a 300% increase in safety rate compared to the baselines, under unseen disturbances in real-world experiments.
Progressive Adaptive Chance-Constrained Safeguards for Reinforcement Learning
Zhaorun Chen, Binhao Chen, Tairan He, Liang Gong, Chengliang Liu
Under review,  2023 
Safety assurance of Reinforcement Learning (RL) is critical for exploration in real-world scenarios. In handling the Constrained Markov Decision Process, current approaches experience intrinsic difficulties in trading off optimality and feasibility. Direct optimization methods cannot strictly guarantee state-wise in-training safety, while projection-based methods are usually inefficient and correct actions through lengthy iterations. To address these two challenges, this paper proposes an adaptive surrogate chance constraint for the safety cost, and a hierarchical architecture that corrects actions produced by the upper policy layer via a fast Quasi-Newton method. Theoretical analysis indicates that the relaxed probabilistic constraint can sufficiently guarantee forward invariance to the safe set. We validate the proposed method on 4 simulated and real-world safety-critical robotic tasks. Results indicate that the proposed method can efficiently enforce safety (nearly zero-violation), while preserving optimality (+23.8%), robustness and generalizability to stochastic real-world settings.
State-wise Safe Reinforcement Learning: A Survey
Weiye Zhao, Tairan He, Rui Chen, Tianhao Wei, Changliu Liu
International Joint Conference on Artificial Intelligence (IJCAI),  2023 
Despite the tremendous success of Reinforcement Learning (RL) algorithms in simulation environments, applying RL to real-world applications still faces many challenges. A major concern is safety, in other words, constraint satisfaction. State-wise constraints are among the most common constraints in real-world applications and among the most challenging constraints in Safe RL. Enforcing state-wise constraints is essential to many challenging tasks such as autonomous driving and robot manipulation. This paper provides a comprehensive review of existing approaches that address state-wise constraints in RL. Under the framework of State-wise Constrained Markov Decision Process (SCMDP), we discuss the connections, differences, and trade-offs of existing approaches in terms of (i) safety guarantee and scalability, (ii) safety and reward performance, and (iii) safety after convergence and during training. We also summarize limitations of current methods and discuss potential future directions.
Visual Imitation Learning with Patch Rewards
Minghuan Liu, Tairan He, Weinan Zhang, Shuicheng Yan, Zhongwen Xu
International Conference on Learning Representations (ICLR),  2023 
ArXiv  /  Blog  /  Code
Explainable visual imitation learning from high-dimensional image demonstrations remains a grand challenge. The problem raises issues in many aspects, such as representation learning, sample complexity, and training stability. Prior works mostly neglect the dense information provided in the image demonstration. This paper proposes an efficient visual imitation learning method, PatchAIL, to learn explainable patch-based rewards that measure the expertise of different local parts of given images. The patch-based knowledge is used to regularize the aggregated reward and stabilize the training. We evaluate our method on the standard pixel-based benchmark DeepMind Control Suite in experiments. The empirical results confirm that PatchAIL supports efficient training that outperforms baseline methods and provides valuable interpretations for visual imitation learning.
Safety Index Synthesis via Sum-of-Squares Programming
Weiye Zhao*, Tairan He*, Tianhao Wei, Simin Liu, Changliu Liu
American Control Conference (ACC),  2023 
Control systems often need to satisfy strict safety requirements. A safety index provides a handy way to evaluate the safety level of the system and derive the resulting safe control policies. However, designing safety index functions under control limits is difficult and requires a great amount of expert knowledge. This paper proposes a framework for synthesizing the safety index for general control systems using sum-of-squares programming. Our approach is to show that ensuring the non-emptiness of safe control on the safe set boundary is equivalent to a local manifold positiveness problem. We then prove that this problem is equivalent to sum-of-squares programming via the Positivstellensatz of algebraic geometry. We validate the proposed method on robot arms with different degrees of freedom and ground vehicles. The results show that the synthesized safety index guarantees safety and our method is effective even in high-dimensional robot systems.
Probabilistic Safeguard for Reinforcement Learning Using Safety Index Guided Gaussian Process Models
Weiye Zhao*, Tairan He*, Changliu Liu
Learning for Dynamics & Control Conference (L4DC),  2023 
Safety is one of the biggest concerns in applying reinforcement learning (RL) to the physical world. At its core, it is challenging to ensure RL agents persistently satisfy a hard state constraint without white-box or black-box dynamics models. This paper presents an integrated model learning and safe control framework to safeguard any agent, where its dynamics are learned as Gaussian processes. The proposed theory provides (i) a novel method to construct an offline dataset for model learning that best achieves safety requirements; (ii) a parameterization rule for safety index to ensure the existence of safe control; (iii) a safety guarantee in terms of probabilistic forward invariance when the model is learned using the aforementioned dataset. Simulation results show that our framework guarantees almost zero safety violation on various continuous control tasks.
AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement Learning
Tairan He, Weiye Zhao, Changliu Liu
AAAI Conference on Artificial Intelligence (AAAI), 2023 
Safety is a critical hurdle that limits the application of deep reinforcement learning (RL) to real-world control tasks. To this end, constrained reinforcement learning leverages cost functions to improve safety in constrained Markov decision processes. However, such constrained RL methods fail to achieve zero violation even when the cost limit is zero. This paper analyzes the reason for such failure, which suggests that a proper cost function plays an important role in constrained RL. Inspired by the analysis, we propose AutoCost, a simple yet effective framework that automatically searches for cost functions that help constrained RL to achieve zero-violation performance. We validate the proposed method and the searched cost function on the safe RL benchmark Safety Gym. We compare the performance of augmented agents that use our cost function to provide additive intrinsic costs with baseline agents that use the same policy learners but with only extrinsic costs. Results show that the converged policies with intrinsic costs in all environments achieve zero constraint violation and comparable performance with baselines.
Reinforcement Learning with Automated Auxiliary Loss Search
Tairan He, Yuge Zhang, Kan Ren, Minghuan Liu, Che Wang, Weinan Zhang, Yuqing Yang, Dongsheng Li
Conference on Neural Information Processing Systems (NeurIPS), 2022 
ArXiv  /  Blog  /  Code
A good state representation is crucial to solving complicated reinforcement learning (RL) challenges. Many recent works focus on designing auxiliary losses for learning informative representations. Unfortunately, these handcrafted objectives rely heavily on expert knowledge and may be sub-optimal. In this paper, we propose a principled and universal method for learning better representations with auxiliary loss functions, named Automated Auxiliary Loss Search (A2LS), which automatically searches for top-performing auxiliary loss functions for RL. Specifically, based on the collected trajectory data, we define a general auxiliary loss space of size 7.5×10^20 and explore the space with an efficient evolutionary search strategy. Empirical results show that the discovered auxiliary loss (namely, A2-winner) significantly improves the performance on both high-dimensional (image) and low-dimensional (vector) unseen tasks with much higher efficiency, showing promising generalization ability to different settings and even different benchmark domains. We conduct a statistical analysis to reveal the relations between patterns of auxiliary losses and RL performance.
Model-free Safe Control for Zero-Violation Reinforcement Learning
Weiye Zhao, Tairan He, Changliu Liu
Conference on Robot Learning (CoRL), 2021 
We present a model-free safe control strategy to synthesize safeguards for DRL agents, which will ensure zero safety violation during training. In particular, we present an implicit safe set algorithm, which synthesizes the safety index (also called the barrier certificate) and the subsequent safe control law only by querying a black-box dynamic function (e.g., a digital twin simulator). The theoretical results indicate the implicit safe set algorithm guarantees forward invariance and finite-time convergence to the safe set. We validate the proposed method on the state-of-the-art safety benchmark Safety Gym. Results show that the proposed method achieves zero safety violation and gains 95% ± 9% cumulative reward compared to state-of-the-art safe DRL methods.
Energy-Based Imitation Learning
Minghuan Liu, Tairan He, Minkai Xu, Weinan Zhang
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2021 (oral)
ArXiv / Code
We propose EBIL, a two-step solution for imitation learning: first estimate the energy of the expert's occupancy measure, and then use the energy to construct a surrogate reward function that guides the agent to learn the desired policy. EBIL combines the ideas of energy-based models (EBM) and occupancy measure matching, and via theoretical analysis we reveal that EBIL and Max-Entropy IRL (MaxEnt IRL) approaches are two sides of the same coin, and thus EBIL could be an alternative to adversarial IRL methods. Extensive qualitative and quantitative evaluations indicate that EBIL is able to recover meaningful and interpretable reward signals while achieving effective and comparable performance against existing algorithms on IL benchmarks.
SJTU Anonymous Forum 「无可奉告」

Android Code  /  iOS Code  /  Project Page  /  Farewell Video

A carefree forum platform where SJTU students can share and talk anonymously.

More than 10,000 users used 「无可奉告」 on the SJTU campus.

Carnegie Mellon University, Pittsburgh, PA, USA
Ph.D. in Robotics • Aug. 2023 to Present
Shanghai Jiao Tong University, Shanghai, China
B.E. in Computer Science • Aug. 2018 to Jun. 2023
Intelligent Control Lab, Carnegie Mellon University
Research Intern • Jan. 2022 to Jan. 2023
Advisor: Changliu Liu
Microsoft Research
Research Intern • Mar. 2021 to Dec. 2021
Advisors: Yuge Zhang and Kan Ren
APEX Lab, Shanghai Jiao Tong University
Research Intern • Jul. 2019 to Jan. 2023
Advisor: Weinan Zhang
Reviewer Service

International Conference on Learning Representations (ICLR), 2024
IEEE Conference on Decision and Control (CDC), 2023
Conference on Neural Information Processing Systems (NeurIPS), 2023
Learning for Dynamics & Control Conference (L4DC), 2023
AAAI Conference on Artificial Intelligence (AAAI), 2023, 2024
Conference on Robot Learning (CoRL), 2022, 2023

Last updated: Oct. 2023
Thanks to Jon Barron for this amazing template.