Elastic Decision Transformer

OMRON SINIC X, UC San Diego

Achieving Trajectory Stitching Using Non-Stitching Models

Motivation

We show that trajectory stitching can be achieved without explicitly training the Decision Transformer (DT) to stitch. The key insight is that DT can be made to stitch simply by adjusting the history length it maintains. Consider two trajectories \(\tau^a\) and \(\tau^b\) in a dataset that pass through a common state \( s_t \). A non-stitching model whose history includes \( s_{t-1}^b \) may end up in the sub-optimal state \( s_{t+1}^b \), exactly as in the dataset. However, if we shorten the history length to 1 at \( s_t \), the model can stitch to \( s_{t+1}^a \) and generate a more optimal trajectory, since the transition \( s_t\rightarrow s_{t+1}^a \) exists in the dataset.
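To make the idea concrete, here is a minimal toy sketch in Python (ours, not the authors' code): two logged trajectories intersect at a shared state \( s_t \); conditioning on the full history from trajectory b reproduces its sub-optimal continuation, while truncating the history to length 1 at \( s_t \) also exposes the better transition \( s_t\rightarrow s_{t+1}^a \) from trajectory a. The state names and the frequency-table "model" are illustrative assumptions.

from collections import defaultdict

# Hypothetical dataset: two trajectories (lists of states) that share the state s_t.
traj_a = ["s0_a", "s_t", "s_t+1_a"]   # better continuation after s_t
traj_b = ["s0_b", "s_t", "s_t+1_b"]   # worse continuation after s_t

# Stand-in for a non-stitching model: next states keyed by the observed history.
next_states = defaultdict(list)
for states in (traj_a, traj_b):
    for t in range(1, len(states) - 1):
        for h in range(1, t + 2):                     # every history length ending at step t
            history = tuple(states[t - h + 1 : t + 1])
            next_states[history].append(states[t + 1])

# Conditioning on the long history from trajectory b keeps us on trajectory b ...
print(next_states[("s0_b", "s_t")])    # ['s_t+1_b']
# ... but with history length 1 at s_t, the better transition from trajectory a is also in support.
print(next_states[("s_t",)])           # ['s_t+1_a', 's_t+1_b']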

EDT architecture

We propose an architecture that approximates the maximal in-support return \( \tilde{R} \) with expectile regression for different history lengths. Estimating this value allows us to find the optimal history length for stitching. We show that the proposed method outperforms DT and its variants in a multi-task regime on the D4RL locomotion benchmark and Atari games.
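As a sketch of how a maximal in-support return can be approximated, the expectile regression loss below (PyTorch) weights over- and under-estimation errors asymmetrically; with an expectile level close to 1 the regressed value approaches the maximum return observed in the data for a given context. The function name and the expectile level are our assumptions, not details taken from the paper.

import torch

def expectile_loss(pred_return, target_return, alpha=0.99):
    # Asymmetric squared error: under-estimates (target > prediction) are weighted
    # by alpha, over-estimates by (1 - alpha). As alpha approaches 1, the minimizer
    # of this loss approaches the maximum in-support return rather than its mean.
    diff = target_return - pred_return
    weight = torch.abs(alpha - (diff < 0).float())
    return (weight * diff.pow(2)).mean()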

Abstract

This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching, a process involving the generation of an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. The proposed EDT differentiates itself by facilitating trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT. Further, the EDT optimizes the trajectory by retaining a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" with a more optimal trajectory. Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q Learning-based approaches. In particular, the EDT outperforms Q Learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games.

Results

Atari (multi-task)

Multi-game Atari results. The proposed EDT improves the performance of DT by 39.8%. We consider 20 Atari games for the multi-game setting.

D4RL (multi-task)

Multi-task D4RL results. The proposed EDT outperforms the baseline methods by a large margin in the multi-task setting. We gather the medium-replay datasets from the four locomotion tasks into a single dataset for training. The medium-replay datasets contain trajectories collected from random policies, which makes them challenging for non-stitching methods.

Method

The figure illustrates the action inference procedure within the proposed Elastic Decision Transformer. Initially, we estimate the value maximizer, \( \tilde{R}_i \), for each length \(i\) within the search space, as delineated by the green rectangle. Subsequently, we identify the maximal value over all \(\tilde{R}_i\), which yields the optimal history length \(w\). Utilizing this optimal history length, we estimate the expert value at time step \(t\), denoted as \(\tilde{R}^t_{w,e}\), by Bayes' rule. Finally, the action prediction is accomplished via the causal transformer decoder, indicated by the blue rectangle. In practice, we retain the distribution of \(R_i^t\) during the estimation of \(\tilde{R}_i\); the steps are presented separately here for clarity.
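A minimal sketch of this inference loop in Python (ours, not the released implementation): it assumes a hypothetical `model` wrapper exposing a return-estimation head and an action head, and it simply reuses the maximal expectile estimate as the return-to-go target instead of deriving an expert-level return via Bayes' rule. The method names, the search space of history lengths, and the interface are assumptions.

def edt_action_inference(model, history, lengths=(1, 2, 4, 8, 16)):
    # 1. Estimate the value maximizer R~_i for each candidate history length i
    #    (green rectangle in the figure).
    estimates = {}
    for i in lengths:
        context = history[-i:]                 # keep only the last i time steps
        estimates[i] = model.estimate_return(context)

    # 2. Take the maximum over all R~_i to obtain the optimal history length w.
    w = max(estimates, key=estimates.get)

    # 3. Condition the causal transformer decoder (blue rectangle) on the chosen
    #    context and the corresponding return estimate to predict the action.
    #    (The paper instead estimates an expert-level return via Bayes' rule.)
    return model.predict_action(history[-w:], estimates[w])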

BibTex

@article{wu2023elastic,
  author        = {Wu, Yueh-Hua and Wang, Xiaolong and Hamaya, Masashi},
  title         = {Elastic Decision Transformer},
  archivePrefix = {arXiv},
  eprint        = {2307.02484},
  primaryClass  = {cs.LG},
  year          = {2023},
}