Dejavu: Towards Experience Feedback Learning for Embodied Intelligence

Shaokai Wu, Yanbiao Ji, Qiuchang Li, Zhiyi Zhang, Qichen He,
Wenyuan Xie, Guodong Zhang, Bayram Bayramli, Yue Ding, Hongtao Lu
School of Computer Science, Shanghai Jiao Tong University

Paper (PDF) Code (Coming Soon) BibTeX


Figure 1. A frozen VLA is augmented by an Experience Feedback Network (EFN) that retrieves task‑relevant trajectories, predicts a residual correction, and steers the next observation toward the retrieved successor frame.

Abstract

We present Dejavu, a general post‑deployment learning framework that augments a frozen Vision‑Language‑Action (VLA) policy with an Experience Feedback Network (EFN) and a live experience bank. EFN retrieves a contextually successful prior step and conditions action prediction on this guidance by outputting a residual correction that refines the base policy’s action. EFN is optimized with reinforcement learning using a dense, similarity‑based reward that compares the realized next observation to the next observation in the retrieved trajectory. During deployment, the agent continually appends successful rollouts to its memory, enabling measurable improvement without modifying the backbone weights.


Method Overview

An experience is defined as a synchronized trajectory of vision, language, and action. At each step, EFN retrieves a task‑filtered candidate from the bank using language‑conditioned visual similarity, then predicts a residual action that is added to the base VLA output. A dense semantic reward encourages the next observation to resemble the successor frame from the retrieved experience while regularizing residual magnitude and discouraging idling. We train EFN with Soft Actor–Critic (SAC) and deploy with deterministic residual corrections.
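The step above can be sketched in a few lines. This is an illustrative outline, not the paper's implementation: the helper names (`retrieve`, `efn_step`), the cosine-similarity retrieval metric, and the reward weights `lam_res` and `lam_idle` are all assumptions for exposition.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve(bank, obs_emb, instruction):
    """Return the stored step with the most similar observation embedding,
    restricted to experiences whose instruction matches (task filtering)."""
    candidates = [e for e in bank if e["instruction"] == instruction]
    return max(candidates, key=lambda e: cosine(e["obs_emb"], obs_emb))

def efn_step(bank, obs_emb, instruction, base_action, residual,
             lam_res=0.1, lam_idle=0.05):
    """Apply a residual correction to the frozen VLA's action and build
    the dense reward used to score the realized next observation."""
    exp = retrieve(bank, obs_emb, instruction)
    action = base_action + residual  # residual correction of the base action

    def reward(next_obs_emb):
        # Dense reward: the realized next observation should resemble the
        # successor frame of the retrieved experience, with penalties on
        # residual magnitude and on idling (near-zero actions).
        return (cosine(next_obs_emb, exp["next_obs_emb"])
                - lam_res * float(np.linalg.norm(residual))
                - lam_idle * float(np.linalg.norm(action) < 1e-3))

    return action, reward
```

In this sketch the reward is a function of the *next* observation, matching the paper's framing that the correction is scored only after the environment responds; a SAC critic would consume exactly this scalar.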


Training: retrieve, correct, and optimize with dense similarity rewards. Inference: instruction‑filtered retrieval and online experience growth.
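The online experience growth described above can be sketched as a minimal container; the class name, record fields, and success gating shown here are illustrative assumptions, not the released API.

```python
class ExperienceBank:
    """Hypothetical live experience bank: only successful rollouts are
    appended, so retrieval quality grows at deployment time without
    touching the frozen VLA backbone."""

    def __init__(self):
        # Each entry is one synchronized step of (instruction, obs, action, next_obs).
        self.steps = []

    def add_rollout(self, instruction, rollout, success):
        """Append per-step experiences from a rollout, gated on success.
        Returns the number of steps stored."""
        if not success:
            return 0
        for obs, action, next_obs in rollout:
            self.steps.append({"instruction": instruction,
                               "obs": obs,
                               "action": action,
                               "next_obs": next_obs})
        return len(rollout)

    def filter(self, instruction):
        """Instruction-filtered candidates, the retrieval pool at inference."""
        return [s for s in self.steps if s["instruction"] == instruction]
```

Gating on episode success is what keeps the retrieved guidance trustworthy: a failed rollout never becomes a retrieval target.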

Demos (1× speed)

Task DrawerStore · with EFN

Description: Open the drawer with the left arm, pick up the napkin packet with the right arm, hand it to the left arm, and store it inside the drawer. This demo shows a smooth, decisive execution enabled by EFN.

  1. Use the left arm to pull the drawer open beyond the minimum displacement.
  2. Grasp the napkin packet from the tabletop with the right arm.
  3. Transfer the packet from the right gripper to the left gripper.
  4. Place the packet into the drawer interior and open both grippers.

Task DrawerStore · without EFN

Description: Paired comparison on the most challenging task. Even in a successful rollout, the baseline policy (without EFN) often exhibits hesitant behavior such as repeated grasps and an imprecise final release.

  1. Use the left arm to pull the drawer open beyond the minimum displacement.
  2. Grasp the napkin packet from the tabletop with the right arm.
  3. Transfer the packet from the right gripper to the left gripper.
  4. Place the packet into the drawer interior and open both grippers.

Task ShelfSort · with EFN

Description: Grasp a drink from the shelf with the right arm and place it into the designated cluster region next to visually similar items, then release the gripper.

  1. Identify the target drink placed at a randomized start position on the shelf.
  2. Grasp the target drink with the right arm.
  3. Move to the cluster region and place it next to similar drinks (within a fixed 3D tolerance).
  4. Release the right-arm gripper at the end of the episode.
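The "fixed 3D tolerance" in step 3 can be read as an axis-aligned tolerance box around the cluster region. The check below is a hypothetical sketch of that criterion; the function name and the 5 cm tolerance are illustrative values, not the benchmark's actual threshold.

```python
import numpy as np

def placed_in_cluster(item_pos, cluster_center, tol=0.05):
    """True if the item's 3D position lies within `tol` (meters) of the
    cluster-region center along every axis (axis-aligned tolerance box)."""
    diff = np.abs(np.asarray(item_pos, dtype=float)
                  - np.asarray(cluster_center, dtype=float))
    return bool(np.all(diff <= tol))
```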

Task StockLift · with EFN

Description: Pick up the top can from a vertical stack on a small table with the right arm and place it onto the shelf within the predefined goal region, then release.

  1. Grasp the top can from the stack with the right arm.
  2. Lift the can cleanly without disturbing the remaining stack.
  3. Move to the shelf and place the can in the goal region.
  4. Open the right gripper to finish the task.

Task BottlePlace · with EFN

Description: Pick up the plastic bottle from the tabletop with the right arm and place it into the open box that serves as the goal region, then open the gripper.

  1. Grasp the bottle from the tabletop with the right arm.
  2. Lift and move toward the open box (goal region).
  3. Place the bottle inside the box (center lies within the predefined region).
  4. Release by fully opening the right-arm gripper.

BibTeX

@misc{wu2026dejavuexperiencefeedbacklearning,
  title={Dejavu: Towards Experience Feedback Learning for Embodied Intelligence},
  author={Shaokai Wu and Yanbiao Ji and Qiuchang Li and Zhiyi Zhang and Qichen He and Wenyuan Xie and Guodong Zhang and Bayram Bayramli and Yue Ding and Hongtao Lu},
  year={2026},
  eprint={2510.10181},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2510.10181},
}

Acknowledgements

This website follows the structure of the nerfies.github.io academic project template.