Sparse ActionGen
Accelerating Diffusion Policy with Real-Time Pruning
Abstract
Diffusion Policy has dominated action generation due to its strong capability for modeling multi-modal action distributions, but its multi-step denoising process makes it impractical for real-time visuomotor control. Existing caching-based acceleration methods typically rely on static schedules that fail to adapt to the dynamics of robot-environment interactions, leading to suboptimal performance. In this paper, we propose Sparse ActionGen (SAG) for extremely sparse action generation. To accommodate iterative robot-environment interactions, SAG customizes a rollout-adaptive prune-then-reuse mechanism that first identifies prunable computations globally and then reuses cached activations to substitute them during action diffusion. To capture the rollout dynamics, SAG parameterizes an observation-conditioned diffusion pruner for environment-aware adaptation and instantiates it with a highly parameter- and inference-efficient design for real-time prediction. Furthermore, SAG introduces a one-for-all reusing strategy that reuses activations across both timesteps and blocks in a zig-zag manner. Extensive experiments demonstrate that SAG achieves up to 4× generation speedup without sacrificing performance.
Methodology
Rollout-Adaptive Pruning
Unlike static caching schedules, SAG uses a prune-then-reuse mechanism that adapts to robot-environment interactions. A lightweight Observation-Conditioned Diffusion Pruner predicts, in real time and from the current observation, which computations can be pruned; cached activations then substitute for the pruned computations.
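To make the idea of an observation-conditioned pruner concrete, here is a minimal Python sketch under stated assumptions. It is not SAG's architecture: the class name `LinearDiffusionPruner`, the single linear layer, the random stand-in weights, and the rule that the first denoising step is never pruned are all hypothetical choices for illustration. The pruner scores every (denoising step, block) cell from an observation embedding and prunes the lowest-scoring cells until a target pruning rate is met.

```python
import random
from typing import List


class LinearDiffusionPruner:
    """Hypothetical observation-conditioned pruner (illustrative sketch only).

    One linear layer maps an observation embedding to a keep-score for every
    (denoising step, block) cell; the lowest-scoring cells are pruned until the
    target pruning rate is reached.
    """

    def __init__(self, obs_dim: int, num_steps: int, num_blocks: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_steps, self.num_blocks = num_steps, num_blocks
        n_cells = num_steps * num_blocks
        # Random weights stand in for a trained pruner (assumption).
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(obs_dim)] for _ in range(n_cells)]
        self.bias = [0.0] * n_cells

    def scores(self, obs: List[float]) -> List[float]:
        # One dot product per cell: cheap enough to run once per control step.
        return [sum(w * x for w, x in zip(row, obs)) + b
                for row, b in zip(self.weight, self.bias)]

    def predict_mask(self, obs: List[float], prune_rate: float) -> List[List[bool]]:
        s = self.scores(obs)
        # Assumption: step 0 is always computed so later steps have a cache to reuse.
        for b in range(self.num_blocks):
            s[b] = float("inf")
        n_cells = len(s)
        n_prune = min(round(prune_rate * n_cells), n_cells - self.num_blocks)
        pruned = set(sorted(range(n_cells), key=lambda i: s[i])[:n_prune])
        # mask[t][b] is True when block b at denoising step t is skipped.
        return [[(t * self.num_blocks + b) in pruned for b in range(self.num_blocks)]
                for t in range(self.num_steps)]


pruner = LinearDiffusionPruner(obs_dim=4, num_steps=10, num_blocks=8)
mask = pruner.predict_mask([0.1, -0.3, 0.7, 0.2], prune_rate=0.8)
print(sum(map(sum, mask)), "of", 10 * 8, "cells pruned")
```

With 10 steps and 8 blocks, an 80% target prunes 64 of the 80 cells. Because the mask is recomputed from each new observation, the sparsity pattern can change from one rollout step to the next, which is the property a static schedule lacks.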
One-for-All Reusing
Rather than following the block-wise caching paradigm, SAG introduces a one-for-all reusing strategy that reuses activations across both timesteps and blocks in a zig-zag manner, reducing redundant computation across the whole denoising process and maximizing speedup.
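The exact zig-zag rule is not spelled out here, so the following Python sketch is one hypothetical reading rather than SAG's implementation. Pruning decisions form a boolean grid `mask[t][b]` over denoising steps `t` and blocks `b`. A pruned cell borrows the activation of the nearest computed cell, where candidates are searched anti-diagonal by anti-diagonal (as in a JPEG-style zig-zag scan), so the search alternates between earlier timesteps and earlier blocks. The names `zigzag_source` and `build_reuse_plan` and the distance-based search order are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (denoising step index, block index)


def zigzag_source(mask: List[List[bool]], t: int, b: int) -> Optional[Cell]:
    """Hypothetical rule: pick the cached cell that substitutes pruned cell (t, b).

    Candidates (t', b') with t' <= t and b' <= b are visited on anti-diagonals
    d = (t - t') + (b - b') = 1, 2, ..., reversing direction on alternate
    diagonals (zig-zag). Every candidate is already available: earlier steps
    are finished, and blocks b' < b at step t ran before block b.
    Returns None if no computed cell exists, meaning (t, b) must be computed.
    """
    for d in range(1, t + b + 1):
        diag = [(t - i, b - (d - i)) for i in range(d + 1) if i <= t and d - i <= b]
        if d % 2 == 1:
            diag.reverse()  # odd diagonals start from the earlier timestep
        for tt, bb in diag:
            if not mask[tt][bb]:  # only cells that were actually computed hold a cache
                return (tt, bb)
    return None


def build_reuse_plan(mask: List[List[bool]]) -> Dict[Cell, Optional[Cell]]:
    """Map every pruned cell to the computed cell whose activation it reuses."""
    return {(t, b): zigzag_source(mask, t, b)
            for t, row in enumerate(mask)
            for b, pruned in enumerate(row) if pruned}


mask = [
    [False, False, False],  # step 0: all blocks computed
    [False, True, True],    # step 1: blocks 1 and 2 pruned
    [True, False, True],    # step 2: blocks 0 and 2 pruned
]
print(build_reuse_plan(mask))
```

In this example, the pruned cells at step 1 fall back to the same blocks at step 0 (timestep reuse), while cell (2, 2) finds its same-block neighbour (1, 2) pruned and instead reuses block 1 at the current step (block reuse). A purely block-wise cache would only ever look along the timestep axis.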
Experimental Results
We evaluate SAG on multiple robotic benchmarks, including RoboMimic and Franka Kitchen. SAG achieves state-of-the-art speedup while maintaining, and in some cases improving, success rates.
Table 1: Benchmark on Proficient Human (PH) Demonstration Data
Table 2: Benchmark on Mixed Human (MH) Demonstration Data
Table 3: Benchmark on Franka Kitchen (Multi-Stage)
Real Robot Experiments
Qualitative results showing the robust performance of SAG under extreme sparsity.
Task Demos
Visualization of Real-Time Pruning
Target Pruning Rate: 80%
Target Pruning Rate: 90%
Note that the speed visible in the camera view may not fully reflect the pruning rate, because transmitting the 1920×1080 video stream adds approximately 80 ms of network latency.
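A short calculation shows why a fixed transmission delay hides much of the gain. The Python sketch below assumes, for illustration only, that denoising cost scales linearly with the fraction of block computations that run, and it ignores the pruner's own overhead. The 80 ms figure comes from the note above; the 100 ms unpruned denoising latency is a hypothetical value, not a measurement.

```python
def ideal_speedup(prune_rate: float) -> float:
    """Upper bound if cost scales with the fraction of computations that run."""
    return 1.0 / (1.0 - prune_rate)


def observed_speedup(prune_rate: float, denoise_ms: float, fixed_ms: float) -> float:
    """End-to-end speedup when a fixed delay (e.g. video transmission) is added."""
    baseline = denoise_ms + fixed_ms
    pruned = (1.0 - prune_rate) * denoise_ms + fixed_ms
    return baseline / pruned


DENOISE_MS = 100.0  # hypothetical unpruned denoising latency
NETWORK_MS = 80.0   # approximate video transmission latency from the note above

for rate in (0.8, 0.9):
    print(f"prune {rate:.0%}: ideal {ideal_speedup(rate):.1f}x, "
          f"visible {observed_speedup(rate, DENOISE_MS, NETWORK_MS):.1f}x")
```

Under these assumptions, the 80% and 90% targets would ideally give 5× and 10× less denoising work, yet the visible speedup is only about 1.8× and 2.0×, because the fixed 80 ms dominates once most of the denoising has been pruned.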