Sparse ActionGen
Accelerating Diffusion Policy with Real-Time Pruning

Abstract

Diffusion Policy has become a dominant approach to action generation owing to its strong capability for modeling multi-modal action distributions, but its multi-step denoising process makes it impractical for real-time visuomotor control. Existing caching-based acceleration methods typically rely on static schedules that cannot adapt to the dynamics of robot-environment interactions, which leads to suboptimal performance. In this paper, we propose Sparse ActionGen (SAG) for extremely sparse action generation. To accommodate iterative robot-environment interactions, SAG customizes a rollout-adaptive prune-then-reuse mechanism that first identifies prunable computations globally and then substitutes cached activations for them during action diffusion. To capture rollout dynamics, SAG parameterizes an observation-conditioned diffusion pruner for environment-aware adaptation and instantiates it with a highly parameter- and inference-efficient design for real-time prediction. Furthermore, SAG introduces a one-for-all reusing strategy that reuses activations across both timesteps and blocks in a zig-zag manner. Extensive experiments demonstrate that SAG achieves up to 4x generation speedup without sacrificing performance.

Methodology

Method Overview

Rollout-Adaptive Pruning

Unlike static caching schedules, SAG employs a prune-then-reuse mechanism that adapts to robot-environment interactions. A specialized, lightweight Observation-Conditioned Diffusion Pruner predicts the sparsity pattern in real time from the current observation, and cached activations substitute for the pruned computations.
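The prune-then-reuse idea can be illustrated with a small, self-contained sketch. Everything here is an assumption for illustration, not SAG's actual architecture: the denoiser is modeled as a stack of residual blocks operating on a scalar, the hypothetical pruner is a single linear layer that scores every (step, block) pair from an observation feature vector and keeps the top-scoring pairs for a target pruning rate, and a pruned block simply reuses its own latest cached residual. The names `predict_mask` and `denoise` are made up for this example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical block signature: (hidden state, step index) -> residual update.
Block = Callable[[float, int], float]


def predict_mask(obs_feat: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 bias: Sequence[float],
                 num_steps: int,
                 num_blocks: int,
                 prune_rate: float) -> List[List[bool]]:
    """Toy observation-conditioned pruner: one linear score per (step, block).

    Returns mask[t][b] == True if block b is computed at denoising step t.
    """
    n = num_steps * num_blocks
    scores = [sum(w * x for w, x in zip(weights[i], obs_feat)) + bias[i]
              for i in range(n)]
    num_keep = max(num_blocks, round(n * (1.0 - prune_rate)))
    # Step 0 has nothing cached yet, so all of its blocks are always computed;
    # the remaining budget goes to the highest-scoring later (step, block) pairs.
    later = sorted(range(num_blocks, n), key=lambda i: scores[i], reverse=True)
    keep = set(range(num_blocks)) | set(later[:num_keep - num_blocks])
    return [[t * num_blocks + b in keep for b in range(num_blocks)]
            for t in range(num_steps)]


def denoise(x: float,
            blocks: Sequence[Block],
            mask: List[List[bool]],
            step_size: float = 0.1) -> Tuple[float, int]:
    """Prune-then-reuse loop: pruned blocks reuse their latest cached residual."""
    cache = [0.0] * len(blocks)  # latest residual produced by each block
    computed = 0
    for t, step_mask in enumerate(mask):
        h = x
        for b, block in enumerate(blocks):
            if step_mask[b]:
                cache[b] = block(h, t)  # kept: compute and refresh the cache
                computed += 1
            h = h + cache[b]            # apply the fresh or reused residual
        x = x - step_size * h           # toy stand-in for a real scheduler update
    return x, computed


if __name__ == "__main__":
    steps, nb = 4, 3
    w = [[0.1 * i, 0.0] for i in range(steps * nb)]  # toy pruner weights
    mask = predict_mask([1.0, -0.5], w, [0.0] * (steps * nb), steps, nb, 0.5)
    blocks = [lambda h, t, k=k: 0.01 * k * h for k in range(nb)]
    x, n_computed = denoise(1.0, blocks, mask)
    print(mask)
    print(f"computed {n_computed} of {steps * nb} block evaluations")
```

Because the mask depends on `obs_feat`, a different observation can yield a different sparsity pattern on each rollout, which is the property that distinguishes this kind of pruner from a fixed caching schedule. Forcing the first step to be dense is a simplifying choice of this sketch so that every reuse has a valid cache entry.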

One-for-All Reusing

Rather than following the conventional block-wise caching paradigm, SAG introduces a one-for-all reusing strategy that reuses activations across both timesteps and blocks in a zig-zag manner, exploiting redundancy globally to maximize speedup.
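The page does not spell out the exact zig-zag rule, so the sketch below is only one hypothetical reading of reuse "across both timesteps and blocks". For a pruned (step, block) position, it searches outward by distance d and alternates between the timestep axis (the same block, d steps earlier) and the block axis (the same step, d blocks earlier), returning the first position that was actually computed. A purely block-wise cache would search only the timestep axis. The function names `zigzag_source` and `reuse_plan` are illustrative, not from SAG.

```python
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]  # (denoising step, block index)


def zigzag_source(mask: List[List[bool]], t: int, b: int) -> Optional[Pos]:
    """Return the computed (step, block) whose activation a pruned (t, b) reuses.

    Hypothetical rule: search outward by distance d, alternating between the
    timestep axis (same block, d steps earlier) and the block axis (same step,
    d blocks earlier). Both candidates precede (t, b) in execution order.
    """
    if mask[t][b]:
        return (t, b)  # not pruned: use its own fresh activation
    for d in range(1, max(t, b) + 1):
        for ct, cb in ((t - d, b), (t, b - d)):
            if ct >= 0 and cb >= 0 and mask[ct][cb]:
                return (ct, cb)
    return None  # nothing computed earlier (cannot happen if step 0 is dense)


def reuse_plan(mask: List[List[bool]]) -> Dict[Pos, Optional[Pos]]:
    """Map every pruned (step, block) to its reuse source."""
    return {(t, b): zigzag_source(mask, t, b)
            for t in range(len(mask))
            for b in range(len(mask[t]))
            if not mask[t][b]}


if __name__ == "__main__":
    T, F = True, False
    mask = [[T, T, T],
            [F, T, F],
            [F, F, F]]
    for pos, src in reuse_plan(mask).items():
        print(pos, "<-", src)
```

In this toy mask, position (2, 1) reuses (1, 1) from the previous step, while (2, 2) finds no computed neighbor at distance 1 and falls back to (0, 2). Letting the search also move along the block axis gives more reuse candidates than a timestep-only cache, which is the intuition behind reusing across both dimensions.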

Experimental Results

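We evaluate SAG on multiple robotic benchmarks, including RoboMimic and Franka Kitchen. Apart from CP, which trades large speedups for severe drops in success rate, SAG achieves the highest speedup on every task (3.44x to 4.03x), while staying close to, and on several tasks exceeding, the full-precision success rates. Each cell reports the success rate (%) with its reported ± variation, followed by the speedup over Full Precision in parentheses; Table 3 lists speedup in a separate column.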

Table 1: Benchmark on Proficient Human (PH) Demonstration Data

Method | Lift | Can | Square | Transport | Tool
Full Precision | 100±0.0 | 99±1.2 | 88±7.0 | 78±3.3 | 51±5.9
DDIM (K=30) | 100±0.0 (3.32x) | 96±2.8 (3.33x) | 86±4.9 (3.34x) | 76±5.7 (3.30x) | 42±4.6 (3.32x)
Efficient VLA | 100±0.0 (3.37x) | 75±2.1 (3.36x) | 86±3.4 (3.32x) | 60±4.2 (3.24x) | 38±2.7 (3.35x)
L2C | 100±0.0 (1.26x) | 86±1.8 (1.26x) | 23±4.1 (1.26x) | 66±3.0 (1.33x) | 2±0.5 (1.29x)
BAC | 100±0.0 (3.23x) | 94±1.5 (3.42x) | 87±2.9 (3.42x) | 78±3.6 (3.09x) | 51±2.8 (3.33x)
Falcon | 100±0.0 (1.81x) | 85±2.0 (1.13x) | 60±2.7 (1.22x) | 50±3.3 (2.85x) | 42±1.9 (1.17x)
SDP | 100±0.0 (1.88x) | 96±1.4 (1.70x) | 85±2.1 (1.70x) | 72±2.9 (1.67x) | 17±1.2 (1.77x)
CP | 78±3.3 (15.0x) | 38±2.8 (14.9x) | 22±4.5 (14.8x) | 51±3.1 (10.2x) | 0±0.0 (14.2x)
SAG (Ours) | 100±0.0 (3.72x) | 98±1.6 (3.70x) | 89±2.5 (3.64x) | 85±3.3 (3.44x) | 50±2.8 (3.65x)
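The claim that SAG maintains success rates can be checked directly against Table 1. The snippet below copies the Full Precision and SAG rows (numbers taken verbatim from the table) and computes the per-task change in success rate and the mean speedup across the five tasks.

```python
# Success rates (%) and speedups copied from Table 1 (PH demonstrations).
full = {"Lift": 100, "Can": 99, "Square": 88, "Transport": 78, "Tool": 51}
sag = {
    "Lift": (100, 3.72),
    "Can": (98, 3.70),
    "Square": (89, 3.64),
    "Transport": (85, 3.44),
    "Tool": (50, 3.65),
}

# Change in success rate relative to the unaccelerated policy, per task.
deltas = {task: sag[task][0] - full[task] for task in full}

# Arithmetic mean of SAG's per-task speedups.
mean_speedup = sum(speedup for _, speedup in sag.values()) / len(sag)

print(deltas)
print(f"mean speedup: {mean_speedup:.2f}x")
```

The two 1-point drops (Can, Tool) are smaller than the ± variation reported for either method on those tasks, while Square and Transport improve by 1 and 7 points at a mean speedup of about 3.63x.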

Table 2: Benchmark on Mixed Human (MH) Demonstration Data

Method | Lift | Can | Square | Transport
Full Precision | 100±0.0 | 93±6.5 | 76±4.3 | 54±5.2
DDIM (K=30) | 100±0.0 (3.31x) | 92±4.9 (3.33x) | 78±2.8 (3.32x) | 51±3.8 (3.30x)
Efficient VLA | 100±0.0 (3.33x) | 75±2.1 (3.34x) | 52±3.0 (3.33x) | 0±0.0 (3.50x)
L2C | 100±0.0 (1.26x) | 0±0.0 (1.26x) | 53±1.9 (1.26x) | 46±2.3 (1.28x)
BAC | 100±0.0 (3.27x) | 93±1.6 (3.43x) | 78±2.8 (3.36x) | 29±3.4 (3.45x)
Falcon | 100±0.0 (1.85x) | 84±2.0 (1.38x) | 40±2.9 (1.54x) | 27±3.1 (2.68x)
SDP | 100±0.0 (1.69x) | 94±1.4 (1.70x) | 77±2.2 (1.69x) | 52±3.0 (1.83x)
CP | 98±1.0 (15.35x) | 15±2.5 (15.1x) | 30±3.1 (15.4x) | 14±2.0 (11.5x)
SAG (Ours) | 100±0.0 (3.75x) | 94±5.7 (3.76x) | 79±3.4 (3.72x) | 50±1.6 (3.84x)

Table 3: Benchmark on Franka Kitchen (Multi-Stage)

Method | Kit_p1 | Kit_p2 | Kit_p3 | Kit_p4 | Speedup
Full Precision | 100±0.0 | 100±0.0 | 100±0.0 | 99±0.6 | -
DDIM (K=30) | 100±0.0 | 100±0.8 | 100±0.0 | 99±0.8 | 3.37x
Efficient VLA | 20±2.3 | 2±0.8 | 0±0.0 | 0±0.0 | 3.71x
L2C | 100±0.0 | 100±0.0 | 100±0.0 | 97±1.2 | 1.28x
BAC | 100±0.0 | 100±0.0 | 97±1.6 | 90±2.9 | 3.66x
Falcon | 100±0.0 | 100±0.0 | 99±0.6 | 99±0.6 | 3.01x
SDP | 100±0.0 | 100±0.0 | 100±0.0 | 99±0.6 | 1.63x
CP | 76±2.5 | 63±4.3 | 46±4.5 | 12±6.8 | 31.4x
SAG (Ours) | 100±0.0 | 100±0.0 | 100±0.0 | 99±0.6 | 4.03x

Real Robot Experiments

Qualitative results showing the robust performance of SAG under extreme sparsity.

Task Demos

Diffusion Policy
Consistency Policy
Streaming Diffusion Policy
DDIM
SAG Pruning Rate: 90%
SAG Pruning Rate: 92%

Visualization for Real-time Pruning

Target Pruning Rate: 80%

Rollout A
Rollout B

Target Pruning Rate: 90%

Rollout A
Rollout B

Note that the speed visible in the camera view may not fully reflect the pruning rate, because transmitting the 1920×1080 video stream adds approximately 80 ms of network latency.
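Two simple effects explain why a high pruning rate need not translate into a proportional visible speedup. First, in an Amdahl-style bound, pruning rate p only accelerates the prunable share f of the compute, so the compute speedup is at most 1 / ((1 - f) + f(1 - p)). Second, a fixed per-cycle latency, such as the roughly 80 ms of video transmission, dilutes whatever speedup remains. The sketch below assumes these simplified models; the 400 ms baseline cycle and the 4x compute speedup are hypothetical values chosen for illustration, not measurements from SAG.

```python
def compute_speedup(prune_rate: float, prunable_fraction: float) -> float:
    """Amdahl-style bound: only the prunable share of compute gets faster."""
    remaining = (1.0 - prunable_fraction) + prunable_fraction * (1.0 - prune_rate)
    return 1.0 / remaining


def observed_speedup(baseline_ms: float, speedup: float, fixed_ms: float) -> float:
    """Speedup seen end to end when a fixed latency is added to every cycle."""
    return (baseline_ms + fixed_ms) / (baseline_ms / speedup + fixed_ms)


if __name__ == "__main__":
    # Illustrative numbers only: the 400 ms baseline cycle and the 4x compute
    # speedup are hypothetical; 80 ms is the video-streaming latency from the note.
    print(f"{compute_speedup(0.9, 1.0):.1f}x")           # 10.0x
    print(f"{observed_speedup(400.0, 4.0, 80.0):.2f}x")  # 2.67x
```

Even in the idealized case where every operation is prunable, a 90% pruning rate caps compute speedup at 10x, and with these hypothetical numbers an 80 ms fixed delay shrinks a 4x compute gain to about 2.67x as seen on camera.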