Sparse Autoencoders for FLUX.1 Diffusion Transformers

Research on mechanistic interpretability of diffusion transformers using sparse autoencoders (SAEs). Found hierarchical, causally relevant feature structure in FLUX.1-dev: early layers encode spatial and layout factors, while late layers encode semantic concepts. Steering individual features produced reliable edits in both directions.
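As a rough illustration of what a bidirectional steering edit can look like, the sketch below shifts activations along a single feature direction by a signed amount. It is not the project's code: the `steer` helper, the `alpha` scale, the toy shapes, and the random vector standing in for a feature direction (for example, one row of an SAE decoder) are all assumptions made for the example.

```python
import numpy as np


def steer(activations, direction, alpha):
    """Shift every activation vector by alpha along a unit feature direction.

    Hypothetical helper: positive alpha pushes toward the feature,
    negative alpha pushes away from it.
    """
    unit = direction / np.linalg.norm(direction)
    return activations + alpha * unit


rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8))      # stand-in for (tokens, width) activations
direction = rng.normal(size=8)      # stand-in for one feature direction
unit = direction / np.linalg.norm(direction)

amplified = steer(acts, direction, alpha=3.0)
suppressed = steer(acts, direction, alpha=-3.0)

# The projection onto the feature direction moves by alpha in each case.
shift_up = (amplified - acts) @ unit
shift_down = (suppressed - acts) @ unit
```

In a real model the shifted activations would be written back into the forward pass at the chosen layer; how that hook is installed depends on the inference code and is not shown here.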

Trained 4x-expansion Top-K SAEs (12,288 features) on activations from layers 5 and 15, collected across 200 COCO-prompt generations. The 20 features ranked highest by CLIP coherence scoring mapped to interpretable concepts, including object classes and spatial composition.
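For readers unfamiliar with the architecture, here is a minimal sketch of a Top-K SAE forward pass. A 4x expansion yielding 12,288 features implies an activation width of 3,072; the demo uses a much smaller width to stay lightweight. The class name `TopKSAE`, the value of `k`, the random initialisation, and the bias handling are assumptions for illustration, not the trained models described above, and no training loop is included.

```python
import numpy as np


class TopKSAE:
    """Minimal Top-K sparse autoencoder (forward pass only, no training)."""

    def __init__(self, d_model, expansion=4, k=4, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.n_features = d_model * expansion
        # Random weights purely for illustration; a real SAE learns these.
        self.W_enc = rng.normal(0.0, d_model ** -0.5, (d_model, self.n_features))
        self.b_enc = np.zeros(self.n_features)
        # Each decoder row is a unit-norm feature direction.
        W_dec = rng.normal(size=(self.n_features, d_model))
        self.W_dec = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        """Return sparse codes with at most k non-zero features per row."""
        pre = np.maximum((x - self.b_dec) @ self.W_enc + self.b_enc, 0.0)
        # Indices of the k largest pre-activations in each row.
        top = np.argpartition(pre, -self.k, axis=-1)[..., -self.k:]
        codes = np.zeros_like(pre)
        np.put_along_axis(codes, top, np.take_along_axis(pre, top, axis=-1), axis=-1)
        return codes

    def decode(self, codes):
        """Reconstruct activations as a sparse sum of feature directions."""
        return codes @ self.W_dec + self.b_dec


sae = TopKSAE(d_model=8, expansion=4, k=4)
x = np.random.default_rng(1).normal(size=(5, 8))  # 5 stand-in activation vectors
codes = sae.encode(x)
recon = sae.decode(codes)
mse = float(np.mean((x - recon) ** 2))  # the quantity a trainer would minimise
```

The Top-K step enforces sparsity directly by keeping only the k largest activations per input, rather than relying on an L1 penalty.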