exploratory prototype · interpretability · Apr 2026
Emotion vectors for Gemma
Exploratory interpretability pipeline for extracting emotion-related directions from Gemma's residual-stream activations.
This project was an exploratory attempt to make mechanistic interpretability less abstract. The premise was simple: if a model has internal directions associated with emotional concepts, can we extract candidates cleanly enough to inspect or test them?
The pipeline loads an open-weight model, processes an emotion-story dataset, extracts residual-stream activations, removes confounds with PCA, and saves per-layer candidate directions. The useful part was mostly the plumbing: dataset handling, memory pressure, batching, logging, and resumability.
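To make the extraction step concrete, here is a minimal sketch for a single layer, run on synthetic data rather than real Gemma activations. It rests on assumptions the description above does not confirm: that a candidate direction is the difference between mean emotion and mean baseline activations, and that "removing confounds with PCA" means projecting out the top principal components of the baseline activations. The function names, `n_confounds`, and the synthetic axes are all hypothetical.

```python
import numpy as np


def top_principal_components(X, k):
    """Return the top-k principal directions of the rows of X, shape (k, d)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are orthonormal principal directions, sorted by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]


def project_out(X, components):
    """Remove the span of `components` (orthonormal rows) from each row of X."""
    return X - (X @ components.T) @ components


def emotion_direction(emotion_acts, baseline_acts, n_confounds=1):
    """Unit-norm candidate direction for one layer.

    ASSUMPTION: mean difference plus PCA projection; the original pipeline
    may compute its directions differently.
    """
    confounds = top_principal_components(baseline_acts, n_confounds)
    diff = emotion_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    cleaned = project_out(diff[None, :], confounds)[0]
    return cleaned / np.linalg.norm(cleaned)


# Synthetic stand-in for one layer's activations (hypothetical sizes).
rng = np.random.default_rng(0)
d, n = 64, 200
confound_axis = np.zeros(d)
confound_axis[0] = 1.0  # high-variance nuisance axis shared by both sets
signal_axis = np.zeros(d)
signal_axis[1] = 1.0    # axis along which the "emotion" set is shifted


def sample(shift):
    noise = 0.1 * rng.normal(size=(n, d))
    nuisance = 5.0 * rng.normal(size=(n, 1)) * confound_axis
    return noise + nuisance + shift * signal_axis


baseline = sample(shift=0.0)
emotion = sample(shift=2.0)

direction = emotion_direction(emotion, baseline, n_confounds=1)
print("alignment with signal axis:  ", float(direction @ signal_axis))
print("alignment with confound axis:", float(direction @ confound_axis))
```

In a per-layer setting the same function would be called once per layer's activation matrix and the resulting vectors saved separately. How many components to remove is a judgment call: too few leaves nuisance variance in the direction, too many risks projecting out the signal itself.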
A scan of local file timestamps dates the source in Pi/ to April 2026.