exploratory prototype · interpretability · Apr 2026

Emotion vectors for Gemma

Exploratory interpretability pipeline for extracting candidate emotion-related directions from Gemma's residual-stream activations.

interpretability · steering vectors · Gemma · representation learning

This project was an exploratory attempt to make mechanistic interpretability less abstract. The premise was simple: if a model has internal directions associated with emotional concepts, can we extract candidates cleanly enough to inspect or test them?

The pipeline loads an open-weight model, processes an emotion-story dataset, extracts residual-stream activations, removes confounds with PCA, and saves per-layer candidate directions. The useful part was mostly the plumbing: dataset handling, memory pressure, batching, logging, and resumability.
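As a rough illustration of the extraction step for one layer, here is a minimal NumPy sketch. It is not the project's code, and every specific in it is an assumption: the raw direction is a difference of means between activations on one emotion's stories and on the rest, the confound subspace is taken to be the top principal components of activations on neutral text, and the function names, `k`, and the synthetic data are hypothetical.

```python
import numpy as np


def top_principal_components(x, k):
    """Return the top-k principal directions of x as orthonormal rows."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


def project_out(v, basis):
    """Remove from v its component in the span of the orthonormal rows of basis."""
    return v - basis.T @ (basis @ v)


def candidate_direction(emotion_acts, other_acts, neutral_acts, k=1):
    """Difference-of-means direction with assumed confound components removed."""
    raw = emotion_acts.mean(axis=0) - other_acts.mean(axis=0)
    # Assumption: high-variance directions on neutral text are confounds.
    basis = top_principal_components(neutral_acts, k)
    cleaned = project_out(raw, basis)
    return cleaned / np.linalg.norm(cleaned)


# Synthetic stand-in for one layer's activations (hypothetical data).
rng = np.random.default_rng(0)
d, n = 16, 200
confound_axis = np.zeros(d)
confound_axis[0] = 1.0
signal_axis = np.zeros(d)
signal_axis[1] = 1.0


def noisy(n_rows, confound_shift=0.0):
    """Small isotropic noise plus a high-variance component on the confound axis."""
    base = 0.1 * rng.normal(size=(n_rows, d))
    scale = 5.0 * rng.normal(size=(n_rows, 1)) + confound_shift
    return base + scale * confound_axis


neutral = noisy(n)
other = noisy(n)
# The emotion set carries the signal but is also shifted along the confound axis.
emotion = noisy(n, confound_shift=3.0) + 2.0 * signal_axis

direction = candidate_direction(emotion, other, neutral, k=1)
```

In this toy setup the raw difference of means mixes the confound axis with the signal axis, and projecting out the top neutral component leaves a unit vector that lies almost entirely along the signal axis. Running the same function once per layer and saving each result would give per-layer candidate directions in the sense described above.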

The local timestamp scan dates the source in Pi/ to April 2026.
