This startup’s innovative tool for mechanistic interpretability enables debugging of LLMs.
Mapping a model with Silico lets users zoom in on specific parts of a trained model, such as individual neurons or groups of neurons, and run experiments to understand what they do. (This requires access to the model’s inner workings, so most people will not be able to explore models like ChatGPT or Gemini, but Silico can be used with various open-source models.) Users can observe which inputs activate different neurons and trace pathways upstream and downstream from a neuron to see how other neurons influence it and how it, in turn, affects others.
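To make the idea concrete, here is a minimal sketch of what "seeing which inputs activate a neuron" and "tracing downstream" can mean in a tiny hand-built network. It is not Silico's interface: the weights, input names, and helper functions are all invented for illustration.

```python
# Toy illustration only: a hand-written two-layer network, not Silico's API.
# All weights, names, and inputs below are made up for the example.

def relu(x):
    return max(0.0, x)

# Hidden layer: 3 neurons, each with weights over 2 input features.
W_HIDDEN = [
    [1.0, -1.0],   # neuron 0
    [0.5, 0.5],    # neuron 1
    [-1.0, 2.0],   # neuron 2
]
# Output layer: one unit reading from the 3 hidden neurons.
W_OUT = [0.2, -0.4, 1.5]

def hidden_activations(x):
    """Return the activation of every hidden neuron for input x."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in W_HIDDEN]

def top_input_for_neuron(inputs, neuron):
    """Find which named input activates the given hidden neuron most strongly."""
    return max(inputs, key=lambda name: hidden_activations(inputs[name])[neuron])

def downstream_influence(neuron):
    """Weight with which a hidden neuron feeds the output unit."""
    return W_OUT[neuron]

inputs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for name, x in inputs.items():
    print(name, hidden_activations(x))
print("strongest input for neuron 2:", top_input_for_neuron(inputs, 2))
print("neuron 2 feeds the output with weight:", downstream_influence(2))
```

Real models have billions of parameters rather than nine, but the questions are the same: which inputs light a unit up, and where does its signal go next.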
For instance, researchers discovered a neuron in the open-source model Qwen 3 that was connected to the trolley problem. Activating this neuron altered the model’s responses, framing them as explicit moral dilemmas. “When this neuron’s active, all sorts of unusual behaviors emerge,” explains Ho.
Identifying the sources of such odd behavior has become standard practice. The aim now is to make that behavior easier to adjust. With Silico, developers can modify the parameters tied to specific neurons to strengthen or weaken certain responses.
In another case, researchers asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users. The model said no, citing the potential harm to the business from such a disclosure.
By analyzing the model, researchers found that amplifying neurons linked to transparency and disclosure shifted the answer from no to yes nine out of ten times. “The model possessed ethical reasoning capabilities, but they were overshadowed by commercial risk assessments,” notes Ho.
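A rough sketch of how amplifying one internal signal can flip an answer, using a two-feature toy readout. The feature names, weights, activation values, and scale factor are assumptions chosen for the example; they do not come from Goodfire or from the model in question.

```python
# Toy illustration only; feature names, weights, and the scale factor are invented.

def answer(activations, scale=None):
    """Return "yes" (disclose) or "no" from a weighted sum of feature activations.

    `scale` optionally maps a feature name to a multiplier applied to its
    activation before the readout, mimicking amplifying or dampening a neuron.
    """
    scale = scale or {}
    readout = {"transparency": 1.0, "commercial_risk": -1.0}
    score = sum(
        readout[name] * value * scale.get(name, 1.0)
        for name, value in activations.items()
    )
    return "yes" if score > 0 else "no"

# Commercial risk outweighs transparency in the unmodified toy model.
activations = {"transparency": 0.4, "commercial_risk": 0.9}

print(answer(activations))                               # baseline
print(answer(activations, scale={"transparency": 3.0}))  # transparency amplified
```

In this toy, the transparency signal is present all along but loses to the commercial-risk signal until it is scaled up, which mirrors the pattern Ho describes.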
Modifying a trained model’s values is just one method. Silico can also help steer the training process, filtering out certain training data so that particular parameters never take on undesirable values in the first place.
For example, many models will assert that 9.11 is greater than 9.9. Investigating a model might reveal that the answer is influenced by neurons related to the Bible, where verse 9.9 precedes 9.11, or by code repositories with sequential updates labeled 9.9, 9.10, 9.11, and so on. With this insight, the model can be retrained to ignore its “Bible” neurons when doing math.
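The confusion is easy to reproduce outside a neural network: the same two strings order differently depending on whether they are read as decimal numbers or as dot-separated labels such as verses or release numbers. The function names below are hypothetical, chosen for this sketch.

```python
from decimal import Decimal

def larger_as_decimal(a, b):
    """Return whichever string is larger when both are read as decimal numbers."""
    return a if Decimal(a) > Decimal(b) else b

def larger_as_label(a, b):
    """Return whichever string comes later when read as dot-separated integers,
    the way chapter.verse references or release numbers are ordered."""
    def key(s):
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b

print(larger_as_decimal("9.11", "9.9"))  # 9.9
print(larger_as_label("9.11", "9.9"))    # 9.11
```

Both orderings are correct in their own context; the error lies in applying the label ordering to an arithmetic question.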
With Silico, Goodfire aims to bring techniques that were once the exclusive domain of leading labs to smaller companies and research teams that want to develop or modify their own models based on open-source technology. Pricing for the tool will be set case by case according to customer needs; specific figures were not provided.
