Non-ideal analog operations
The gain-cell circuits introduce non-idealities and constraints that prevent standard pre-trained models from being mapped directly onto the hardware; a dedicated adaptation algorithm is required to reach comparable performance.
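As a rough illustration of what the adaptation has to account for, the sketch below applies a saturating device function to each elementwise product of an attention dot product before accumulation. The tanh-shaped `gain_cell_response` and all parameters are assumptions standing in for the measured gain-cell characteristic, not the paper's actual device model.

```python
import numpy as np

# Hypothetical saturating transfer function standing in for the measured
# gain-cell response (assumption: the real characteristic would be fitted
# from silicon measurements and need not be a tanh).
def gain_cell_response(x, v_sat=1.0):
    return v_sat * np.tanh(x / v_sat)

def ideal_scores(q, K):
    # Ideal digital attention dot products.
    return K @ q

def gain_cell_scores(q, K, v_sat=1.0):
    # Each elementwise product passes through the device non-linearity
    # before the (analog) accumulation, so results deviate from K @ q.
    return gain_cell_response(K * q, v_sat).sum(axis=1)

rng = np.random.default_rng(0)
q = rng.normal(size=64)
K = rng.normal(size=(16, 64))
print("max deviation:", np.abs(ideal_scores(q, K) - gain_cell_scores(q, K)).max())
```

Adaptation then amounts to fine-tuning the model with this non-ideal forward pass in the loop so the weights compensate for the deviation.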
Increased computational complexity for training
The non-linear relationship between input voltage and stored voltage in the gain cells means their multiply-accumulate operations cannot be expressed as standard matrix multiplications, which would substantially increase the computational complexity and memory requirements of training a gain-cell-based model from scratch.
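A minimal sketch of where that cost comes from, assuming the non-linearity must be applied to every elementwise product (with tanh again as a stand-in): the simulation materializes a full queries-by-keys-by-dimension intermediate that an optimized matrix multiplication never forms.

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.normal(size=(128, 64))   # queries
K = rng.normal(size=(512, 64))   # stored keys

# Ideal case: a single optimized GEMM, no large intermediates.
scores_ideal = Q @ K.T

# Non-linear gain cells: every elementwise product must pass through the
# device function before summation (tanh is a stand-in here). This
# materializes a (queries, keys, dim) intermediate that a GEMM never forms,
# and reverse-mode autodiff must also keep it for the backward pass.
prod = Q[:, None, :] * K[None, :, :]          # shape (128, 512, 64)
scores_nonideal = np.tanh(prod).sum(axis=-1)  # shape (128, 512)
print(f"{prod.nbytes / 1e6:.0f} MB of intermediate activations")
```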
Limited memory retention time
The current silicon CMOS gain cells have a relatively short retention time of 5 ms due to capacitor leakage, which could necessitate frequent memory refreshes or degrade performance on very long sequences, although oxide-semiconductor (OSFET) based cells could extend retention considerably.
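A back-of-the-envelope sketch of the resulting refresh budget; only the 5 ms retention figure comes from the text, while the per-token write time `token_period_s` is a hypothetical value chosen for illustration.

```python
# Back-of-the-envelope refresh budget. retention_s is the reported CMOS
# figure; token_period_s is a hypothetical write rate, not a measured number.
retention_s = 5e-3        # 5 ms gain-cell retention
token_period_s = 100e-9   # assumed time to write one token's KV entries

tokens_before_refresh = int(retention_s / token_period_s)
print(f"~{tokens_before_refresh} tokens fit in one retention window "
      "before the oldest entry must be refreshed or rewritten")
```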
Performance gap with state-of-the-art
While the hardware model performs comparably to a GPT-2 baseline and matches a GPT-2-XL trained from scratch, it slightly underperforms the public GPT-2-XL checkpoint, suggesting either a residual performance gap or the need for more training iterations to fully match state-of-the-art models.
Area footprint scaling for large models
Accommodating larger models requires sub-tiling, i.e., stacking multiple arrays: the area footprint then scales linearly with the sliding-window dimension, and the digital adders that combine the partial results introduce additional latency.
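A hypothetical sizing sketch of that trade-off; the array and window sizes below are illustrative assumptions, not reported figures.

```python
import math

# Hypothetical sizing exercise; every parameter here is an assumption for
# illustration, not a figure from the paper.
array_window = 1024    # sliding-window tokens one physical array can hold
model_window = 4096    # target sliding-window length of a larger model

n_tiles = math.ceil(model_window / array_window)   # stacked sub-tiles
area_units = n_tiles                               # area grows ~linearly
adder_stages = math.ceil(math.log2(n_tiles))       # digital adder tree depth

print(f"{n_tiles} sub-tiles -> ~{area_units}x array area and "
      f"{adder_stages} extra adder stage(s) of latency")
```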