daoudclarke.net | ksl
A Google research paper found that simply repeating the same prompt improves LLM performance on non-reasoning tasks, and Daoud Clarke’s analysis of why that works is more interesting than the trick itself. His argument: the gains expose a fundamental limitation in how causal attention handles prompts. Tokens early in the input can’t attend to tokens that come later, so repeating the prompt lets every token in the second copy attend to the entire first copy, giving early content an indirect view of what follows it. Bidirectional attention within the prompt segment, as explored in Katz et al.’s work on segment-based attention masking, would likely eliminate the need for repetition entirely. That a low-level optimization like this still yields measurable gains after years of scaling and RLHF tuning is a reminder that basic transformer architecture constraints remain underexplored relative to the effort poured into training data and alignment.
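The difference between the two attention schemes is easy to see in the masks themselves. Below is a minimal sketch (not from the paper or from Clarke’s post) contrasting a standard causal mask with a prefix-LM-style mask that is bidirectional over the prompt segment; the function names and the `prompt_len` split are my own illustrative choices.

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    # Standard decoder mask: token i may attend only to tokens j <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def prefix_lm_mask(n: int, prompt_len: int) -> np.ndarray:
    # Prefix-LM-style mask: full bidirectional attention within the
    # prompt segment, causal attention everywhere else.
    mask = np.tril(np.ones((n, n), dtype=bool))
    mask[:prompt_len, :prompt_len] = True
    return mask

n, prompt_len = 6, 4
c = causal_mask(n)
p = prefix_lm_mask(n, prompt_len)

# Under the causal mask, prompt token 0 cannot see prompt token 3;
# under the prefix-LM mask it can, which is the capability that
# prompt repetition approximates indirectly.
print(c[0, 3])  # False
print(p[0, 3])  # True
```

Generated tokens past `prompt_len` keep the usual causal restriction in both schemes, so the only change is that prompt tokens get the full-prompt view that repetition otherwise has to simulate with a second copy.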
