We propose a context-aware inference framework that first generates a prompt-grounded context snippet capturing user intent, ambiguity, and potential risks, then conditions any LLM on this enriched input. A reinforcement-learned context generator is trained in an autoencoder-like setup to maximize safety and prompt reconstruction while discouraging trivial copying. This improves refusal accuracy on harmful prompts and preserves helpfulness on benign requests across multiple base models and safety benchmarks.
Disentangles context generation from response decoding to avoid co-adaptation of reasoning traces.
Reward combines prompt reconstruction and safety signals for the context.
Contexts generated by a 3B model improve base LLMs of all types and sizes.
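The combined reward above can be sketched as a weighted mix of reconstruction quality, a safety signal, and a penalty that discourages trivial copying. The weights, the token-overlap proxy, and the scorer inputs are illustrative assumptions, not the paper's exact formulation:

```python
def context_reward(prompt, context, reconstructed_prompt, safety_score,
                   w_recon=0.5, w_safety=0.5, copy_penalty=0.3):
    """Illustrative reward for a context snippet.

    `reconstructed_prompt` is the frozen decoder's attempt to recover the
    original prompt from the context alone; `safety_score` in [0, 1] comes
    from a safety classifier (both assumed components).
    """
    def f1(a, b):
        # Cheap token-overlap F1 as a reconstruction/copying proxy.
        a_tok, b_tok = a.lower().split(), b.lower().split()
        common = len(set(a_tok) & set(b_tok))
        if common == 0:
            return 0.0
        p, r = common / len(b_tok), common / len(a_tok)
        return 2 * p * r / (p + r)

    recon = f1(prompt, reconstructed_prompt)
    # Penalize contexts that simply restate the prompt verbatim.
    copying = f1(prompt, context)
    return w_recon * recon + w_safety * safety_score - copy_penalty * copying
```

Under this sketch, a context that lets the decoder recover the prompt without echoing it scores higher than a verbatim copy of the prompt itself.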
Current LLMs face two critical failure modes in safety-critical applications: complying with harmful prompts, and over-refusing benign requests.
Key Insight: User prompts are often ambiguous or under-specified. Subtle contextual cues—such as user intent, prior knowledge, and potential risks—strongly influence what constitutes an appropriate response. Existing models generate immediate responses without considering these broader contextual factors.
Figure 1: Comparison of inference paradigms. Traditional models (left) and reasoning models (middle) vs. our context-aware inference (right).
We propose context-aware inference that disentangles the response model from a dedicated context generator (ContextLens). Our framework extracts contextual information from user prompts and uses it to guide safer response generation.
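The two-stage pipeline can be sketched as follows; the prompt wording and the model callables are placeholders for whatever context generator and base LLM are plugged in:

```python
def context_aware_generate(prompt, context_model, response_model):
    """Two-stage inference: a small context generator enriches the prompt,
    then any frozen response model conditions on the enriched input.

    Both models are assumed to be simple text-in/text-out callables.
    """
    # Stage 1: the context generator (e.g. a 3B model) produces a snippet
    # describing intent, ambiguity, and potential risks.
    snippet = context_model(
        "Analyze the following request. Describe the user's likely intent, "
        f"any ambiguity, and potential risks.\n\nRequest: {prompt}"
    )
    # Stage 2: the response model sees both the context and the prompt.
    return response_model(
        f"Context:\n{snippet}\n\nUser request:\n{prompt}\n\n"
        "Respond appropriately given the context."
    )
```

Because the response model is only conditioned on extra input text, the same context generator works with any base LLM, open or closed.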
Figure 2: ContextLens training framework. The context generator produces context snippets that help a frozen decoder model reconstruct the prompt and generate safe responses.
We explore three methods for generating context snippets, from zero-shot prompting to full RL training:
Leverages intermediate reasoning steps from models trained with chain-of-thought reasoning on safety data. We extract thinking traces from:
Limitation: Not explicitly optimized for transferability or prompt reconstruction
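Extracting such traces can be as simple as pulling the delimited thinking span out of the model's output. The `<think>...</think>` delimiters are an assumption about the trace format; other reasoning models would need their own markers:

```python
import re

def extract_thinking_trace(model_output):
    """Return the chain-of-thought span, if present, for use as a context snippet.

    Assumes the reasoning model wraps its trace in <think>...</think> tags.
    """
    match = re.search(r"<think>(.*?)</think>", model_output, re.DOTALL)
    return match.group(1).strip() if match else None
```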
Uses a fixed prompt template with off-the-shelf LLMs to elicit context without any training. The template instructs the model to output five sections:
Advantage: Annotation-free, requires only a single forward pass, provides baseline contextualization
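A template of this kind might look as follows. The section names are assumptions based on the factors the framework targets (intent, prior knowledge, ambiguity, risks), not the paper's exact five sections:

```python
# Illustrative zero-shot template; the five section names are assumptions.
CONTEXT_TEMPLATE = """Analyze the user request below and output five sections:
1. User intent: what is the user most plausibly trying to achieve?
2. Prior knowledge: what background does the user likely have?
3. Ambiguity: which parts of the request are under-specified?
4. Potential risks: could a direct answer enable harm or misuse?
5. Guidance: how should a safe, helpful response be framed?

Request: {prompt}
"""

def build_context_prompt(prompt):
    """Fill the template for a single forward pass of an off-the-shelf LLM."""
    return CONTEXT_TEMPLATE.format(prompt=prompt)
```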
Our proposed method directly optimizes a lightweight context generator with GRPO to produce:
Result: A 3B context generator whose snippets benefit both small and large foundation models
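GRPO's core ingredient—group-relative advantages over several contexts sampled for the same prompt—can be sketched as below. This is a minimal sketch of the advantage computation only, not the full policy-gradient training loop:

```python
def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled context's reward by the
    group mean and standard deviation, so the policy is pushed toward
    contexts that outperform their siblings for the same prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    if std == 0:
        # All samples tied: no learning signal for this group.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]
```

Because the baseline is the group mean, no separate value network is needed, which keeps the 3B context generator cheap to train.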
Figure 3: Comparison of different context generation approaches and their effectiveness on different base models.
ContextLens improves safety across multiple foundation models and benchmarks. We evaluate on:
Table 1: Performance comparison across different base models (Qwen-1.5B, Qwen-3B, Llama-8B, GPT-4) with and without context snippets. Results show consistent improvements across all models and benchmarks.
We evaluate the quality of generated context snippets through multiple dimensions:
Figure 4: Context quality measured directly by an LLM-as-a-Judge in three categories: coherence, relevance between the prompt and the context snippet, and overall quality of the context snippets.
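A judge-style evaluation along these dimensions might look like the sketch below. The rubric wording, the score format, and the `judge` callable are assumptions, not the paper's actual evaluation prompt:

```python
import re

# Hypothetical rubric covering the three judged dimensions.
JUDGE_RUBRIC = """Rate the context snippet for the given prompt on a 1-5 scale
for each dimension, answering exactly as "coherence=X relevance=Y overall=Z":
- Coherence: is the snippet internally consistent and well-formed?
- Relevance: does it address this prompt's intent, ambiguity, and risks?
- Overall: how useful is it for guiding a safe response?

Prompt: {prompt}
Context snippet: {snippet}
"""

def judge_context(prompt, snippet, judge):
    """Score a context snippet with an LLM judge (any text-in/text-out callable)."""
    reply = judge(JUDGE_RUBRIC.format(prompt=prompt, snippet=snippet))
    scores = dict(re.findall(r"(coherence|relevance|overall)=(\d)", reply))
    return {k: int(v) for k, v in scores.items()}
```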
Figure 5: Evaluation of context informativeness. (a) Prompt-detection performance of the Llama-3-Guard-8B model when given different types of context snippets. (b) Monitorability of context snippets: whether they contain information that can influence the model's predictive behavior.