Research Paper · arXiv:2603.16410 · 2026

PlotTwist: A Creative Plot Generation Framework with Small Language Models

Abhinav Thorat, Ravi Kolla, Jyotin Goel, Niranjan Pedanekar

Sony Research India

200× smaller than the frontier models it outperforms
5B maximum active parameters in the SLM backbone
5 Narrative Quality Dimensions evaluated per plot

Overview

Creative plot generation presents a fundamental challenge for language models: transforming a concise premise into a coherent narrative that sustains global structure, character development, and emotional resonance. We present PlotTwist, a structured framework that enables Small Language Models (SLMs) with at most 5B active parameters to generate high-quality, premise-conditioned plots competitive with frontier systems up to 200 times larger. Our approach decomposes generation into three components: (1) an Aspect Rating Reward Model trained via Positive-Negative prompting across five Narrative Quality Dimensions (NQDs); (2) a Mixture-of-Experts plot generator aligned via Direct Preference Optimization; and (3) an Agentic Evaluation module that emulates human critical judgment. Together, these components yield consistent improvements across all five NQDs, demonstrating that careful decomposition and alignment can close, and in many cases reverse, the quality gap between small and frontier language models on creative tasks.

"We replaced model capacity with structured workflow and the plots got better."

Thorat et al., 2026

The PlotTwist Framework

Figure: PlotTwist pipeline. A 3B SLM feeds into the Aspect Reward Model, the MoE DPO Generator, and then the Agentic Evaluator, producing a final plot.
Aspect Rating Reward Model
Scores plots across 5 NQDs via Positive-Negative prompting, eliminating positivity bias and producing calibrated quality signals.
MoE Plot Generator
Mixture-of-Experts backbone aligned via DPO on high-confidence pairs. Delivers +0.78 NQD pts over base.
Agentic Evaluation Module
Independent judge, validated by checking that it separates 101 acclaimed screenplays from Razzie winners before it grades generated plots.
Visual breakdown of PlotTwist's three components: Evaluation with NQD bars, Mixture-of-Experts Generation network, and Quality Assessment robot
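
The MoE Plot Generator card says alignment uses "high-confidence pairs", but this page does not define the term. One plausible reading, sketched here as an assumption rather than the authors' procedure, is to keep only pairs of candidate plots for the same premise whose reward scores differ by a clear margin. The names `ScoredPlot` and `select_pairs`, the unweighted mean, and the margin of 1.0 are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class ScoredPlot:
    text: str
    nqd_scores: dict  # dimension name -> reward-model score on a 1-5 scale

    @property
    def overall(self) -> float:
        # Assumption: the overall score is the unweighted mean of the NQD scores.
        return sum(self.nqd_scores.values()) / len(self.nqd_scores)


def select_pairs(candidates, min_margin=1.0):
    """Return (chosen, rejected) pairs whose overall scores differ by >= min_margin."""
    pairs = []
    for a, b in combinations(candidates, 2):
        chosen, rejected = (a, b) if a.overall >= b.overall else (b, a)
        if chosen.overall - rejected.overall >= min_margin:
            pairs.append((chosen, rejected))
    return pairs


# Toy candidates for one premise; the scores are made up for illustration.
candidates = [
    ScoredPlot("plot A", {"Pacing": 4.0, "Narrative coherence": 5.0}),
    ScoredPlot("plot B", {"Pacing": 4.0, "Narrative coherence": 4.0}),
    ScoredPlot("plot C", {"Pacing": 2.0, "Narrative coherence": 2.0}),
]
pairs = select_pairs(candidates)
```

With these toy scores, A versus B is dropped because the margin is only 0.5, while A versus C and B versus C are kept as preference pairs.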

Five Dimensions of Narrative Quality

Rather than a single "quality" score, PlotTwist evaluates every plot across five Narrative Quality Dimensions (NQDs), each targeting a distinct failure mode that makes stories feel broken. Scores are calibrated via the Aspect Rating Reward Model, trained to separate genuine craft from surface-level fluency.

MoE routing contributes the largest single gain (+0.78 NQD points). DPO alignment refines further, and Positive-Negative prompting solves the reward-noise problem that would otherwise prevent preference learning from converging.
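
For readers unfamiliar with DPO, the standard per-pair objective is the negative log-sigmoid of a scaled difference between the policy-versus-reference log-ratios of the chosen and rejected outputs. The sketch below computes that loss for a single pair in plain Python. It illustrates the general DPO formula only; the value `beta=0.1` and the example log-probabilities are assumptions, not settings reported on this page.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch on sign for stability.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Illustrative log-probabilities (made up): the policy already prefers the
# chosen plot slightly more than the frozen reference model does.
loss_now = dpo_loss(-120.0, -130.0, -122.0, -128.0)
loss_at_init = dpo_loss(-122.0, -128.0, -122.0, -128.0)
```

When the policy equals the reference, the loss is log 2; it falls below that as the policy shifts probability toward the chosen plot relative to the reference.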

Character development: 4.4
Tone consistency: 4.1
Pacing: 4.0
Narrative coherence: 4.5
Emotional turn: 4.2

PlotTwist (3B) on held-out premises. Scale: 1 to 5.
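
As a quick worked example, the five per-dimension scores above can be summarised with an unweighted mean on the 1 to 5 scale. The page does not say how the dimensions are aggregated, so the equal weighting here is an assumption.

```python
# Per-dimension scores as reported above for PlotTwist (3B).
nqd_scores = {
    "Character development": 4.4,
    "Tone consistency": 4.1,
    "Pacing": 4.0,
    "Narrative coherence": 4.5,
    "Emotional turn": 4.2,
}

# Assumption: every dimension carries equal weight.
overall = sum(nqd_scores.values()) / len(nqd_scores)
weakest = min(nqd_scores, key=nqd_scores.get)
strongest = max(nqd_scores, key=nqd_scores.get)

print(f"mean NQD: {overall:.2f}, weakest: {weakest}, strongest: {strongest}")
```

Under that assumption the mean is 4.24, with Pacing the weakest dimension and Narrative coherence the strongest.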

Acclaimed: 101 Greatest Screenplays, the WGA's canonical list, including Chinatown, Sunset Blvd., and Network. 4.6 avg NQD.
Critically panned: Razzie Award Winners, Hollywood's anti-Oscars, honouring the worst films each year. 2.1 avg NQD.

The evaluator correctly separated great from terrible across every NQD before being trusted to judge PlotTwist outputs.
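
A sanity check of this kind can be sketched as follows: for each NQD, compare the evaluator's mean score on the acclaimed set with its mean on the panned set, and trust the evaluator only if every dimension shows a clear gap. The function names, the `min_gap` threshold, and the toy scores are hypothetical; the page reports only the two overall averages (4.6 and 2.1).

```python
from statistics import mean


def separation_report(acclaimed, panned, min_gap=1.0):
    """Map each dimension to (mean gap, whether the gap reaches min_gap)."""
    report = {}
    for dim in acclaimed:
        gap = mean(acclaimed[dim]) - mean(panned[dim])
        report[dim] = (gap, gap >= min_gap)
    return report


def separates_all_dimensions(report):
    """True only if every dimension separates the two sets."""
    return all(passed for _, passed in report.values())


# Toy evaluator scores (made up), two scripts per set and two dimensions.
acclaimed = {"Pacing": [4.5, 4.7], "Tone consistency": [4.6, 4.4]}
panned = {"Pacing": [2.0, 2.2], "Tone consistency": [2.5, 1.9]}

report = separation_report(acclaimed, panned)
trusted = separates_all_dimensions(report)
```

Requiring every dimension to pass, rather than only the overall average, mirrors the claim above that the evaluator separated the two sets across every NQD.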

Key Results

A 3B PlotTwist model outperforms 600B frontier models on creative plot quality, measured holistically across all five Narrative Quality Dimensions.
Model               Params   NQD Score   Rank
PlotTwist           3B       8.74        1
GPT-4.1             ~1.8T    8.41        2
Claude 3.5 Sonnet   ~200B    8.29        3
Gemini 1.5 Pro      ~340B    8.12        4
Llama 3.1 70B       70B      7.88        5
Mistral 7B (base)   7B       7.03        6
NQD Score by Model
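
To make the ranking concrete, the short sketch below re-derives the rank order from the table's NQD scores and computes PlotTwist's margin over the runner-up and over the lowest-ranked model. It uses only the numbers in the table; the variable names are illustrative.

```python
# (model, reported parameter count, NQD score) as listed in the table.
results = [
    ("PlotTwist", "3B", 8.74),
    ("GPT-4.1", "~1.8T", 8.41),
    ("Claude 3.5 Sonnet", "~200B", 8.29),
    ("Gemini 1.5 Pro", "~340B", 8.12),
    ("Llama 3.1 70B", "70B", 7.88),
    ("Mistral 7B (base)", "7B", 7.03),
]

# Sort by score, highest first, to recover the Rank column.
ranked = sorted(results, key=lambda row: row[2], reverse=True)

# Round to two decimals to avoid floating-point noise in the differences.
margin_over_runner_up = round(ranked[0][2] - ranked[1][2], 2)
margin_over_last = round(ranked[0][2] - ranked[-1][2], 2)

for rank, (model, params, score) in enumerate(ranked, start=1):
    print(f"{rank}. {model} ({params}): {score:.2f}")
```

On these figures PlotTwist leads GPT-4.1 by 0.33 points and the Mistral 7B base model by 1.71 points.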

Sample Generated Plots

🎭 Drama
Input Premise
A retired concert pianist discovers her late daughter's unfinished composition hidden inside a secondhand piano, and must decide whether to complete it or let the silence speak for itself.
Generated Plot

Eleanor's hands tremble as she lifts the yellowed score from the piano bench, sixteen bars in her daughter's unmistakable hand, then silence on the staff. Each night she sits at the Steinway, the notation blurring through tears she refuses to shed, bargaining with memory as much as music. When a young conservatory student discovers the manuscript online after Eleanor photographs it by accident, the world's hunger for closure forces her to confront whether completion is love or erasure, and whether grief's truest monument is sound or its deliberate absence.

🔪 Thriller
Input Premise
A forensic linguist is hired to authenticate a ransom note, only to recognise the phrasing patterns as her own from a novel she published under a pseudonym fifteen years ago.
Generated Plot

Dr. Mara Voss flags the note's subordinate clause inversions before she registers whose they are: hers, from Chapter 9 of The Quiet Hostage, a novel she thought was buried. Someone has studied her fiction like a playbook, and the kidnapped girl's life now runs on a clock set to Mara's own plot structure. Racing to predict the next chapter before her shadow self does, she must expose her pseudonym and the desperate decade that spawned it to the only detective who might believe a thriller novelist is simultaneously the key witness and the unwilling co-author of a real crime.

Citation

BibTeX
@article{thorat2026plottwist,
  title     = {PlotTwist: A Creative Plot Generation Framework with Small Language Models},
  author    = {Thorat, Abhinav and Kolla, Ravi and Goel, Jyotin and Pedanekar, Niranjan},
  journal   = {arXiv preprint arXiv:2603.16410},
  year      = {2026}
}