Research Paper · arXiv:2603.16410 · 2026

PlotTwist: A Creative Plot Generation Framework with Small Language Models

Abhinav Thorat, Ravi Kolla, Jyotin Goel, Niranjan Pedanekar

Sony Research India

200×

Smaller than the frontier models it outperforms

Maximum active parameters in the SLM backbone

5

Narrative Quality Dimensions evaluated per plot

Abstract

Overview

Creative plot generation presents a fundamental challenge for language models: transforming a concise premise into a coherent narrative that sustains global structure, character development, and emotional resonance. We present PlotTwist, a structured framework that enables Small Language Models (SLMs) with up to 5B active parameters to generate high-quality, premise-conditioned plots competitive with frontier systems up to 200 times larger. Our approach decomposes generation into three components: (1) an Aspect Rating Reward Model trained via Positive-Negative prompting across five Narrative Quality Dimensions (NQDs); (2) a Mixture-of-Experts plot generator aligned via Direct Preference Optimization (DPO); and (3) an Agentic Evaluation module that emulates human critical judgment. Together, these components yield consistent improvements across all five NQDs, demonstrating that careful decomposition and alignment can close, and in many cases reverse, the quality gap between small and frontier language models on creative tasks.

"We replaced model capacity with structured workflow and the plots got better."

Thorat et al., 2026

Method

The PlotTwist Framework

Figure: PlotTwist pipeline. A 3B SLM feeds into the Aspect Reward Model, the MoE DPO Generator, and then the Agentic Evaluator, producing a final plot.

①

Aspect Rating Reward Model

Scores plots across 5 NQDs via Positive-Negative prompting, eliminating positivity bias and producing calibrated quality signals.

②

MoE Plot Generator

Mixture-of-Experts backbone aligned via DPO on high-confidence preference pairs. Delivers a gain of +0.78 NQD points over the base model.

③

Agentic Evaluation Module

Independent judge, validated on its ability to separate the 101 Greatest Screenplays from Razzie Award winners before it grades generated plots.
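As a rough illustration of the Positive-Negative prompting used by the Aspect Rating Reward Model (component ①), the sketch below rates each NQD twice, once with a positively framed question and once with a negatively framed one, and averages the two after flipping the negative rating. The prompt templates, the `rater` callable, and the flip-and-average rule are all assumptions made for illustration; this page does not specify the actual prompts or how the two ratings are combined.

```python
from typing import Callable, Dict

# The five NQDs named on this page.
NQDS = [
    "character development",
    "tone consistency",
    "pacing",
    "narrative coherence",
    "emotional turn",
]

# Hypothetical prompt templates (not taken from the paper).
POSITIVE_TEMPLATE = "On a 1-5 scale, how strong is the {nqd} in this plot?\n\n{plot}"
NEGATIVE_TEMPLATE = "On a 1-5 scale, how severe are the {nqd} problems in this plot?\n\n{plot}"


def rate_plot(plot: str, rater: Callable[[str], float]) -> Dict[str, float]:
    """Rate one plot on every NQD using a positive and a negative framing.

    `rater` is a stand-in for any model call that maps a prompt to a
    number in [1, 5]. The negative rating is flipped (6 - x) so that both
    ratings mean "higher is better", then the two are averaged.
    """
    scores: Dict[str, float] = {}
    for nqd in NQDS:
        pos = rater(POSITIVE_TEMPLATE.format(nqd=nqd, plot=plot))
        neg = rater(NEGATIVE_TEMPLATE.format(nqd=nqd, plot=plot))
        scores[nqd] = (pos + (6.0 - neg)) / 2.0
    return scores


# A rater that answers "5" to everything is pulled back to the midpoint,
# because its inflated negative rating cancels its inflated positive one.
always_five = rate_plot("Some plot text.", lambda prompt: 5.0)
print(always_five["pacing"])  # 3.0
```

Under these assumptions, a rater that simply agrees with whatever it is asked cannot push a plot to the top of the scale, which is one plausible reading of how paired framings counter positivity bias.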

Evaluation

Five Dimensions of Narrative Quality

Rather than a single "quality" score, PlotTwist evaluates every plot across five Narrative Quality Dimensions (NQDs), each targeting a distinct failure mode that makes stories feel broken. Scores are calibrated via the Aspect Rating Reward Model, trained to separate genuine craft from surface-level fluency.

MoE routing contributes the largest single gain (+0.78 NQD points). DPO alignment then refines the generator further, and Positive-Negative prompting solves the reward-noise problem that would otherwise prevent preference learning from converging.
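To make the preference-learning step concrete, here is a minimal sketch of two pieces: selecting preference pairs from reward-model scores, and the standard per-pair DPO loss. The loss formula is the usual DPO objective; the `ScoredPlot` type, the `min_margin` threshold, and the idea that "high-confidence" means "large reward gap" are assumptions for illustration, not details confirmed by this page.

```python
import math
from typing import List, NamedTuple, Tuple


class ScoredPlot(NamedTuple):
    text: str
    reward: float  # assumed: mean NQD score from the reward model


def build_pairs(
    candidates: List[ScoredPlot], min_margin: float = 1.0
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) pairs whose reward gap is >= min_margin.

    `min_margin` is a hypothetical confidence threshold: pairs with a
    small gap are dropped because the reward model may be ranking noise.
    """
    pairs = []
    for a in candidates:
        for b in candidates:
            if a.reward - b.reward >= min_margin:
                pairs.append((a.text, b.text))
    return pairs


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(margin)).

    Inputs are sequence log-probabilities under the policy being trained
    and under the frozen reference model.
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


plots = [ScoredPlot("A", 4.5), ScoredPlot("B", 3.0), ScoredPlot("C", 4.0)]
print(build_pairs(plots))  # [('A', 'B'), ('C', 'B')]
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss is log 2; the loss falls below that as the policy shifts probability toward the chosen plot relative to the reference.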

Character development

4.4

Tone consistency

4.1

Pacing

4.0

Narrative coherence

4.5

Emotional turn

4.2

PlotTwist (3B) on held-out premises. Scale: 1 to 5.
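For readers who want a single summary of the five per-dimension scores above, the snippet below takes their unweighted mean and picks out the weakest and strongest dimensions. Treating the overall score as an unweighted mean is an assumption; this page does not state how per-dimension scores are aggregated.

```python
from statistics import mean

# Per-dimension scores for PlotTwist (3B) as listed above (scale: 1 to 5).
nqd_scores = {
    "Character development": 4.4,
    "Tone consistency": 4.1,
    "Pacing": 4.0,
    "Narrative coherence": 4.5,
    "Emotional turn": 4.2,
}

# Unweighted mean across the five dimensions (an assumed aggregation rule).
overall = round(mean(nqd_scores.values()), 2)
weakest = min(nqd_scores, key=nqd_scores.get)
strongest = max(nqd_scores, key=nqd_scores.get)

print(overall)    # 4.24
print(weakest)    # Pacing
print(strongest)  # Narrative coherence
```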

Acclaimed

101 Greatest Screenplays

WGA's canonical list: Chinatown, Sunset Blvd, Network.

4.6 avg NQD

Critically panned

Razzie Award Winners

Hollywood's anti-Oscars, honouring the worst films each year.

2.1 avg NQD

The evaluator correctly separated great from terrible across every NQD before being trusted to judge PlotTwist outputs.
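One simple way to quantify this kind of sanity check is the fraction of (acclaimed, panned) pairs in which the acclaimed title receives the higher score, which is equivalent to the area under the ROC curve. The sketch below is an assumption about how such a check could be run, not the statistic the authors report, and the example scores are made-up placeholders rather than results from the paper.

```python
from itertools import product
from typing import Sequence


def pairwise_separation(good: Sequence[float], bad: Sequence[float]) -> float:
    """Fraction of (good, bad) pairs where the good item scores higher.

    Ties count as half a win. A value of 1.0 means perfect separation and
    0.5 means the evaluator is no better than chance.
    """
    if not good or not bad:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for g, b in product(good, bad):
        if g > b:
            wins += 1.0
        elif g == b:
            wins += 0.5
    return wins / (len(good) * len(bad))


# Placeholder evaluator scores, for illustration only.
acclaimed_scores = [4.8, 4.5, 4.6]
panned_scores = [2.0, 2.3, 1.9]
print(pairwise_separation(acclaimed_scores, panned_scores))  # 1.0
```

Running the same function separately on each NQD's scores would check the "across every NQD" part of the claim, with any dimension scoring near 0.5 flagging an evaluator that cannot yet be trusted on that aspect.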

Experiments

Key Results

Model	Params	NQD Score	Rank
PlotTwist	3B	8.74	1
GPT-4.1	~1.8T	8.41	2
Claude 3.5 Sonnet	~200B	8.29	3
Gemini 1.5 Pro	~340B	8.12	4
Llama 3.1 70B	70B	7.88	5
Mistral 7B (base)	7B	7.03	6

NQD Score by Model

Qualitative Analysis

Sample Generated Plots

🎭 Drama

Input Premise

A retired concert pianist discovers her late daughter's unfinished composition hidden inside a secondhand piano, and must decide whether to complete it or let the silence speak for itself.

Generated Plot

Eleanor's hands tremble as she lifts the yellowed score from the piano bench, sixteen bars in her daughter's unmistakable hand, then silence on the staff. Each night she sits at the Steinway, the notation blurring through tears she refuses to shed, bargaining with memory as much as music. When a young conservatory student discovers the manuscript online after Eleanor photographs it by accident, the world's hunger for closure forces her to confront whether completion is love or erasure, and whether grief's truest monument is sound or its deliberate absence.

🔪 Thriller

Input Premise

A forensic linguist is hired to authenticate a ransom note, only to recognise the phrasing patterns as her own from a novel she published under a pseudonym fifteen years ago.

Generated Plot

Dr. Mara Voss flags the note's subordinate clause inversions before she registers whose they are: hers, from Chapter 9 of The Quiet Hostage, a novel she thought was buried. Someone has studied her fiction like a playbook, and the kidnapped girl's life now runs on a clock set to Mara's own plot structure. Racing to predict the next chapter before her shadow self does, she must expose her pseudonym and the desperate decade that spawned it to the only detective who might believe a thriller novelist is simultaneously the key witness and the unwilling co-author of a real crime.

Reference

Citation

BibTeX

@article{thorat2026plottwist,
  title     = {PlotTwist: A Creative Plot Generation Framework with Small Language Models},
  author    = {Thorat, Abhinav and Kolla, Ravi and Goel, Jyotin and Pedanekar, Niranjan},
  journal   = {arXiv preprint arXiv:2603.16410},
  year      = {2026}
}