Giving Voice to Classical Stories with Deepgram’s Aura 2
Turn classical stories into dramas with Deepgram and OpenAI
Throughout history, humans have crafted incredible works of art, but few are as enduring as classical literature. From Mary Shelley’s Frankenstein and Agatha Christie’s Murder on the Orient Express to Bram Stoker’s Dracula and the iconic Sherlock Holmes adventures by Sir Arthur Conan Doyle, these stories have captivated generations with their atmosphere, mystery, and unforgettable characters.
But what if we could do more than just read these stories? What if we could hear them, performed by distinct, expressive voices that make each character feel vividly real?
You might ask, “Isn’t that what audiobooks are for?” Not quite. Traditional audiobooks typically rely on a single narrator to voice every character. What we're building is something far more immersive: a fully dramatized experience powered by AI, where each character speaks in their unique voice, complete with emotional nuance, pacing, and personality.
The audio sample above offers a glimpse into what this sounds like. It features a scene from The Sign of Four, where Sherlock Holmes delivers his famous line:
When you have eliminated the impossible, whatever remains, however improbable, must be the truth.
In this example, Holmes and Watson are voiced separately using Deepgram’s Aura 2 models, each assigned a distinct voice to bring their dialogue to life.
In this article, you’ll learn how to:
Extract natural-sounding dialogue from classic prose using a large language model
Assign unique Aura 2 voices to each character in a scene
Generate and stitch together audio clips to produce a smooth, dramatized listening experience
Let’s bring the classics to life, one voice at a time.
Meet the Voice Cast
At the heart of this project are the voices. We need a text-to-speech (TTS) model with a wide range of voices, and the one we'll use in this tutorial is Deepgram Aura 2. Although Aura 2 is a TTS model built for enterprise rather than entertainment, its voices were designed with clarity in mind, so I thought it would be interesting to see how it performs in this project.
A diverse cast of voices
Aura 2 offers a wide range of voices, both masculine and feminine, spanning accents such as American and British. The voices were trained with various characteristics in mind: most can sound confident, friendly, cheerful, professional, or polite. They were also trained for specific use cases, such as casual chat, interactive voice response (IVR), advertising, and storytelling.
My interest was mostly in the storytelling voices, which fit this project perfectly. Let's explore a few of the voices that Deepgram Aura 2 provides.
Draco
Draco goes by the model name aura-2-draco-en. It is a masculine voice with a British accent. The voice is ideal for storytelling.
Orpheus
Orpheus goes by the model name aura-2-orpheus-en. It is a masculine voice with an American accent. The voice sounds professional and confident. It is also ideal for storytelling.
Athena
Athena is a feminine voice with a British accent. The voice is calm and smooth, and also perfect for storytelling. The voice goes by the model name aura-2-athena-en.
Janus
Janus is a feminine voice with an American accent. The voice is Southern and smooth, perfect for trustworthy characters. It goes by the model name aura-2-janus-en.
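To keep the cast organized later, it can help to have a small lookup of these voices. The sketch below is just a convenience map of the four models described above, with descriptions paraphrased from the text:

```python
# A quick-reference map of the Aura 2 voices discussed above,
# keyed by model name (descriptions paraphrased from the text).
AURA2_VOICES = {
    "aura-2-draco-en": "masculine, British accent, ideal for storytelling",
    "aura-2-orpheus-en": "masculine, American accent, professional and confident",
    "aura-2-athena-en": "feminine, British accent, calm and smooth",
    "aura-2-janus-en": "feminine, American accent, Southern and smooth",
}

for model, description in AURA2_VOICES.items():
    print(f"{model}: {description}")
```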
Voice generation with the Deepgram SDK
To generate these voices, we use the Deepgram Python SDK, which provides a straightforward interface to call the speak API and save high-quality audio clips. The SDK supports specifying models, output formats, and even streaming use cases.
Here’s a quick demo that shows how to generate a voice clip:
from deepgram import DeepgramClient, SpeakOptions

# Your Deepgram API key should be set in the DEEPGRAM_API_KEY environment variable
dg = DeepgramClient()

SPEAK_TEXT = {"text": "I’ve found it! I’ve found it!"}
FILENAME = "audio.mp3"

options = SpeakOptions(
    model="aura-2-draco-en",
    encoding="mp3"
)

response = dg.speak.rest.v("1").save(FILENAME, SPEAK_TEXT, options)
print("Voice clip saved:", FILENAME)
With just a few lines of code, you can give your characters a literal voice.
In the next section, we’ll show how to use a large language model to convert classical prose into dialogue, setting the stage for our cast to perform.
Large Language Models as Scriptwriters
Classical prose is beautiful, but it isn’t written for voices. That’s why we need to transform it. In this section, we’ll use a large language model to adapt a scene from a book into a character-driven dialogue, perfect for voice generation.
To do that, we’ll use the OpenAI Python SDK and a custom prompt. Our script takes in two files, an instruction file with formatting rules and a prose file from a classic story, and outputs a dramatized voice-over script, formatted for text-to-speech.
Here’s a high-level breakdown of what the script does:
Load the instruction file, which tells the LLM how to convert prose into dialogue
Load the prose file, which contains the raw text we want to transform
Send both to the OpenAI API to generate a dramatized version of the scene
Save the output to a file we’ll use later to generate voices
Here’s the code structure we’ll walk through:
import sys
from openai import OpenAI

def load_file(filepath: str) -> str:
    """Load and return the content of a file."""
    pass

def save_file(filepath: str, content: str) -> None:
    """Save the given content to a file."""
    pass

def generate_script(client: OpenAI, instruction: str, prose: str) -> str:
    """Generate a script using the OpenAI API based on instruction and prose."""
    pass

def main():
    if len(sys.argv) < 2:
        print("Usage: python script_writer.py <prose_file.txt>")
        sys.exit(1)

    prose_path = sys.argv[1]
    instruction = load_file("instruction.txt")
    prose = load_file(prose_path)

    print("📘 Instruction and prose loaded. Generating script...")
    client = OpenAI()
    script = generate_script(client, instruction, prose)
    save_file("script.txt", script)

if __name__ == "__main__":
    main()
Let’s break down what each part does.
Load the Input Files
We start with a utility function that reads a file from disk. This file could be the instruction file or the prose text.
def load_file(filepath: str) -> str:
    """Load and return the content of a file."""
    try:
        with open(filepath, "r", encoding="utf-8") as file:
            return file.read()
    except FileNotFoundError:
        raise FileNotFoundError(f"File not found: {filepath}")
    except Exception as e:
        raise RuntimeError(f"Error reading {filepath}: {e}")
This function reads a file’s contents as a string. If the file isn’t found or can’t be read, it raises an error.
Save the Output Script
Once we have the language model's output, we want to save it to a file for later use, such as generating voices.
def save_file(filepath: str, content: str) -> None:
    """Save the given content to a file."""
    try:
        with open(filepath, "w", encoding="utf-8") as file:
            file.write(content)
        print(f"✅ Output saved to {filepath}")
    except Exception as e:
        raise RuntimeError(f"Error writing to {filepath}: {e}")
This function writes the final script to disk and gives us a confirmation when successful.
Generate the Dramatized Script
This is the heart of the project: we use OpenAI’s API to turn prose into dialogue using a detailed prompt.
def generate_script(client: OpenAI, instruction: str, prose: str) -> str:
    """Generate a script using the OpenAI API based on instruction and prose."""
    try:
        response = client.responses.create(
            model="gpt-4.1",
            instructions=instruction,
            input=f"Generate a script based on the following prose:\n\n{prose}"
        )
        return response.output_text
    except Exception as e:
        raise RuntimeError(f"Error generating script: {e}")
This function sends a structured request to OpenAI’s API using:
The model (gpt-4.1)
A detailed instruction prompt (loaded from a file)
The prose we want to dramatize
The API responds with formatted character dialogue, which we return as a string.
Putting It All Together
Finally, here’s the complete code:
import sys
from openai import OpenAI

def load_file(filepath: str) -> str:
    """Load and return the content of a file."""
    try:
        with open(filepath, "r", encoding="utf-8") as file:
            return file.read()
    except FileNotFoundError:
        raise FileNotFoundError(f"File not found: {filepath}")
    except Exception as e:
        raise RuntimeError(f"Error reading {filepath}: {e}")

def save_file(filepath: str, content: str) -> None:
    """Save the given content to a file."""
    try:
        with open(filepath, "w", encoding="utf-8") as file:
            file.write(content)
        print(f"✅ Output saved to {filepath}")
    except Exception as e:
        raise RuntimeError(f"Error writing to {filepath}: {e}")

def generate_script(client: OpenAI, instruction: str, prose: str) -> str:
    """Generate a script using the OpenAI API based on instruction and prose."""
    try:
        response = client.responses.create(
            model="gpt-4.1",
            instructions=instruction,
            input=f"Generate a script based on the following prose:\n\n{prose}"
        )
        return response.output_text
    except Exception as e:
        raise RuntimeError(f"Error generating script: {e}")

def main():
    if len(sys.argv) < 2:
        print("Usage: python script_writer.py <prose_file.txt>")
        sys.exit(1)

    prose_path = sys.argv[1]
    instruction = load_file("instruction.txt")
    prose = load_file(prose_path)

    print("📘 Instruction and prose loaded. Generating script...")
    client = OpenAI()
    script = generate_script(client, instruction, prose)
    save_file("script.txt", script)

if __name__ == "__main__":
    main()
Save the code in a file named script_writer.py. Make sure your OpenAI API key is stored in the environment variable OPENAI_API_KEY.
Then, provide a text file containing the prose as input:
python script_writer.py prose.txt
This will generate a dialogue script based on the provided prose.
The Prompt That Makes It Work
Let’s not forget our instruction prompt. It teaches the model how to rewrite prose into a dramatized voice-over. Here’s the version we use:
You are an AI scriptwriter specializing in converting prose into a dramatized format suitable for text-to-speech. Your task is to transform a given prose excerpt into a dialogue between characters who are describing their own experiences and observations as depicted in the text.
The output should strictly adhere to the following format for each line of dialogue:
[Character Name]: What the character is saying.\n
Only include the spoken dialogue (voice-over lines) in your response. Do not include any scene descriptions, actions, or narrative text outside of the character dialogue.
Identify the key characters present or implied within the provided prose excerpt. Attribute the dialogue in your dramatization to these characters in a way that logically reflects their roles and perspectives within the narrative. Maintain the tone and implied emotions of the original prose within the dialogue.
Follow these constraints precisely:
1. Dialogue Format: Every line must start with "[Character Name]:" followed by the spoken words and end with a newline character.
2. Voice-Over Only: Exclude any non-dialogue elements.
3. Character Identification: Identify and use appropriate character names based on the provided prose.
4. Content Accuracy: The dialogue must faithfully represent the events and viewpoints of the original prose, attributed to the correct characters.
5. Tone Preservation: Maintain the original tone and implied emotions within the dialogue.
Save this in a file named instruction.txt.
With this setup, the language model acts like a screenwriter, adapting timeless literary text into audio-ready scripts, one line of dialogue at a time.
Giving Voice to a Classic
To showcase this pipeline in action, we’ll be giving voice to the opening chapter of A Study in Scarlet, the very first Sherlock Holmes story written by Sir Arthur Conan Doyle and now in the public domain.
This chapter introduces us to three important characters:
Dr. John Watson, the wounded army doctor and narrator
Stamford, Watson’s old acquaintance
Sherlock Holmes, the brilliant but enigmatic detective
To bring their voices to life, we’ll assign each one a distinct AI-generated voice using Deepgram’s Aura 2 models:
Dr. Watson will be voiced by aura-2-draco-en, which offers a warm, grounded tone suitable for a reflective narrator
Sherlock Holmes will be voiced by aura-2-orpheus-en, which has a sharper, more analytical tone, perfect for Holmes’ precise and eccentric manner
Stamford will be voiced by aura-2-pluto-en, a friendly and conversational voice
Here’s how the entire process works, from parsed text to a seamless, dramatized audio file.
Step 1: Parse the Script
We start by parsing the script file generated in the previous step. Each line is expected to follow this format:
[Character Name]: Spoken dialogue here.
This function reads the file and breaks it into a list of (speaker, text) pairs.
def parse_script(filepath: str) -> List[Tuple[str, str]]:
    pattern = re.compile(r'^\[(.*?)\]: (.*)')
    lines = []
    with open(filepath, "r", encoding="utf-8") as file:
        for line in file:
            match = pattern.match(line.strip())
            if match:
                speaker, text = match.groups()
                lines.append((speaker.strip(), text.strip()))
    return lines
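To see the parser's behavior without touching the filesystem, here is the same regex applied to a few in-memory lines (the sample dialogue is invented for illustration):

```python
import re

# The same pattern parse_script uses: "[Speaker]: dialogue"
pattern = re.compile(r'^\[(.*?)\]: (.*)')

sample = [
    "[Sherlock Holmes]: The game is afoot, Watson.",
    "A bare narration line with no speaker tag is skipped.",
    "[Dr. Watson]: Astonishing, Holmes!",
]

parsed = []
for line in sample:
    match = pattern.match(line.strip())
    if match:
        speaker, text = match.groups()
        parsed.append((speaker.strip(), text.strip()))

print(parsed)
```

Lines without a speaker tag simply fail to match and are dropped, which keeps stray narration out of the voice pipeline.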
Step 2: Assign Voices to Characters
This function asks the user to assign a Deepgram Aura 2 voice to each character in the script.
def prompt_for_voice_map(speakers: List[str]) -> Dict[str, str]:
    """Prompts the user to assign a voice to each character."""
    print("\n🎭 Assign voices to each character.")
    print("Refer to Deepgram's Aura 2 models (e.g., aura-2-orpheus-en, aura-2-saturn-en, etc.)")
    voice_map = {}
    for speaker in speakers:
        voice = input(f"🗣️ Assign a voice for '{speaker}': ").strip()
        voice_map[speaker] = voice
    return voice_map
Step 3: Generate Speech for Each Line
For each line of dialogue, this function uses Deepgram’s speak.rest.v("1") endpoint to generate an audio file. The voice used for each character comes from the user-defined voice_map.
def generate_speech(index: int, speaker: str, text: str, voice_map: Dict[str, str]) -> str:
    """Generates speech audio for the given speaker and line."""
    voice = voice_map.get(speaker)
    if not voice:
        raise ValueError(f"No voice assigned for speaker: {speaker}")

    speak_options = SpeakOptions(model=voice)
    output_file = os.path.join(OUTPUT_DIR, f"{index:03d}_{speaker.replace(' ', '_')}.mp3")
    SPEAK_TEXT = {"text": text}

    print(f"🔊 Generating speech for [{speaker}]: {text[:40]}...")
    response = deepgram.speak.rest.v("1").save(output_file, SPEAK_TEXT, speak_options)
    return output_file
This creates a sequence of small audio clips, one for each line of character dialogue.
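The clip filenames encode a zero-padded play-order index plus the speaker name, which keeps the files in performance order when sorted. A quick illustration of the naming scheme (the index and speaker are example values):

```python
import os

OUTPUT_DIR = "speech_outputs"

# Example values; in the pipeline these come from enumerate() over the parsed lines
index, speaker = 7, "Dr. Watson"
output_file = os.path.join(OUTPUT_DIR, f"{index:03d}_{speaker.replace(' ', '_')}.mp3")
print(output_file)  # speech_outputs/007_Dr._Watson.mp3
```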
Step 4: Stitch Everything Together
Once all the voice clips are generated, we concatenate them into one continuous MP3 using ffmpeg
. This produces a single, seamless audio file with voices alternating as the characters speak.
def concatenate_audio_files(audio_files: List[str], output_path: str):
    """Concatenates audio files into one using ffmpeg."""
    list_file = os.path.join(OUTPUT_DIR, "file_list.txt")
    with open(list_file, "w", encoding="utf-8") as f:
        for file in audio_files:
            f.write(f"file '{os.path.abspath(file)}'\n")

    print("🎧 Concatenating audio files...")
    subprocess.run([
        "ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
        "-c", "copy", output_path
    ], check=True)
    print(f"✅ Final audio saved as: {output_path}")
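ffmpeg's concat demuxer reads a plain-text list of file '...' entries, one per line. Here is what that list looks like for a couple of hypothetical clips, built without invoking ffmpeg itself:

```python
import os

# Hypothetical clip paths, matching the naming scheme used above
audio_files = ["speech_outputs/000_Dr._Watson.mp3", "speech_outputs/001_Stamford.mp3"]

# Each entry becomes a line of the form: file '/abs/path/to/clip.mp3'
list_lines = [f"file '{os.path.abspath(f)}'" for f in audio_files]
print("\n".join(list_lines))
```

Using absolute paths (together with -safe 0) avoids path-resolution issues when the list file lives inside the output directory.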
Full Workflow
All of this is tied together in the main() function, which runs the pipeline from start to finish:
import os
import re
import sys
import subprocess
from typing import List, Tuple, Dict
from deepgram import DeepgramClient, SpeakOptions

# Set up Deepgram client
deepgram = DeepgramClient()

# Constants
OUTPUT_DIR = "speech_outputs"
FINAL_AUDIO = "final_output.mp3"

# Create output directory
os.makedirs(OUTPUT_DIR, exist_ok=True)

def parse_script(filepath: str) -> List[Tuple[str, str]]:
    """Parses the script and returns a list of (speaker, line) tuples."""
    pattern = re.compile(r'^\[(.*?)\]: (.*)')
    lines = []
    with open(filepath, "r", encoding="utf-8") as file:
        for line in file:
            match = pattern.match(line.strip())
            if match:
                speaker, text = match.groups()
                lines.append((speaker.strip(), text.strip()))
    return lines

def extract_unique_speakers(lines: List[Tuple[str, str]]) -> List[str]:
    """Extracts a list of unique speakers from the script."""
    return sorted(set(speaker for speaker, _ in lines))

def prompt_for_voice_map(speakers: List[str]) -> Dict[str, str]:
    """Prompts the user to assign a voice to each character."""
    print("\n🎭 Assign voices to each character.")
    print("Refer to Deepgram's Aura 2 models (e.g., aura-2-orpheus-en, aura-2-saturn-en, etc.)")
    voice_map = {}
    for speaker in speakers:
        voice = input(f"🗣️ Assign a voice for '{speaker}': ").strip()
        voice_map[speaker] = voice
    return voice_map

def generate_speech(index: int, speaker: str, text: str, voice_map: Dict[str, str]) -> str:
    """Generates speech audio for the given speaker and line."""
    voice = voice_map.get(speaker)
    if not voice:
        raise ValueError(f"No voice assigned for speaker: {speaker}")

    speak_options = SpeakOptions(model=voice)
    output_file = os.path.join(OUTPUT_DIR, f"{index:03d}_{speaker.replace(' ', '_')}.mp3")
    SPEAK_TEXT = {"text": text}

    print(f"🔊 Generating speech for [{speaker}]: {text[:40]}...")
    response = deepgram.speak.rest.v("1").save(output_file, SPEAK_TEXT, speak_options)
    return output_file

def concatenate_audio_files(audio_files: List[str], output_path: str):
    """Concatenates audio files into one using ffmpeg."""
    list_file = os.path.join(OUTPUT_DIR, "file_list.txt")
    with open(list_file, "w", encoding="utf-8") as f:
        for file in audio_files:
            f.write(f"file '{os.path.abspath(file)}'\n")

    print("🎧 Concatenating audio files...")
    subprocess.run([
        "ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
        "-c", "copy", output_path
    ], check=True)
    print(f"✅ Final audio saved as: {output_path}")

def main():
    if len(sys.argv) < 2:
        print("Usage: python voice_generator.py <script_file.txt>")
        sys.exit(1)

    script_path = sys.argv[1]
    lines = parse_script(script_path)
    speakers = extract_unique_speakers(lines)
    voice_map = prompt_for_voice_map(speakers)

    audio_files = []
    for i, (speaker, text) in enumerate(lines):
        audio_path = generate_speech(i, speaker, text, voice_map)
        audio_files.append(audio_path)

    concatenate_audio_files(audio_files, FINAL_AUDIO)

if __name__ == "__main__":
    main()
Save the code in a file named voice_generator.py. Once everything is set up, you can run the program with the following command:
python voice_generator.py script.txt
The argument passed (script.txt) is the dialogue script generated by the language model. When the script runs, you’ll be prompted to assign a voice to each character.
Here’s the generated audio:
You can find the full source code for this project here.
Conclusion: Bringing Stories to Life, One Voice at a Time
Classical stories have stood the test of time because of their memorable characters, vivid worlds, and emotional depth. By combining large language models with Deepgram’s Aura 2, we’ve taken a step beyond the page, transforming timeless prose into rich, dramatized audio experiences. Each character now has a voice, a rhythm, and a personality. And every scene feels like it’s being performed, not just read.
This project isn’t just about nostalgia or novelty. It shows how accessible tools and AI models can breathe new life into public domain literature. Whether you’re a developer, a creative, or simply a fan of great storytelling, you now have everything you need to build your own cast, adapt your own scenes, and give voice to the characters you love.
With a little code and a lot of imagination, these classic tales can speak again, loud and clear.