All plans, even the free plan, include access to the API (Application Programming Interface), which lets you use our technology programmatically and reach its full range of features through code. The API is the fundamental component that any application built on ElevenLabs, or on any other service, relies on.

In this section, we will give a concise overview of how the API can be used: making requests to the endpoints to fetch the voices in your voice library and then generating your first text-to-speech.

It is worth noting that the website is just a frontend for the API. Whether you use Speech Synthesis on the website or call the API directly with code, there should be no difference in quality or output given the same settings. However, the AI is non-deterministic, so there may still be differences in the delivery of the generated output.

The API and the website also share the same pool of quota when generating audio. So, if you have 10,000 characters in your account and you generate audio, the characters for that generation will be deducted from your account regardless of whether you use the website or the API.

You can find the full API reference here:

https://docs.elevenlabs.io/api-reference/
https://api.elevenlabs.io/docs

Given the extensive nature of the API and its virtually limitless possibilities, we will only do a very quick overview and showcase a few examples to at least get you up and running.
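
A first example: because the website and the API share the same character pool, you can check your remaining quota programmatically via the GET /v1/user/subscription endpoint. In the sketch below, the response fields character_count and character_limit match the API reference at the time of writing, so verify them in the documentation if anything looks off.

# A minimal sketch of checking your remaining character quota.
# The response fields 'character_count' and 'character_limit' match the
# API reference at the time of writing; verify them in the documentation.
import requests

XI_API_KEY = "<xi-api-key>"  # Your API key for authentication

response = requests.get(
    "https://api.elevenlabs.io/v1/user/subscription",
    headers={"xi-api-key": XI_API_KEY},
)
sub = response.json()

used = sub["character_count"]
limit = sub["character_limit"]
print(f"Characters used: {used} / {limit} ({limit - used} remaining)")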

The Text-to-Speech (TTS) endpoint transforms text into speech in a given voice. The input consists of the text, the voice, and the voice settings, with an option to specify the model.

Fetching the voice_id

Before we can generate anything, we need to get the voice_id for the voice we want to use.

The easiest way to get the voice_id is via the website; you can find the article on how to do that in our help center.

A better approach, especially when working with the API, is to retrieve the voices accessible to your account using the GET /v1/voices endpoint. If you do not receive the voices you expect, you will either get an error or see only the pre-made voices, which most likely indicates that you are not passing your API key correctly.

You can find more information in the Get Voices endpoint documentation. This endpoint is crucial if your application requires any form of flexibility in the voices you want to use via the API, as it provides information about each voice, including the voice_id for each of them. This voice_id is necessary when querying the TTS endpoint.

The API offers a wide range of capabilities, such as adding new voices, but we won’t cover that here. To learn how to do that, you will need to read through the documentation.

You can use the Python code below to print a list of the voices in your account, along with their names and associated voice_ids, which will be required to use the voices via the API. Simply using the name won’t be sufficient.

# The 'requests' library is imported. It is used to send HTTP requests
# to the API; the JSON response is parsed with its built-in .json() method.
import requests

# An API key is defined here. You'd normally get this from the service you're accessing. It's a form of authentication.
XI_API_KEY = "<xi-api-key>"

# This is the URL for the API endpoint we'll be making a GET request to.
url = "https://api.elevenlabs.io/v1/voices"

# Here, headers for the HTTP request are being set up. 
# Headers provide metadata about the request. In this case, we're specifying the content type and including our API key for authentication.
headers = {
  "Accept": "application/json",
  "xi-api-key": XI_API_KEY,
  "Content-Type": "application/json"
}

# A GET request is sent to the API endpoint. The URL and the headers are passed into the request.
response = requests.get(url, headers=headers)

# The JSON response from the API is parsed using the built-in .json() method from the 'requests' library. 
# This transforms the JSON data into a Python dictionary for further processing.
data = response.json()

# A loop is created to iterate over each 'voice' in the 'voices' list from the parsed data. 
# The 'voices' list consists of dictionaries, each representing a unique voice provided by the API.
for voice in data['voices']:
  # For each 'voice', the 'name' and 'voice_id' are printed out. 
  # These keys in the voice dictionary contain values that provide information about the specific voice.
  print(f"{voice['name']}; {voice['voice_id']}")

Text-to-Speech

Once you’ve gotten the voice_id from the voices endpoint, you are ready to query the Text-to-Speech endpoint.

First, you need to decide whether you want to use the streaming response or not. In general, we recommend streaming unless your client doesn’t support it. You can find more information about the streaming endpoint in our documentation.
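
If your client cannot handle a streaming response, the non-streaming variant is the same endpoint path without the /stream suffix; it returns the complete audio in a single response body. A minimal sketch, assuming the same authentication and a simple payload:

# A minimal non-streaming sketch: the same endpoint without '/stream'
# returns the complete audio in one response body instead of a stream.
import requests

XI_API_KEY = "<xi-api-key>"  # Your API key for authentication
VOICE_ID = "<voice-id>"  # ID of the voice to use

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
response = requests.post(
    url,
    headers={"xi-api-key": XI_API_KEY},
    json={"text": "Hello from the non-streaming endpoint."},
)

if response.ok:
    with open("output.mp3", "wb") as f:
        f.write(response.content)  # The entire audio arrives at once
else:
    print(response.text)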

Second, you need to choose your voice settings. We recommend using the default voice settings before experimenting with the expressivity and similarity settings. You can find more information about the settings in our documentation.
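
If you would rather read the defaults than hard-code them, the API exposes them via GET /v1/voices/settings/default. A quick sketch; treat the returned keys as illustrative and confirm the current response shape in the API reference:

# A quick sketch: fetch the default voice settings instead of hard-coding
# them. Confirm the returned keys (e.g. 'stability', 'similarity_boost')
# against the current API reference.
import requests

XI_API_KEY = "<xi-api-key>"  # Your API key for authentication

response = requests.get(
    "https://api.elevenlabs.io/v1/voices/settings/default",
    headers={"xi-api-key": XI_API_KEY},
)
print(response.json())  # e.g. {"stability": 0.5, "similarity_boost": 0.75, ...}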

After each generation, we store the audio together with other metadata in your personal history. You can retrieve all your history items via the /v1/history endpoint, and we also provide endpoints for retrieving the audio of history items, deleting them, and downloading them. You can find more information about how to use the history endpoint in our documentation. The only exception is WebSocket streaming queries, which are not saved in order to reduce latency.
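
As a starting point, the sketch below lists your most recent generations via GET /v1/history. The field names history, history_item_id, and text match the API reference at the time of writing; double-check them in the documentation before relying on them.

# A minimal sketch of listing recent generations via GET /v1/history.
# The fields 'history', 'history_item_id', and 'text' match the API
# reference at the time of writing; double-check them in the documentation.
import requests

XI_API_KEY = "<xi-api-key>"  # Your API key for authentication

response = requests.get(
    "https://api.elevenlabs.io/v1/history",
    headers={"xi-api-key": XI_API_KEY},
)

for item in response.json()["history"]:
    print(f"{item['history_item_id']}: {item['text'][:60]}")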

In the basic Python example below, we send a request to the text-to-speech endpoint and receive a stream of audio bytes in return. All you need to do is set your XI API key first and try it out! If you cannot find your key, please see our help center article on where to locate it.

You will need to replace the constants by inserting your <xi-api-key>, setting the correct <voice-id>, and entering the <text> you want the AI to convert to speech. You can change the stability and similarity_boost settings if you want, but they are set to good default values.

Of course, the code below showcases only the bare minimum needed to get the system up and running; from there, you can write more advanced code to create a nice GUI with sliders for the variables, a list of voices, and anything else you want to add.

# Import the necessary library
import requests  # Used for making HTTP requests

# Define constants for the script
CHUNK_SIZE = 1024  # Size of chunks to read/write at a time
XI_API_KEY = "<xi-api-key>"  # Your API key for authentication
VOICE_ID = "<voice-id>"  # ID of the voice model to use
TEXT_TO_SPEAK = "<text>"  # Text you want to convert to speech
OUTPUT_PATH = "output.mp3"  # Path to save the output audio file

# Construct the URL for the Text-to-Speech API request
tts_url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"

# Set up headers for the API request, including the API key for authentication
headers = {
    "Accept": "application/json",
    "xi-api-key": XI_API_KEY
}

# Set up the data payload for the API request, including the text and voice settings
data = {
    "text": TEXT_TO_SPEAK,
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.8,
        "style": 0.0,
        "use_speaker_boost": True
    }
}

# Make the POST request to the TTS API with headers and data, enabling streaming response
response = requests.post(tts_url, headers=headers, json=data, stream=True)

# Check if the request was successful
if response.ok:
    # Open the output file in write-binary mode
    with open(OUTPUT_PATH, "wb") as f:
        # Read the response in chunks and write to the file
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            f.write(chunk)
    # Inform the user of success
    print("Audio stream saved successfully.")
else:
    # Print the error message if the request was not successful
    print(response.text)

Speech-to-Speech

Generating speech-to-speech involves a similar process to text-to-speech, but with some adjustments in the API parameters. Instead of providing text when calling the API, you provide the path to an audio file that you would like to convert from one voice to another. Here’s a modified version of the text-to-speech code above that illustrates how to generate speech-to-speech using the API:

# Import necessary libraries
import requests  # Used for making HTTP requests
import json  # Used for working with JSON data

# Define constants for the script
CHUNK_SIZE = 1024  # Size of chunks to read/write at a time
XI_API_KEY = "<xi-api-key>"  # Your API key for authentication
VOICE_ID = "<voice-id>"  # ID of the voice model to use
AUDIO_FILE_PATH = "<path>"  # Path to the input audio file
OUTPUT_PATH = "output.mp3"  # Path to save the output audio file

# Construct the URL for the Speech-to-Speech API request
sts_url = f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}/stream"

# Set up headers for the API request, including the API key for authentication
headers = {
    "Accept": "application/json",
    "xi-api-key": XI_API_KEY
}

# Set up the data payload for the API request, including model ID and voice settings
# Note: voice settings are converted to a JSON string
data = {
    "model_id": "eleven_english_sts_v2",
    "voice_settings": json.dumps({
        "stability": 0.5,
        "similarity_boost": 0.8,
        "style": 0.0,
        "use_speaker_boost": True
    })
}

# Open the input audio file and send it with the request.
# A context manager ensures the file handle is closed after the request is sent.
with open(AUDIO_FILE_PATH, "rb") as audio_file:
    files = {
        "audio": audio_file
    }

    # Make the POST request to the STS API with headers, data, and files, enabling streaming response
    response = requests.post(sts_url, headers=headers, data=data, files=files, stream=True)

# Check if the request was successful
if response.ok:
    # Open the output file in write-binary mode
    with open(OUTPUT_PATH, "wb") as f:
        # Read the response in chunks and write to the file
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            f.write(chunk)
    # Inform the user of success
    print("Audio stream saved successfully.")
else:
    # Print the error message if the request was not successful
    print(response.text)

This code takes an input audio file (<path> should be something like C:/User/Documents/input.mp3), sends a request to the speech-to-speech endpoint, and receives a stream of audio bytes in return. It then saves the streamed audio to the specified output path; with the default output.mp3, the file is saved in the same folder as the .py script.

Make sure to replace placeholders like <xi-api-key> and <voice-id> with your actual API key and voice ID respectively. Additionally, adjust paths and settings as needed.