Create a personalized Voice Persona through voice verification. This is a single-task, two-phase process: init (upload voice + get verification phrase) → complete (upload verification recording + create persona). The entire flow uses one taskId.
Workflow
User's voice audio
│
▼
① voicePersona/init
│ Upload voice → Extract vocals → Return verification phrase
│
│ Returns: { taskId }
│ Poll: GET /suno/v2/status?taskId=xxx
│ Wait for status == "awaiting"
│ data: { phrase_text, ... }
│
▼
User reads phrase_text aloud and records (within 30s timeout)
│
▼
② voicePersona/complete (same taskId)
│ Upload verification recording → Voice verification → Create Persona
│
│ Poll same taskId: GET /suno/v2/status?taskId=xxx
│ Wait for status == "success"
│ data: persona details
▼
Done → Use persona in /generate
Task Status Flow
queued → running → awaiting → running → success
                      │          │
                      │          └── complete failed → failed
                      └── User timeout → failed (VP_USER_TIMEOUT)
After the task reaches awaiting status, you must call complete within 30 seconds (default). If the timeout is exceeded, the task will fail with VP_USER_TIMEOUT and you’ll need to restart from init.
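The status flow and the 30-second window described above can be captured in two small client-side helpers. This is a minimal sketch, not part of the API: the names `nextAction` and `msRemaining` and the action strings are hypothetical, and the 30-second value is only the documented default.

```javascript
// Hypothetical client-side helpers; the API itself does not provide these.
const DEFAULT_USER_TIMEOUT_MS = 30_000; // documented default window after "awaiting"

// Map a task status to what the client should do next.
function nextAction(status, completeCalled) {
  switch (status) {
    case 'queued':
    case 'running':
      return 'poll';
    case 'awaiting':
      // Before complete is called the user still has to record the phrase.
      return completeCalled ? 'poll' : 'record_and_complete';
    case 'success':
      return 'done';
    case 'failed':
      return 'handle_failure';
    default:
      return 'poll'; // assumption: treat unknown statuses as still in progress
  }
}

// Milliseconds left before VP_USER_TIMEOUT, measured from the moment the
// client first observed "awaiting".
function msRemaining(awaitingSeenAtMs, nowMs = Date.now(), timeoutMs = DEFAULT_USER_TIMEOUT_MS) {
  return Math.max(0, awaitingSeenAtMs + timeoutMs - nowMs);
}
```

Because the client only notices awaiting on its next poll, the server-side window has probably been running slightly longer than `msRemaining` suggests, so it is safer to treat the result as an upper bound.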
Step 1: Init — Upload Voice & Get Verification Phrase
Upload the user’s voice audio. The system extracts vocals and returns a verification phrase that the user must read aloud.
This is an async task. Poll Get Task Status with the returned taskId. Wait for status to become awaiting (not success).
Request
POST /suno/v2/voicePersona/init
| Field | Type | Required | Description |
|---|---|---|---|
| voice_audio_url | string (URL) | Yes | Publicly downloadable URL of the voice audio (WAV/MP3) |
| language | string | Yes | Verification phrase language: zh, en, ja, ko, es, fr, de, pt, ru, hi |
| vocal_start_s | number | No | Vocal extraction start time (seconds), default: 0 |
| vocal_end_s | number | No | Vocal extraction end time (seconds), default: auto-detected |
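The request fields above can be assembled and checked on the client before calling init. The sketch below is illustrative only: `buildInitBody` is a hypothetical helper, and the http(s) check and the rule that vocal_end_s must exceed vocal_start_s are assumptions rather than documented server behavior.

```javascript
// Languages listed in the request table above.
const SUPPORTED_LANGUAGES = ['zh', 'en', 'ja', 'ko', 'es', 'fr', 'de', 'pt', 'ru', 'hi'];

// Hypothetical helper: validates inputs and returns the JSON body for init.
function buildInitBody({ voiceAudioUrl, language, vocalStartS, vocalEndS }) {
  // Assumption: "publicly downloadable" implies an http(s) URL.
  if (typeof voiceAudioUrl !== 'string' || !/^https?:\/\//.test(voiceAudioUrl)) {
    throw new TypeError('voice_audio_url must be an http(s) URL');
  }
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    throw new RangeError(`language must be one of: ${SUPPORTED_LANGUAGES.join(' ')}`);
  }
  const body = { voice_audio_url: voiceAudioUrl, language };
  // Optional fields are only sent when provided, so server defaults apply otherwise.
  if (vocalStartS !== undefined) body.vocal_start_s = vocalStartS;
  if (vocalEndS !== undefined) body.vocal_end_s = vocalEndS;
  // Assumption: an end time at or before the start time is never intended.
  if (vocalStartS !== undefined && vocalEndS !== undefined && vocalEndS <= vocalStartS) {
    throw new RangeError('vocal_end_s must be greater than vocal_start_s');
  }
  return body;
}
```

The returned object can be passed to `JSON.stringify` and sent as the body of POST /suno/v2/voicePersona/init.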
Polling Result (status: awaiting)
When the task reaches awaiting status, data contains:
| Field | Description |
|---|---|
| phrase_text | Verification phrase text (the user must read this aloud and record it) |
| phrase_id | Verification phrase ID (internal) |
| vox_audio_id | Extracted vocal audio ID (internal) |
| voice_recording_id | Recording ID (internal) |
| vocal_start_s | Vocal start time (seconds) |
| vocal_end_s | Vocal end time (seconds) |
Only phrase_text is needed by the user. All other fields are used internally by the system — you do not need to pass them to the complete step.
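Since phrase_text is the only field the client needs, reading it can be isolated in one place. The sketch below assumes the status response has the shape `{ status, data }`, as the Complete Example on this page uses it; `phraseFromTask` is a hypothetical name.

```javascript
// Hypothetical helper: returns the phrase to show the user,
// or null if the task has not reached "awaiting" yet.
function phraseFromTask(task) {
  if (!task || task.status !== 'awaiting') return null;
  const phrase = task.data && task.data.phrase_text;
  if (typeof phrase !== 'string' || phrase.length === 0) {
    throw new Error('Task is awaiting but data.phrase_text is missing');
  }
  return phrase;
}
```

The internal fields (phrase_id, vox_audio_id, and so on) are deliberately ignored here, because they do not have to be passed to the complete step.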
See Init API Reference →
Step 2: Complete — Upload Verification Recording & Create Persona
After the user reads phrase_text aloud and records it, upload the verification recording using the same taskId to complete voice verification and create the persona.
The complete call uses the same taskId returned by init. After calling complete, continue polling that taskId until status becomes success.
Request
POST /suno/v2/voicePersona/complete
| Field | Type | Required | Description |
|---|---|---|---|
| taskId | string (UUID) | Yes | The taskId from init (same task) |
| verification_audio_url | string (URL) | Yes | User’s verification recording URL (WAV/MP3) |
| name | string | Yes | Persona name |
| description | string | No | Persona description |
| is_public | boolean | No | Whether public (default: false) |
| image_s3_id | string | No | Cover image (base64), auto-generated if not provided |
No intermediate data (vox_audio_id, phrase_id, etc.) is needed — the system reads them automatically from the init phase.
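The complete request can be assembled from the taskId, the recording URL, and the persona fields alone. The following is a minimal sketch with a hypothetical helper name, `buildCompleteBody`; the client-side required-field check is an assumption and does not replace server validation.

```javascript
// Hypothetical helper: builds the JSON body for voicePersona/complete.
function buildCompleteBody({ taskId, verificationAudioUrl, name, description, isPublic, imageS3Id }) {
  // The three required fields from the request table above.
  for (const [key, value] of Object.entries({ taskId, verificationAudioUrl, name })) {
    if (typeof value !== 'string' || value.trim() === '') {
      throw new TypeError(`${key} is required`);
    }
  }
  const body = { taskId, verification_audio_url: verificationAudioUrl, name };
  // Optional fields are omitted unless set, so documented defaults apply.
  if (description !== undefined) body.description = description;
  if (isPublic !== undefined) body.is_public = Boolean(isPublic);
  if (imageS3Id !== undefined) body.image_s3_id = imageS3Id;
  return body;
}
```

None of the init-phase values (vox_audio_id, phrase_id, and so on) appear in the body, which matches the note above.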
See Complete API Reference →
Complete Example
const API_BASE = 'https://api.mountsea.ai';
const headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer your-api-key'
};

// Poll the task until it reaches targetStatus (or "success"); throw if it fails.
async function pollTask(taskId, targetStatus = 'success') {
  while (true) {
    const res = await fetch(`${API_BASE}/suno/v2/status?taskId=${encodeURIComponent(taskId)}`, { headers });
    if (!res.ok) throw new Error(`Status request failed: HTTP ${res.status}`);
    const task = await res.json();
    // "success" always ends polling, even when waiting for an earlier status.
    if (task.status === targetStatus || task.status === 'success') return task.data;
    if (task.status === 'failed') throw new Error(task.failReason);
    await new Promise(r => setTimeout(r, 3000)); // wait 3 s between polls
  }
}

// Step 1: Init (upload voice and get verification phrase)
const initRes = await fetch(`${API_BASE}/suno/v2/voicePersona/init`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    voice_audio_url: 'https://example.com/my-voice.wav',
    language: 'zh'
  })
});
const { taskId } = await initRes.json();

// Poll until status is "awaiting"
const initData = await pollTask(taskId, 'awaiting');
console.log('Please read aloud:', initData.phrase_text);
// → User records themselves reading the phrase ...

// Step 2: Complete (upload verification recording, same taskId)
const completeRes = await fetch(`${API_BASE}/suno/v2/voicePersona/complete`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    taskId,
    verification_audio_url: 'https://example.com/verification.wav',
    name: 'My Voice',
    description: '我的专属声音' // "My exclusive voice"
  })
});
// Surface request-level errors such as VP_INVALID_STATUS before polling.
if (!completeRes.ok) throw new Error(`Complete request failed: HTTP ${completeRes.status}`);

// Poll the SAME taskId until status is "success"
const persona = await pollTask(taskId, 'success');
console.log('Voice Persona created:', persona);
Error Codes
| Code | Error | Description |
|---|---|---|
| 400 | VP_TASK_NOT_FOUND | taskId does not exist or is not a Voice Persona task |
| 400 | VP_INVALID_STATUS | Task status is not awaiting, cannot call complete |
| 408 | VP_USER_TIMEOUT | Timeout waiting for complete after init (default 30s) |
| 409 | VP_SESSION_EXPIRED | Verification session expired, restart from init |
| 500 | VP_LOCK_EXPIRED | Internal lock expired (retry) |
| 503 | VP_NO_DEDICATED_ACCOUNT_AVAILABLE | No dedicated account available |
| 503 | VP_ALL_ACCOUNTS_BUSY | All account queues are full, retry later |
| 504 | VP_ORPHAN_TIMEOUT | Task queuing timeout |
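The descriptions in the table suggest a recovery step for some of these errors. The sketch below encodes only what the table states (restart from init, retry, retry later) and treats every other code as a plain failure, which is an assumption. `recoveryFor` and the action strings are hypothetical, and how the error code is extracted from a response is left to the caller because this page does not specify the error response shape.

```javascript
// Recovery hints taken from the Description column of the error table.
const RECOVERY_BY_CODE = {
  VP_USER_TIMEOUT: 'restart_from_init',    // 30s window elapsed
  VP_SESSION_EXPIRED: 'restart_from_init', // "restart from init"
  VP_LOCK_EXPIRED: 'retry',                // "Internal lock expired (retry)"
  VP_ALL_ACCOUNTS_BUSY: 'retry_later',     // "retry later"
};

// Hypothetical helper: maps an error code string to a suggested client action.
function recoveryFor(code) {
  const known = Object.prototype.hasOwnProperty.call(RECOVERY_BY_CODE, code);
  // Assumption: codes without an explicit hint are not retried automatically.
  return known ? RECOVERY_BY_CODE[code] : 'fail';
}
```

A caller might loop back to Step 1 on `restart_from_init`, retry the same request on `retry`, and wait before retrying on `retry_later`.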
Important Notes
The verification recording must clearly contain the full phrase_text content. Incomplete or unclear recordings will cause voice verification to fail.
- Single taskId lifecycle: Init and complete use the same taskId; poll one task throughout the entire flow.
- awaiting status: After init completes, the task status is awaiting (not success). The data field contains phrase_text for the user to read.
- 30s time limit: You must call complete within 30 seconds after the task reaches awaiting. Exceeding this causes VP_USER_TIMEOUT.
- Simplified parameters: complete only needs taskId + verification recording URL + persona info. All intermediate data is auto-filled by the system.
- Same account guarantee: Both phases automatically use the same Suno account.
- Language selection: language determines the verification phrase language. Match the language of the original voice audio for best results.
- Processing time: Init takes ~20-60s (includes vocal extraction); Complete takes ~10-30s (includes voice verification).
- Concurrency safety: The system serializes Voice Persona operations per account, so concurrent requests from different users won’t interfere.
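The processing-time estimates in the notes above suggest putting an upper bound on polling instead of looping indefinitely. The following is a sketch under stated assumptions: `pollWithBudget` is a hypothetical helper, the 120-second default budget is an arbitrary margin over the documented ~60s estimate, and `getTask` stands for any function that fetches the task status and resolves to `{ status, data, failReason }`.

```javascript
// Hypothetical helper: polls until `targetStatus`, "success", "failed",
// or until the time budget is exhausted.
async function pollWithBudget(getTask, targetStatus, {
  budgetMs = 120_000, // assumption: about twice the documented upper estimate for init
  intervalMs = 3_000,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  now = () => Date.now(),
} = {}) {
  const deadline = now() + budgetMs;
  while (true) {
    const task = await getTask();
    if (task.status === targetStatus || task.status === 'success') return task.data;
    if (task.status === 'failed') throw new Error(task.failReason || 'task failed');
    // Stop if the next poll would land after the deadline.
    if (now() + intervalMs > deadline) {
      throw new Error(`Gave up after ${budgetMs} ms; last status: ${task.status}`);
    }
    await sleep(intervalMs);
  }
}
```

The `sleep` and `now` options exist only so the loop can be exercised with a fake clock; in normal use they can be left at their defaults.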