model field on create. Both return the same output shape: the same seven lenses, scored second by second. What changes is what each model can read and what it’s sharpest at.
| Model | Reads | Sharpest at | Price per scan |
|---|---|---|---|
| mary (default) | Video, audio, voice, and text | Any content, especially the spoken, written, or narrative layer | $0.50 |
| qualia | Video and audio (no text input) | Telling visually distinct videos apart | $1.00 |
Omit the model field entirely and your scan runs on Mary.
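A minimal sketch of how the model field might be set when building a create request. Only the `model` field and its two values, `mary` and `qualia`, come from this page. The `build_scan_request` helper and the `input_url` key are hypothetical placeholders, not the documented request schema.

```python
from typing import Optional

# Model names as listed in the table above.
VALID_MODELS = {"mary", "qualia"}


def build_scan_request(input_url: str, model: Optional[str] = None) -> dict:
    """Build a JSON-style body for a create call (assumed shape)."""
    body = {"input_url": input_url}  # "input_url" is an assumed key name
    if model is not None:
        if model not in VALID_MODELS:
            raise ValueError(f"unknown model: {model!r}")
        body["model"] = model
    # When "model" is left out, the scan runs on Mary, the default.
    return body


default_request = build_scan_request("https://example.com/ad.mp4")
qualia_request = build_scan_request("https://example.com/ad.mp4", model="qualia")
```

Leaving `model` out of the body, rather than sending an empty value, mirrors the behavior described above, where a missing field falls back to Mary.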
What each model reads
Mary reads everything: video, audio, voiceovers, and plain text such as scripts and ad copy. Qualia reads video and the audio that comes with it, but it does not take text input. So a script or a voiceover transcript always goes to Mary, while a head-to-head between two video cuts is where Qualia earns its keep.
| | Video | Audio / voice | Text |
|---|---|---|---|
| Mary | Yes | Yes | Yes |
| Qualia | Yes | Yes | No |
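The modality table can be encoded as a small lookup so that unsupported inputs are caught before a scan is submitted. This is an illustrative sketch: the `SUPPORTED` mapping copies the table above, while the modality labels and the `supports` and `models_for` helpers are assumed names, not part of the API.

```python
# Modality support copied from the table above.
SUPPORTED = {
    "mary": {"video", "audio", "text"},
    "qualia": {"video", "audio"},
}


def supports(model: str, modality: str) -> bool:
    """Return True if the given model reads the given modality."""
    return modality in SUPPORTED[model]


def models_for(modality: str) -> list:
    """List the models that can read a modality, in table order."""
    return [name for name, kinds in SUPPORTED.items() if modality in kinds]


text_models = models_for("text")
video_models = models_for("video")
```

Under these assumptions, `text_models` contains only `mary`, which matches the rule that a script or transcript always goes to Mary.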
Mary, the generalist
The default, at $0.50 per scan. Reads every modality and is the right call for any content, especially when the message lives in the words.
Qualia, the visual specialist
$1.00 per scan. Reads video with its audio, no text input. Sharper at separating visually distinct pieces of video.
Mary
Mary is a brain encoder. It takes your content, predicts the response of real human brain networks, and reads that prediction out as the seven lenses, scored second by second. Because it reads across every modality, it’s the default for everything.
It reads video, audio and voice, and text (scripts, voiceovers, ad copy, narrative, and mixed media). It’s sharpest on any content where the spoken or written layer carries the message, which is why you reach for it by default.
Submit a 30-second ad script, a reel, or a voiceover MP3, and you get back a per-second timeline across all seven lenses: where attention holds, where buy-intent builds, where the copy slips.
Use Mary to:
- Score a reel, TikTok, or ad before you publish or put spend behind it.
- Test script and voiceover variants, comparing two pieces of copy line by line.
- Read any mixed-media creative where the message lives in the words as much as the visuals.
Qualia
Qualia is Sapient’s visual specialist. It reads video, visual and audio together, and produces the same brain-response output as Mary, so the result shape, lenses, and per-second timeline all match. Where it pulls ahead is separating visually distinct pieces of video, which makes it the one to reach for when you’re comparing video creatives and need them clearly told apart.
It reads video (visual and audio together) and audio or voice. It does not take text input. It’s sharpest at telling visually distinct pieces of video apart.
Submit two video ads that share a script but differ visually, and Qualia pulls them apart, showing which cut holds attention and drives intent second by second.
Use Qualia to:
- A/B test video creatives where the variants are visual: different cuts, footage, or pacing.
- Compare two video ads when you need a clear visual separation between them.
- Pick the strongest of several video edits before launch.
When to use which
If your input is text or mixed media, use Mary. If you’re comparing videos and need them pulled apart, use Qualia. Mary reads every modality, including voice, narration, script, mixed audio and visual, and plain text. It’s the default and the right call for any content, especially anything where the words matter. Qualia reads video and is sharpest at separating visually distinct video creatives, so reach for it when you’re comparing variants and need a clean separation.
Pricing is per model. Mary scans cost $0.50 each, and Qualia scans cost $1.00 each. You’re only charged for a scan that completes. See pricing.
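The selection rule and the per-scan prices can be combined into a rough budgeting helper. This is a sketch under stated assumptions: `pick_model` reduces the guidance above to a single yes/no question, the input-kind labels are illustrative, and `estimate_cost` counts only completed scans, since those are the only ones charged. Neither function is part of the API.

```python
from decimal import Decimal

# Per-scan prices as listed on this page.
PRICE_PER_SCAN = {"mary": Decimal("0.50"), "qualia": Decimal("1.00")}


def pick_model(input_kind: str, comparing_video_variants: bool = False) -> str:
    """Simplified rule: Qualia for video-vs-video comparisons, Mary otherwise."""
    if input_kind == "video" and comparing_video_variants:
        return "qualia"
    return "mary"


def estimate_cost(completed_scans: dict) -> Decimal:
    """Sum the cost of completed scans, given a model -> count mapping."""
    return sum(
        (PRICE_PER_SCAN[model] * count for model, count in completed_scans.items()),
        Decimal("0"),
    )


# Ten completed Mary scans plus four completed Qualia scans.
total = estimate_cost({"mary": 10, "qualia": 4})  # Decimal("9.00")
```

`Decimal` is used instead of floats so that dollar amounts add up exactly.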