Product Build

CV Capability Console

Six computer vision capabilities live in your browser

  • Working examples of six CV primitives running client-side
  • Build vs buy vs fine-tune cost framework per capability
  • Privacy by design: no image data leaves the browser

01 / 06 Capability

Face & Affect

Detect faces, locate landmarks, and classify expression. Smile is derived from mouth landmark geometry; emotion comes from a 7-class FER classifier.

Runs on face-api.js

How it works

A frame is passed to a single face detector, which returns bounding boxes and 68 facial landmarks. From the landmarks, mouth-corner spread plus lip curvature give a smile signal without a second model. In parallel, the cropped face is fed to a small CNN trained on FER-derived data, which returns probabilities across seven classes: happy, sad, angry, surprised, fearful, disgusted, and neutral.
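The smile signal described above can be sketched in a few lines. This is a minimal illustration, not the code running on this page: it assumes the landmarks follow the common 68-point layout (indices 48 and 54 for the outer mouth corners, 51 and 57 for the lip centres, 0 and 16 for the ends of the jawline), and the function names and both thresholds are hypothetical values chosen for the example.

```python
import math

# Assumed indices in the common 68-point landmark layout.
LEFT_CORNER, RIGHT_CORNER = 48, 54   # outer mouth corners
UPPER_LIP, LOWER_LIP = 51, 57        # centres of the outer upper and lower lip
JAW_LEFT, JAW_RIGHT = 0, 16          # ends of the jawline, used as face width


def smile_signal(landmarks):
    """Return (spread, curvature) for one face.

    landmarks: sequence of 68 (x, y) points in image coordinates,
               where y grows downward.
    spread:    mouth width divided by face width.
    curvature: how far the mouth corners sit above the lip centre,
               divided by mouth width. Positive means raised corners.
    """
    if len(landmarks) != 68:
        raise ValueError("expected 68 landmarks")

    lx, ly = landmarks[LEFT_CORNER]
    rx, ry = landmarks[RIGHT_CORNER]
    jlx, jly = landmarks[JAW_LEFT]
    jrx, jry = landmarks[JAW_RIGHT]

    mouth_width = math.hypot(rx - lx, ry - ly)
    face_width = math.hypot(jrx - jlx, jry - jly)
    if mouth_width == 0 or face_width == 0:
        return 0.0, 0.0  # degenerate detection, no usable signal

    spread = mouth_width / face_width
    centre_y = (landmarks[UPPER_LIP][1] + landmarks[LOWER_LIP][1]) / 2
    corner_y = (ly + ry) / 2
    # y grows downward, so corners above the centre have a smaller y.
    curvature = (centre_y - corner_y) / mouth_width
    return spread, curvature


def is_smiling(landmarks, min_spread=0.40, min_curvature=0.02):
    """Hypothetical thresholds; tune them on your own data."""
    spread, curvature = smile_signal(landmarks)
    return spread >= min_spread and curvature >= min_curvature
```

Both ratios are normalised by a face or mouth dimension, so the signal should not change when the same face appears larger or smaller in the frame.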

Build / Buy / Fine-tune

Browser (off-the-shelf)

Cost: $0
Bundle: ~6 MB total (face detector + landmarks + emotion model, lazy-loaded)
Latency: ~50 ms per frame on a 2020+ laptop
Privacy: Frame never leaves the device

What's running on this page right now.

Cloud API

Cost: ~$1 per 1,000 face detections on AWS Rekognition, emotion analysis included. GCP Vision and Azure Face are in the same range.
Latency: 200–500 ms round trip + processing
Privacy: Frames sent to the vendor; verify retention policy

Use when you need stronger edge-case accuracy or already pay for the vendor.

Custom fine-tune

Cost: Training data: $5K–$50K depending on whether you license FER+ derivatives or label your own. Compute: $200–$2K for a single fine-tune run on a small CNN. Hosting: $100–$1K/month for a small inference endpoint.
Latency: Same as browser if quantized and shipped client-side; otherwise depends on your hosting
Privacy: Whatever you choose to build

Worth it when off-the-shelf misses your domain: masked faces, low light, specific demographics.

When to choose which: Default to browser. Move to cloud only when off-the-shelf accuracy on your edge cases is unacceptable AND privacy/latency don't block it. Fine-tune only when both fail.
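To make the cost side of that comparison concrete, here is a rough break-even sketch using the list prices quoted above (~$1 per 1,000 cloud detections; $5K–$50K data, $200–$2K compute, and $100–$1K/month hosting for a fine-tune). The function names are hypothetical, and the model ignores engineering time, free tiers, and volume discounts, so treat the output as an order-of-magnitude guide only.

```python
def cloud_monthly_cost(detections_per_month, price_per_1000=1.0):
    """Monthly cloud API spend at a flat per-1,000 price (assumed flat)."""
    return detections_per_month / 1000 * price_per_1000


def break_even_months(detections_per_month, data_cost, compute_cost,
                      hosting_per_month, price_per_1000=1.0):
    """Months until a custom fine-tune is cheaper than the cloud API.

    Returns None when the cloud bill is no higher than the hosting bill,
    because the upfront cost is then never recovered.
    """
    monthly_saving = (cloud_monthly_cost(detections_per_month, price_per_1000)
                      - hosting_per_month)
    if monthly_saving <= 0:
        return None
    return (data_cost + compute_cost) / monthly_saving


if __name__ == "__main__":
    volume = 2_000_000  # hypothetical monthly detection volume
    print("cloud per month:", cloud_monthly_cost(volume))
    print("low-end fine-tune:", break_even_months(volume, 5_000, 200, 100))
    print("high-end fine-tune:", break_even_months(volume, 50_000, 2_000, 1_000))
```

At 2,000,000 detections a month the cloud tier costs about $2,000. Against the high end of the fine-tune range ($52K upfront, $1K/month hosting) the break-even is 52 months; against the low end ($5.2K upfront, $100/month) it is under three months. At 50,000 detections a month the cloud bill (about $50) is below even the cheapest hosting, so fine-tuning never pays back on cost alone.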

Where it shows up

  • Voice-agent sentiment routing: detect frustration mid-call and escalate
  • UX research observation: capture reaction timelines without recording video
  • Accessibility: emotion overlays for users training on social cue recognition

Vision isn't a research problem anymore. It's a procurement decision.

  1. Does the off-the-shelf model hit ~80% of the accuracy you need?

    If yes, ship it. Most CV problems are well-served by a generalist model. Don't fund a custom training run to gain the last 5% before you've validated the use case.

  2. Are the failure modes tolerable in production?

    Where the model fails matters more than how often it fails. A 90% accurate face detector that fails consistently on a protected demographic is unshippable. A 70% accurate detector with random errors is fine for many flows.

  3. Do privacy or latency requirements force on-device?

    If frames can't leave the device, or you need sub-100ms latency, the answer is browser or edge, full stop. The cloud APIs are off the table no matter how much accuracy they offer.

  4. Then, and only then, decide build vs. buy vs. fine-tune.

    Browser is the default. Cloud beats browser when accuracy matters more than latency or privacy. Fine-tune beats both only when there's a real domain shift the off-the-shelf models can't cover, and you have the data to support it.
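The four questions above can be written down as a small decision function. This is one possible reading of the framework, not a definitive encoding: the `Requirements` fields, the `choose_tier` name, and the `"none"` fallback are all hypothetical, and `accuracy_by_group` is a simple helper for question 2 that shows where a model fails rather than how often.

```python
from collections import defaultdict
from dataclasses import dataclass


def accuracy_by_group(results):
    """results: iterable of (group_label, was_correct) pairs from an eval run."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, correct in results:
        totals[group] += 1
        hits[group] += int(correct)
    return {group: hits[group] / totals[group] for group in totals}


@dataclass
class Requirements:
    needed_accuracy: float            # target accuracy, e.g. 0.90
    off_the_shelf_accuracy: float     # measured on your own eval set
    failures_tolerable: bool          # question 2, judged per group
    cloud_accuracy_ok: bool           # does a cloud API cover your edge cases?
    frames_must_stay_on_device: bool  # question 3, privacy
    max_latency_ms: float             # question 3, latency budget
    domain_shift: bool                # question 4
    has_training_data: bool           # question 4


def choose_tier(req):
    """Return 'browser', 'cloud', 'fine-tune', or 'none'."""
    # Question 3: privacy or a sub-100 ms budget rules out the cloud.
    on_device_only = req.frames_must_stay_on_device or req.max_latency_ms < 100

    # Questions 1 and 2: ~80% of the needed accuracy, with tolerable failures.
    good_enough = (req.off_the_shelf_accuracy >= 0.8 * req.needed_accuracy
                   and req.failures_tolerable)
    if good_enough:
        return "browser"
    if not on_device_only and req.cloud_accuracy_ok:
        return "cloud"
    if req.domain_shift and req.has_training_data:
        return "fine-tune"
    return "none"  # no tier fits; revisit the requirements


if __name__ == "__main__":
    evaluation = [("a", True), ("a", True), ("b", True), ("b", False)]
    print(accuracy_by_group(evaluation))
    print(choose_tier(Requirements(0.9, 0.6, True, True, True, 500, True, True)))
```

A large gap between the best and worst group in `accuracy_by_group` is the kind of consistent failure that question 2 treats as a blocker, even when the overall accuracy looks acceptable.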

Under the hood

Why the framework matters

Most product teams still treat computer vision as a research problem when the real problem is procurement. CV commoditized faster than the org chart caught up, and the question is no longer "can we do this?" It is "which version do we ship, and what does it cost?" The framework above is how I make that call, and the six capabilities below are the working reference for what's possible at each tier today.