Ishavi
Trust dossier 02 · Rev. 2026-Q1
Model card · Revision 2026-Q1

What the model does, how it was measured, what it cannot do.

This is Ishavi's public model card. It follows the widely used model-card format, adapted for an interview-scoring system: intended use, architecture, evaluation metrics, known limitations, bias risks, and the caveats that every recruiter integrating the platform should read first.

Card version: 2026-Q1
Card issued: 2026-03-31
Next revision: 2026-Q3 (post-audit)
Model providers: OpenAI, Anthropic, Google
Owner: Ishavi (privacy@ishavi.app)

    Section 01

    Intended use

    Ishavi runs structured voice interviews focused on knowledge verification. The model is intended to score candidate responses against a job-specific rubric supplied by the recruiter, producing evidence-anchored recommendations that a human reviewer takes as input -- not as a decision.

    • Primary users: recruiters, hiring managers, and RPO firms running first-round technical or domain screens.
    • Primary subjects: job applicants who have consented to an AI-conducted interview.
    • Out of scope: personality scoring, cultural-fit prediction, retention forecasting, salary negotiation analysis.
    • Out of scope: high-stakes decisions taken without human review (the platform forbids this in product, not just in policy).

    Section 02

    Model architecture

    The pipeline is not a single model -- it is a chain of specialised models with strict input/output schemas between stages. Each stage is independently swappable and independently audited.

    • Speech-to-text: OpenAI Whisper (large-v3 on us-east-1; tiny on the OCI Mumbai edge for low-latency draft transcripts).
    • Conversation orchestration: Anthropic Claude family for question selection, follow-up generation, and rubric mapping.
    • Scoring + summarisation: Google Gemini family for rubric-anchored scorecard composition; OpenAI GPT family as fallback.
    • Text-to-speech: OpenAI TTS (alloy / nova voices) for the interviewer voice.
    • All model outputs are stored alongside the prompt and system message that produced them, so every recommendation can be traced back to its exact inputs and re-run.
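The stage-by-stage design, with a fallback provider and the prompt and system message stored next to each output, can be illustrated with a minimal Python sketch. It assumes each provider is wrapped as a plain callable taking `(system_message, prompt)` and returning text; `StageRecord`, `run_with_fallback`, and every field name are hypothetical, not Ishavi's actual schema or any provider's SDK. The primary-then-fallback order mirrors the scoring stage described above.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Tuple

# Assumed wrapper shape for any provider: (system_message, prompt) -> output text.
ProviderCall = Callable[[str, str], str]


@dataclass(frozen=True)
class StageRecord:
    """One stage's output stored with the inputs that produced it (hypothetical schema)."""
    stage: str
    provider: str
    model: str
    system_message: str
    prompt: str
    output: str
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def input_digest(self) -> str:
        """SHA-256 over the inputs only, so a re-run can be matched to the original call."""
        payload = json.dumps(
            {
                "stage": self.stage,
                "provider": self.provider,
                "model": self.model,
                "system_message": self.system_message,
                "prompt": self.prompt,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_with_fallback(
    stage: str,
    system_message: str,
    prompt: str,
    providers: List[Tuple[str, str, ProviderCall]],
) -> StageRecord:
    """Try (provider, model, call) entries in order and record the first success."""
    last_error = None
    for provider, model, call in providers:
        try:
            output = call(system_message, prompt)
        except Exception as exc:  # any provider failure falls through to the next entry
            last_error = exc
            continue
        return StageRecord(stage, provider, model, system_message, prompt, output)
    raise RuntimeError(f"all providers failed for stage {stage!r}") from last_error
```

Hashing the inputs separately from the output lets an auditor confirm that a re-run used exactly the same prompt, system message, and model, even though hosted models are not guaranteed to return identical text on every call.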

    Section 03

    Training data

    Ishavi does not train its own foundation models. All foundation models used are general-purpose models hosted by their respective providers; we configure them with structured prompts, rubric grounding, and retrieval over the customer's job description.

    • No customer transcripts, audio, or scoring data are used to fine-tune any foundation model.
    • No candidate personal data leaves the customer's region except through the model provider's published inference endpoint, governed by their DPA.
    • Future fine-tuned domain models will be opt-in per tenant and disclosed here before training begins.
    • Provider sub-processors and their data-handling commitments are listed at /legal/subprocessors.

    Section 04

    Evaluation metrics

    Performance is evaluated on four measures: rubric alignment, transcript fidelity, evidence-quote accuracy, and the outcome of human review on appeal. Metrics are recomputed quarterly against a held-out evaluation corpus of recruiter-reviewed interviews; the current numbers reflect the Q1 2026 cut.

    • Rubric alignment (model recommendation vs. recruiter ground truth): 0.78 weighted Cohen's kappa.
    • Transcript fidelity (WER on the post-correction pass): 4.1% on US English; 6.8% on Indian English; 11.2% on accented English not seen at training time.
    • Evidence quote accuracy (does the cited quote appear verbatim in the transcript): 99.6% under the strict-match check.
    • Appeal outcomes after human review: 12% upheld in the candidate's favour; 4% modified; in the remaining 84% the original recommendation stood. (Q1 2026 cohort, n = 1,247.)
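For readers who want to check how three of the metrics above are defined, here is a minimal plain-Python sketch. It rests on stated assumptions: the card says only "weighted" kappa, so the quadratic (or linear) weighting is an assumption; the WER function applies no text normalisation, whereas the card's figures come from a post-correction pass; and the strict-match check is read as an exact substring test. The function names are illustrative.

```python
from typing import List, Sequence


def weighted_kappa(a: Sequence[int], b: Sequence[int], k: int, weights: str = "quadratic") -> float:
    """Weighted Cohen's kappa for two raters on an ordinal scale 0..k-1."""
    if k < 2 or len(a) != len(b) or not a:
        raise ValueError("need k >= 2 and two equal-length, non-empty rating lists")
    n = len(a)
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement = expected = 0.0
    for i in range(k):
        for j in range(k):
            d = abs(i - j) / (k - 1)
            w = d * d if weights == "quadratic" else d  # quadratic or linear penalty
            disagreement += w * observed[i][j]
            expected += w * row_totals[i] * col_totals[j] / n
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use the same single category")
    return 1.0 - disagreement / expected


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length (no normalisation)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion (reference word missing)
                curr[j - 1] + 1,         # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1] / len(ref)


def quote_accuracy(quotes: List[str], transcript: str) -> float:
    """Share of cited quotes found verbatim (exact substring) in the transcript."""
    if not quotes:
        raise ValueError("no quotes to check")
    return sum(q in transcript for q in quotes) / len(quotes)
```

Quadratic weights penalise a two-point disagreement four times as much as a one-point one, which suits ordinal rubric scores; linear weights are the common alternative.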

    Audit pending Q3 2026. The above figures are internally measured; the demographic-stratified breakdown below is held until the external audit closes.


    Section 05

    Known limitations

    Stated plainly because hiding them is worse than admitting them. Recruiters integrating Ishavi should understand these limits and design their workflow around them.

    • Heavily accented English raises WER and consequently scoring noise. The platform shows transcript-confidence bands; recruiters should treat low-confidence quotes with care.
    • Long single-turn answers (>180 seconds) map onto the rubric less reliably than shorter answers. Follow-up generation is tuned to keep turns under 90 seconds.
    • Domain knowledge outside the rubric is not scored -- a candidate can be excellent at something the rubric did not ask about.
    • Real-time network jitter on the candidate's side can drop audio frames; the platform surfaces this as a session-quality flag rather than silently filling gaps.
    • Voice biometrics are NOT used; the platform does not attempt to identify the speaker beyond the candidate's pre-authenticated session.
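The confidence bands and the long-turn and session-quality flags described above could be computed along the lines of this sketch. It assumes the speech-to-text stage exposes a per-word confidence score in [0, 1]; the band cut-offs (0.90 and 0.75) and all names are hypothetical. Only the 180-second threshold comes from this card.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical band cut-offs; this card does not publish the real thresholds.
HIGH_CONFIDENCE = 0.90
MEDIUM_CONFIDENCE = 0.75
# From this card: answers longer than 180 seconds map onto the rubric less reliably.
LONG_TURN_SECONDS = 180


@dataclass
class Quote:
    text: str
    word_confidences: List[float]  # assumed per-word STT confidence in [0, 1]


def confidence_band(quote: Quote) -> str:
    """Band a quote by its least confident word, so one misheard term is not averaged away."""
    lowest = min(quote.word_confidences)
    if lowest >= HIGH_CONFIDENCE:
        return "high"
    if lowest >= MEDIUM_CONFIDENCE:
        return "medium"
    return "low"


def turn_flags(duration_seconds: float, dropped_frames: int) -> List[str]:
    """Surface problems as explicit flags instead of silently compensating for them."""
    flags = []
    if duration_seconds > LONG_TURN_SECONDS:
        flags.append("long_turn")
    if dropped_frames > 0:
        flags.append("session_quality")
    return flags
```

Banding on the weakest word rather than the average is a deliberate choice in this sketch: a single misrecognised technical term can change what a quote means, and averaging would hide it.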

    Section 06

    Bias risks

    We assume bias exists until measurement proves otherwise. The model card commits to surfacing the risks we know about and the mitigations we have in place, not to claiming the system is bias-free.

    • Accent bias: documented above as a WER gap; flagged in product, mitigated with confidence bands on transcript-anchored quotes.
    • Lexical bias: the rubric grounding step is intended to keep scoring tied to job-relevant vocabulary; we test for unintended weight on prestige terms quarterly.
    • Length bias: longer answers are not scored higher by default; the rubric extractor normalises to evidence count, not word count.
    • Disability-related risk: the platform offers extended-time accommodations and pause-and-resume on every interview by default.
    • Protected-class inference: explicitly forbidden in the system prompt; flagged by an output-classifier guardrail before delivery to the recruiter.
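The length-bias mitigation (normalise to evidence count, not word count) can be illustrated with a small sketch. It assumes the rubric extractor emits `(criterion, quote)` pairs; `evidence_score` and its cap of 4 are hypothetical. Because the function never receives the answer text, padding an answer or repeating a point cannot raise the score, and evidence for criteria outside the rubric is ignored, which is also why off-rubric expertise goes unscored.

```python
from typing import Iterable, Set, Tuple


def evidence_score(evidence: Iterable[Tuple[str, str]], rubric: Set[str], cap: int = 4) -> int:
    """Score = number of distinct rubric criteria with at least one supporting quote.

    The answer text itself is never an input, so length cannot add points;
    repeated quotes for one criterion count once, and off-rubric criteria are ignored.
    """
    covered = {criterion for criterion, _quote in evidence if criterion in rubric}
    return min(len(covered), cap)
```

In use, a short answer that evidences two criteria and a long answer that evidences the same two criteria several times over receive the same score.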

    Section 07

    Demographic-stratified performance

    Audit pending Q3 2026. This section will be populated with disaggregated performance by self-reported gender, broad ethnicity, primary language, age band, and disability disclosure, following the impact-ratio methodology of the NYC Local Law 144 (LL144) bias-audit framework. We will publish the audit report alongside this page when it is issued.

    Until then, recruiters subject to NYC Local Law 144 must rely on their own independent bias audit, which the law requires to have been conducted no more than one year before the tool is used. Ishavi furnishes the underlying interaction data on request under a data-processing agreement.
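Local Law 144 audits report impact ratios. Under the NYC Department of Consumer and Worker Protection rules, for a tool that produces scores, a category's scoring rate is the share of its members who score above the sample median, and its impact ratio is that rate divided by the scoring rate of the highest-scoring category. The sketch below illustrates that calculation under this reading; `impact_ratios` is a hypothetical name, and the sketch is not a substitute for an independent auditor's methodology.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


def impact_ratios(results: List[Tuple[str, float]]) -> Dict[str, float]:
    """results: (category, score) per candidate. Returns each category's impact ratio."""
    cutoff = median(score for _, score in results)
    above: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, score in results:
        total[category] += 1
        if score > cutoff:  # "above the median": scores equal to the median do not count
            above[category] += 1
    scoring_rates = {c: above[c] / total[c] for c in total}
    best = max(scoring_rates.values())
    if best == 0:
        raise ValueError("no category has anyone scoring above the median")
    return {c: rate / best for c, rate in scoring_rates.items()}
```

For example, if 75% of one category and 25% of another score above the overall median, the second category's impact ratio is 0.25 / 0.75, roughly 0.33.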


    Section 08

    Caveats + recommendations

    Ishavi is built to be one signal in a hiring decision, not the decision itself. We strongly recommend integrating it as a structured first-round screen with mandatory human review on every advance/reject -- the appeals workflow is designed around this assumption.

    • Pair Ishavi with a separate live interview before a hire decision.
    • Configure the Bill of Rights appeals SLA explicitly per tenant; the 72-hour default is a baseline commitment, and tenants can commit to a faster turnaround.
    • Review the recommendation, the evidence quotes, and the candidate's appeal (if any) before closing a decision.
    • Re-evaluate the rubric every six months against actual job performance of hires made through the platform.
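The mandatory-review and appeals recommendations above can be sketched as a gate that refuses to close a decision without human sign-off or while an appeal is still open. Everything here (`Decision`, `appeal_due`, `blocking_reasons`, `close_decision`) is hypothetical; only the 72-hour default SLA comes from this card, and the SLA is a parameter so that each tenant can set its own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

DEFAULT_APPEAL_SLA = timedelta(hours=72)  # default stated in this card


@dataclass
class Decision:
    candidate_id: str
    recommendation: str                         # model output: "advance" or "reject"
    reviewer: Optional[str] = None              # human who reviewed recommendation + evidence
    appeal_filed_at: Optional[datetime] = None
    appeal_resolved: bool = False


def appeal_due(decision: Decision, sla: timedelta = DEFAULT_APPEAL_SLA) -> Optional[datetime]:
    """Deadline for answering an appeal; the SLA is passed in so tenants can set their own."""
    if decision.appeal_filed_at is None:
        return None
    return decision.appeal_filed_at + sla


def blocking_reasons(decision: Decision) -> List[str]:
    """List everything that currently prevents the decision from being closed."""
    reasons = []
    if decision.reviewer is None:
        reasons.append("no human review")
    if decision.appeal_filed_at is not None and not decision.appeal_resolved:
        reasons.append("open appeal")
    return reasons


def close_decision(decision: Decision, outcome: str) -> str:
    """Refuse to close unless a human has reviewed and any appeal is resolved."""
    if outcome not in ("advance", "reject"):
        raise ValueError(f"unknown outcome {outcome!r}")
    reasons = blocking_reasons(decision)
    if reasons:
        raise PermissionError("cannot close decision: " + ", ".join(reasons))
    return outcome
```

Keeping the human reviewer as a required field of the decision record, rather than a policy note, is what makes the review step enforceable in the product.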