May 3, 2026

The Algorithm in the Witness Box: AI-Generated Evidence and the Sixth Amendment

There is a moment in every criminal trial when the accused looks across the courtroom and sees the evidence against them. For most of American legal history, that evidence was something a person made — a confession extracted under pressure, a fingerprint lifted by a technician, a witness who claimed to have seen something happen. However imperfect, the adversarial process was designed around a fundamental premise: that a human being, with all their capacity for error and motivation to deceive, stood behind the evidence and could be confronted, challenged, and cross-examined.

That premise is now under siege in ways that the Framers of the Constitution could not have imagined — and that the federal courts have not yet figured out how to address.

Over the past several years, a new category of evidence has been quietly entering American courtrooms: outputs generated by machine learning systems and large language models that purport to reconstruct events, analyze patterns, or identify suspects with a statistical confidence that human experts cannot match. Facial recognition software that places a defendant at a crime scene. Predictive analytics tools that assess recidivism risk and influence sentencing. Natural language processing systems that analyze the metadata of communications and draw inferences about intent. These tools are not hypothetical. They are deployed today, in courtrooms across the country, and they are increasingly decisive.

The Confrontation Clause in the Age of the Black Box

The Sixth Amendment to the United States Constitution provides, in its Confrontation Clause, that an accused shall enjoy the right "to be confronted with the witnesses against him." The Supreme Court has interpreted this provision, most influentially in Crawford v. Washington (2004), to require that testimonial statements made outside of court cannot be admitted against a defendant unless the witness is unavailable and the defendant had a prior opportunity to cross-examine them. The principle is ancient and well-understood: the defendant has a right to look their accuser in the eye, to test the credibility of the evidence through the adversarial process, and to challenge the factual basis upon which the state seeks to deprive them of their liberty.

What happens when there is no witness to confront? What happens when the "testimony" comes not from a person but from a system whose inner workings are proprietary, whose training data is undisclosed, and whose outputs cannot be interrogated because the algorithm itself cannot be cross-examined?

This is not a theoretical problem. It is the lived reality of defendants in a growing number of jurisdictions where AI-assisted evidence is admitted without the kind of disclosure that would allow a defense attorney to meaningfully challenge it. The most notorious example is COMPAS — the Correctional Offender Management Profiling for Alternative Sanctions tool developed by Northpointe, now Equivant — which was the subject of the Wisconsin Supreme Court's 2016 decision in State v. Loomis. Eric Loomis challenged his sentence on the grounds that the court had relied on a proprietary risk-assessment tool whose methodology was inaccessible to him. The Wisconsin court upheld the sentence, concluding that the tool did not violate due process because it was merely one factor among many that the judge had considered. The United States Supreme Court declined to hear the case.

The constitutional question that Loomis did not resolve — and that no federal court has squarely addressed — is whether a criminal defendant's due process rights and Confrontation Clause rights are violated when a machine's output is used against them and the defendant cannot obtain meaningful discovery about how that output was generated. The answer, under any serious analysis, should be yes. But the courts have not said so, and in the absence of a binding ruling, prosecutors and vendors are filling the void with tools that defendants cannot effectively challenge.

The Epistemic Problem at the Heart of Machine Evidence

The legal system's difficulty in handling AI evidence is not merely procedural. It is epistemological. The adversarial system is built around a particular theory of knowledge: that truth emerges from the collision of competing claims, tested by experienced advocates and evaluated by a neutral factfinder. This theory assumes that evidence can be explained — that an expert who testifies about the meaning of a fingerprint, or the reliability of an eyewitness identification, can be questioned about their methodology, challenged on their assumptions, and held accountable for their errors.

Modern neural networks and large language models do not work this way. They are, in the now-familiar phrase, black boxes: systems that ingest inputs and produce outputs through a process of weighted computation across billions of parameters that no individual can fully trace or explain. When a facial recognition system identifies a suspect, it does so by comparing patterns in a database through a process that even the system's designers cannot reconstruct, step by step, for a particular output. When a language model analyzes a series of communications and concludes that they reflect a "pattern consistent with intent to defraud," that conclusion rests on statistical regularities in the model's training data that no human expert can fully articulate.

The traditional legal framework for dealing with scientific evidence — the Daubert standard, which requires that expert testimony reflect a reliable methodology that can be tested, peer-reviewed, and subjected to known error rates — is not adequate to address this reality. A vendor can commission studies demonstrating that its tool achieves a certain accuracy rate in controlled conditions, but accuracy in the aggregate tells a defendant very little about whether the specific output used against them was correct. The gap between population-level performance metrics and individual case reliability is precisely the gap that cross-examination is designed to probe — and it is precisely the gap that AI evidence, in its current form, forecloses.
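
The point can be made concrete with a back-of-the-envelope calculation. The sketch below uses purely hypothetical numbers (a tool said to be "99% accurate," a search across a gallery of 100,000 people) to show how an impressive aggregate figure can coexist with an individual match that is almost certainly wrong. It illustrates the base-rate problem; it does not describe any deployed system or real case.

```python
# Hypothetical illustration of the gap between aggregate accuracy and
# case-specific reliability. Every number below is assumed for the sake
# of the example; none describes any real tool or case.

def posterior_match_probability(sensitivity, false_positive_rate, prior):
    """Probability that a flagged 'match' really is the suspect.

    sensitivity:         P(match | person is the suspect)
    false_positive_rate: P(match | person is not the suspect)
    prior:               P(person is the suspect) before any match
    """
    true_hits = sensitivity * prior
    false_hits = false_positive_rate * (1 - prior)
    return true_hits / (true_hits + false_hits)

# A tool advertised as "99% accurate" in aggregate testing...
sensitivity = 0.99
false_positive_rate = 0.01

# ...searching a gallery of 100,000 people that contains one true suspect.
prior = 1 / 100_000

p = posterior_match_probability(sensitivity, false_positive_rate, prior)
print(f"Chance a flagged match is the right person: {p:.4f}")
# Prints roughly 0.0010 -- about one in a thousand, despite the 99% headline.
```

Which prior is appropriate, and what the false positive rate actually is for people who look like the defendant, are precisely the questions a defense expert would need disclosure to answer.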

The Racial Dimension No One Wants to Name

There is a dimension to AI-generated evidence that the courts have been particularly reluctant to confront directly, and it is the one that carries the most profound implications for equal justice under law. Multiple studies — including the landmark 2018 "Gender Shades" investigation by MIT Media Lab researcher Joy Buolamwini and Timnit Gebru, and the 2019 NIST evaluation of facial recognition algorithms — have documented substantial racial disparities in the accuracy of AI identification and prediction systems. Facial recognition tools that achieve accuracy rates above ninety-five percent for white male subjects may perform at dramatically lower rates for Black women. Risk assessment tools trained on historical criminal justice data embed the racial disparities of that data into their outputs, producing scores that are systematically higher for Black defendants than for white defendants with comparable individual profiles.

These are not theoretical disparities. They are measurable, documented, and, in the context of criminal justice, life-altering. A defendant who is misidentified by a facial recognition system may spend years in pretrial detention before the error is corrected — if it is ever corrected. A defendant whose risk score is inflated by systemic bias may receive a harsher sentence, be denied bail, or be placed in more restrictive conditions of supervision based on an algorithmic output that encodes the inequities of the past into the decisions of the present.

The Equal Protection Clause of the Fourteenth Amendment prohibits states from denying any person the equal protection of the laws. It has been interpreted to require that criminal defendants receive a fair trial free from the introduction of evidence tainted by racial prejudice. In Batson v. Kentucky, the Supreme Court held that the racially discriminatory use of peremptory challenges violates the Equal Protection Clause. In Buck v. Davis, the Court reversed a death sentence in part because an expert witness had testified that the defendant's race was a factor that made him more dangerous — a use of race in criminal sentencing that the Court found constitutionally impermissible.

The constitutional logic of those decisions should apply with equal force to AI tools that produce racially disparate outputs. If an expert witness may not testify that a defendant's race makes them a greater risk, it is difficult to understand why a risk assessment algorithm — which, as the academic literature demonstrates, effectively encodes race through proxy variables — should be treated differently. The fact that the discrimination is laundered through mathematics does not make it any less discriminatory.

The Vendor Relationship and the Accountability Vacuum

The admissibility problems created by AI evidence are compounded by the structural relationship between law enforcement agencies and the private companies that develop and sell these tools. Unlike government agencies, private vendors are not directly subject to constitutional constraints. They can refuse to disclose their source code, their training data, or their validation methodology on the grounds of trade secret protection — and courts have, in multiple cases, allowed them to do so. The defendant is left confronting not a witness who can be called to account, but a product, with the company that made it shielded behind proprietary walls.

This structure creates an accountability vacuum that is incompatible with the due process requirements of the American criminal justice system. When the government uses a private contractor to accomplish what it could not do directly — here, evading the requirements of the Confrontation Clause by insulating the methodological basis of its evidence from defense scrutiny — the constitutional guarantees that limit state power should not evaporate simply because a private party was inserted into the chain. The Supreme Court has recognized this principle in other contexts, and it should be applied here.

Several state legislatures have begun to act where the courts have not. Illinois, Maryland, and California have enacted or proposed legislation requiring disclosure of AI tool methodologies when those tools are used in criminal proceedings. New York City has established an Automated Decision Systems Task Force. The European Union's AI Act, which entered into force in 2024, imposes stringent transparency and accuracy requirements on AI systems used in high-risk contexts, including criminal justice. These legislative efforts represent a meaningful step in the right direction, but they are incomplete, inconsistently implemented, and no substitute for a coherent constitutional framework.

Toward a Constitutional Framework

What would a constitutionally adequate framework for AI evidence in criminal proceedings look like? Several principles suggest themselves, not as a comprehensive solution but as a starting point for a conversation that the legal profession has been too slow to have.

First, meaningful disclosure. Any AI tool used to generate evidence in a criminal proceeding — whether at the investigative stage, the charging decision, or trial — should be subject to mandatory disclosure of its training data sources, validation methodology, known error rates by demographic group, and any significant modifications made after deployment. This disclosure should be available to the defense as a matter of right, not as a matter of prosecutorial discretion. Trade secret protections should not override the defendant's constitutional right to confront the evidence against them. If a vendor will not consent to disclosure, the evidence derived from that vendor's tool should be inadmissible.

Second, independent validation. No AI tool should be used in criminal proceedings without a prior, public validation study by an independent entity — not the vendor, and not the law enforcement agency that purchased the tool. That study should examine not only aggregate accuracy but accuracy disaggregated by race, sex, age, and other demographic characteristics. If a tool's error rates are acceptable only for white male subjects, it should not be used against anyone until its performance is acceptable for every group.

Third, judicial gatekeeping commensurate with the stakes. The Daubert framework should be applied to AI evidence with the rigor that the criminal context demands. A court admitting AI-generated evidence against a defendant should be required to make specific findings that the evidence meets not just aggregate reliability standards but case-specific reliability standards — that the tool's known error rate is acceptable given the stakes, and that the output was generated under conditions consistent with the validated use case.

Fourth, a right to an adequate expert. If the prosecution introduces AI evidence that a defendant cannot meaningfully evaluate without expert assistance, the defendant should have a constitutionally guaranteed right to a qualified expert at government expense. The Ake v. Oklahoma principle — that indigent defendants have a right to expert assistance when that assistance is necessary for an adequate defense — should be understood to encompass AI expertise in cases where that expertise is required to challenge the government's evidence.

The Deeper Question

Beneath the technical and procedural questions lies a deeper question about the kind of justice system we want to have. The adversarial system is not merely a mechanism for producing accurate outcomes — though accuracy matters enormously. It is also a set of commitments about how the state may exercise its power to deprive a person of liberty. Those commitments include the requirement that the accused be able to challenge the evidence against them, that the state not hide its methods behind secrecy, and that all defendants receive equal protection regardless of their race, sex, or other characteristics.

AI evidence, as currently deployed, threatens all three of these commitments. It is not enough to say, as proponents of these tools often do, that they are more accurate than human judgment on average — even if that claim is well-supported, which it often is not. A system that denies defendants a meaningful opportunity to challenge the evidence against them, that conceals its methodology behind proprietary walls, and that produces racially disparate outcomes is not a more reliable system. It is a less fair one, and fairness is not a luxury that the Constitution permits us to sacrifice at the altar of efficiency.

The Supreme Court has not yet been forced to confront these questions squarely. When it is — and the right case, with the right record and the right lawyers, will eventually make its way up — the stakes will be nothing less than the integrity of the Sixth and Fourteenth Amendments in the age of artificial intelligence. The decisions made in that crucible will shape the character of American criminal justice for a generation.

The algorithm cannot be cross-examined. That is the problem. It is not a small one.
