Beyond Voluntary Guidelines: Fixing AI Regulation in Medicine
The Case for Enforceable Bias Audits, Monitoring, and Transparency
Executive Summary
Artificial intelligence (AI) is among the fastest-growing areas of technology, and AI-powered medical devices in particular are rapidly becoming an integral part of everyday clinical care. They promise faster diagnoses, improved risk assessments, and more personalized treatment. However, many of these tools operate with minimal oversight. Because AI systems can evolve after deployment, rely on opaque algorithms, and introduce or reinforce systemic biases, they pose risks that current health technology policies cannot adequately address.
This memo argues that the U.S. lacks a strong, enforceable framework for ensuring that AI in healthcare is safe, fair, and accountable. Although the FDA’s 2021 AI/ML Action Plan acknowledged these challenges, it remains non-binding and has had little impact. This memo proposes a broader, stricter national regulatory structure for AI/ML medical tools, one that begins with stronger FDA authority but ultimately requires more systemic coordination. Without these reforms, patients will keep bearing the consequences of a system that moves fast on innovation but lags on safety. At its core, the memo frames regulation not just as a technical necessity but as a moral obligation: to ensure that AI in healthcare serves all patients fairly and does not deepen existing inequities.
This memo proposes three core recommendations: mandatory bias audits certified by third parties, required post-market reports on real-world performance, and a public registry listing each AI tool’s use, audit results, and update history.
Introduction
Artificial intelligence (AI) and machine learning (ML) are changing the face of healthcare. Through faster diagnoses, tailored treatment, and predictive analytics, AI/ML has the potential to revolutionize patient care. From helping doctors detect cancer earlier to predicting which patients may require hospitalization, AI-powered tools promise faster and more personalized care. However, as these tools are introduced into clinics, emergency rooms, and hospital systems across the country, one crucial element is missing: adequate regulation.
Unlike previous generations of medical devices, AI tools learn and adapt over time, which means their behavior can change after deployment. The complexity of these models makes it difficult to understand how decisions are made. The models operate as “black boxes,” which makes them very difficult for regulators, physicians, or even patients to evaluate. Biases embedded in these systems can have life-altering consequences, especially for communities already facing disparities in healthcare access and outcomes.
Despite these risks, the United States does not have a comprehensive regulatory framework for AI in medicine. There are few mandatory standards, if any, for evaluating bias, requiring performance monitoring, or even ensuring basic transparency. Developers can bring AI tools to market without publicly disclosing how they work or how they perform across different populations. This memo argues that a stronger, enforceable national policy is urgently needed to protect patients and ensure that AI-powered medical devices live up to their potential.
Background
AI/ML tools are now embedded in virtually every aspect of daily clinical care. As of January 2025, the U.S. Food and Drug Administration (FDA) has cleared over 1,000 AI-powered medical devices, with radiology leading the specialties at 758 clearances, followed by cardiology with 161. [1] This is a significant jump from previous years: as of October 2023, the FDA had authorized 692 AI-enabled medical devices, with 77% (531) in radiology and 10% (71) in cardiology.
The healthcare AI market is experiencing rapid growth. According to Grand View Research, the global AI in healthcare market is expected to reach $187.7 billion by 2030, registering a compound annual growth rate (CAGR) of 38.5% from 2024 to 2030. [2] This growth is driven by increased demand for diagnostic automation, efficiency in clinical workflows, and predictive analytics. As health systems face pressure to cut costs and improve outcomes, AI tools are being rapidly implemented across the public and private sectors, often without accompanying safeguards. The result is a race to innovate with no complementary race to regulate: oversight is simply not growing as quickly as the technology it is meant to govern.
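For context, the arithmetic behind that projection is simple compounding. The short sketch below (illustrative only, and assuming the 38.5% rate compounds annually over the six years from 2024 to 2030, which is not spelled out in the cited press release) back-calculates the implied 2024 market size from the cited 2030 figure.

```python
# Illustrative check of the cited growth figures, assuming simple annual
# compounding of the 38.5% CAGR over 2024-2030 (an assumption, not from the report).
cagr = 0.385          # compound annual growth rate cited by Grand View Research
value_2030 = 187.7    # projected 2030 market size, billions of USD
years = 2030 - 2024   # six compounding periods

implied_2024 = value_2030 / (1 + cagr) ** years
print(f"Implied 2024 market size: ${implied_2024:.1f}B")  # about $26.6B
```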
This rapid growth in market value and integration means that even minor flaws in AI tools can affect millions of patients across the country and around the world. The risks are not confined to individual hospitals or developers; they scale across the entire healthcare landscape.
These risks are not hypothetical; we already see them at scale today. While AI has shown promise, many medical algorithms have been approved based on limited or incomplete clinical data, raising concerns that these tools may not perform as intended in real-world settings. [3]
Additionally, studies frequently show that biased AI models produce discriminatory and inaccurate outcomes, especially for patients from minority groups. One high-profile study from UC Berkeley revealed how bias can manifest in healthcare AI. A widely used risk-prediction algorithm for care management was found to underestimate the health risks of Black patients. After researchers adjusted for racial bias, the proportion of Black patients identified for extra care jumped from 18% to 47%. [4] Furthermore, according to an article published by Harvard Medical School, an AI system used across several U.S. health systems exhibited bias by prioritizing healthier white patients over sicker Black patients for additional care management because it was trained on cost data rather than care needs. [5]
Currently, no centralized system tracks how well these tools work in real-world healthcare settings. Several studies and expert analyses indicate that many AI-driven healthcare tools receive little systematic oversight once deployed. Unlike drugs, which undergo extensive post-market surveillance, AI medical software is often not rigorously monitored for real-world safety and effectiveness after launch. A recent scoping review of FDA-approved AI devices found that only about 9% of FDA summary documents included a prospective post-market surveillance study, and just 2% mentioned any reporting of potential adverse effects. The same researchers noted that only 1.9% of AI device companies published any post-market performance data, and companies are not required to do so. [6]
Assessment of Current Approaches
While there has been some movement on AI healthcare regulation in the United States in recent years, it is not nearly enough.
In 2021, the FDA unveiled its AI/ML Action Plan, which outlined a regulatory framework for AI/ML-based Software as a Medical Device (SaMD). The plan included principles for Good Machine Learning Practice (GMLP), real-world performance monitoring, and bias prevention, and was created in part to address public concerns about AI in healthcare. [7] The FDA then went a step further when it released guidance on Predetermined Change Control Plans (PCCPs), which allow AI developers to update models within pre-specified bounds without seeking re-approval for each change. [8]
On paper, this seems like a significant step in the right direction, but in practice it amounts to little more than an empty promise. Despite the risks AI poses in clinical settings, the FDA’s plan is largely toothless: it operates through voluntary compliance, lacks legal enforcement mechanisms, and fails to adequately protect the patients who are actually at risk. In short, the plan leaves too much to chance, and in healthcare that chance can be deadly.
On the other hand, the EU does a much better job of tackling this issue. The European Union’s AI Act treats healthcare AI as “high-risk” and mandates robust risk management systems, data governance policies, and third-party conformity assessments for such systems before they can enter the market. Under Article 6 of the Act, AI systems are classified as high-risk if they are intended to be used as safety components of products, or as standalone products, that are subject to third-party conformity assessment under existing EU harmonization legislation such as the Medical Device Regulation (MDR) and the In Vitro Diagnostic Regulation (IVDR). [9] Article 9 requires providers to establish and maintain a risk management system throughout the AI system’s life cycle, including identifying and mitigating potential risks associated with the system’s operation. [10] Article 10 stipulates stringent data governance requirements: high-risk AI systems must be developed using high-quality datasets that are relevant, representative, free of errors, and complete. These datasets should be managed with appropriate data governance and management practices, ensuring that the AI system’s performance does not compromise the health and safety of individuals. [11]
Not only does the EU AI Act establish strict guidelines for model approval, but it also sets comprehensive requirements for risk management, data governance, transparency, and oversight.
Policy Proposal
To fill the gap, the U.S. needs a stronger national regulatory system for AI in healthcare. This memo proposes a framework built around three pillars: enforceable bias audits, post-market performance monitoring, and public transparency. The FDA should lead these reforms with the United States Department of Health and Human Services (HHS) and Congress.
This memo proposes the following multipronged approach to better regulate the implementation of AI-powered medical devices.
1. Enforceable Bias Audits
Under this proposal, developers must conduct and submit fairness audits as part of pre-market review before any AI-powered medical device receives FDA approval. These audits should evaluate the model’s accuracy and performance across race, ethnicity, gender, and age subgroups, following the model of algorithmic impact assessments in other high-risk sectors. Developers whose tools demonstrate significant performance disparities or degradation must submit mitigation plans or face rejection. The FDA should publish standardized audit templates and require developers to disclose model training data sources to ensure demographic representativeness.
Fairness audits should be conducted or certified by independent third-party organizations to reduce conflict of interest and enhance credibility. The FDA will accredit these auditors through a formal vetting process, ensuring consistent methodological standards and reducing the risk of regulatory capture.
Incentives should be introduced for early compliance, such as fast-track review or priority access to reimbursement for models that meet bias audit thresholds. This policy ensures that equity is built into the design, not patched in after deployment.
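To make the audit requirement concrete, the sketch below shows one minimal form such a check could take: computing a performance metric for each demographic subgroup and flagging any group that trails the best-performing group by more than a chosen margin. The AUC metric, the 0.05 gap threshold, and the function and column names are illustrative assumptions, not a prescribed FDA methodology.

```python
# Minimal sketch of a subgroup fairness audit (illustrative assumptions: AUC as
# the metric, a 0.05 allowable gap, pandas/scikit-learn tooling).
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_audit(df: pd.DataFrame, group_col: str,
                   label_col: str = "outcome", score_col: str = "model_score",
                   max_gap: float = 0.05) -> pd.DataFrame:
    """Compute AUC per subgroup and flag groups trailing the best group."""
    rows = []
    for group, sub in df.groupby(group_col):
        # each subgroup needs both outcome classes for AUC to be defined
        rows.append({"group": group, "n": len(sub),
                     "auc": roc_auc_score(sub[label_col], sub[score_col])})
    report = pd.DataFrame(rows)
    report["flagged"] = report["auc"] < report["auc"].max() - max_gap
    return report

# Hypothetical usage on a held-out validation set:
# report = subgroup_audit(validation_df, group_col="race")
# any flagged=True row would trigger a mitigation plan before approval
```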
2. Mandatory Post-Market Monitoring
All FDA-approved AI devices must undergo periodic performance re-certification based on real-world clinical data. This includes updates on model drift, accuracy variance across subgroups, and adverse outcomes.
Developers must submit annual post-market performance reports, including metrics that evaluate accuracy, false positives/negatives, adverse events, and clinical utility broken down by population groups.
CMS should explore tying Medicare and Medicaid reimbursement eligibility to compliance with these monitoring requirements to strengthen oversight. Non-compliant products would face financial disincentives, product recalls, and public performance flags in the national registry.
This policy also empowers external researchers and watchdog organizations to conduct independent evaluations, with legal protections for whistleblowers and research disclosures.
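As an illustration of what an annual post-market report could contain, the sketch below compares a device’s recent real-world performance against its pre-market baseline and flags metrics that have degraded beyond a tolerance. The metric choices, field names, and three-point tolerance are assumptions for illustration; the actual reporting format would be defined by the FDA.

```python
# Illustrative post-market drift check: compare recent real-world performance
# against the pre-market baseline and flag degradation beyond a tolerance.
from dataclasses import dataclass

@dataclass
class PerformanceSnapshot:
    period: str          # e.g., "pre-market" or "2026-Q1"
    sensitivity: float   # true-positive rate observed in that period
    specificity: float   # true-negative rate observed in that period

def flag_drift(baseline: PerformanceSnapshot, current: PerformanceSnapshot,
               tolerance: float = 0.03) -> list[str]:
    """Return the metrics that have degraded more than the tolerance allows."""
    issues = []
    if baseline.sensitivity - current.sensitivity > tolerance:
        issues.append(f"sensitivity dropped {baseline.sensitivity - current.sensitivity:.2%}")
    if baseline.specificity - current.specificity > tolerance:
        issues.append(f"specificity dropped {baseline.specificity - current.specificity:.2%}")
    return issues

# Hypothetical device cleared at 0.92 sensitivity that now shows 0.86 in the field
baseline = PerformanceSnapshot("pre-market", sensitivity=0.92, specificity=0.89)
current = PerformanceSnapshot("2026-Q1", sensitivity=0.86, specificity=0.88)
print(flag_drift(baseline, current))  # ['sensitivity dropped 6.00%']
```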
3. National Registry of AI Tools
HHS, in collaboration with the FDA, should create and maintain a centralized, publicly accessible registry of all FDA-cleared AI/ML medical devices. Each registry entry must include:
The device’s intended clinical use
Fairness audit outcomes and known limitations
Post-market performance metrics
FDA clearance pathway (e.g., 510(k), de novo)
Dates of any model updates or recertification reviews
This registry serves several functions. First, it empowers hospitals and clinicians to make evidence-based procurement decisions. Second, it allows patients and civil society groups to track potential disparities. Lastly, it gives regulators a transparent dashboard for identifying underperforming or biased tools.
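For illustration, a registry entry could be stored as a simple structured record containing the fields listed above. The schema below is a hypothetical sketch, not an existing FDA or HHS data standard, and every field name and value is an assumption for illustration.

```python
# Hypothetical sketch of a registry entry (not an existing FDA/HHS schema);
# fields mirror the elements listed above.
from dataclasses import dataclass, field

@dataclass
class RegistryEntry:
    device_name: str
    intended_use: str            # the device's intended clinical use
    clearance_pathway: str       # e.g., "510(k)" or "De Novo"
    fairness_audit_summary: str  # audit outcomes and known limitations
    post_market_metrics: dict    # latest real-world performance figures
    update_history: list = field(default_factory=list)  # model updates and recertification dates

# Example entry for a fictional device:
entry = RegistryEntry(
    device_name="Example chest X-ray triage model",
    intended_use="Flag suspected pneumothorax for radiologist review",
    clearance_pathway="510(k)",
    fairness_audit_summary="No subgroup AUC gap above 0.05; limited data for patients over 85",
    post_market_metrics={"sensitivity": 0.91, "specificity": 0.87},
    update_history=["2026-03-01: model v2 deployed under approved PCCP"],
)
```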
Constituencies, Barriers, Roadblocks
There are several key players in AI regulation in healthcare, and they have conflicting priorities. First are the regulators. The FDA oversees both safety and innovation, but without real enforcement power it can only issue guidelines rather than binding rules. It cannot impose mandatory audits or enforce post-market reporting without additional Congressional support. The FDA, and potentially HHS, will need expanded statutory authority and sustained funding to close this gap. This includes building technical capacity by hiring staff with expertise in machine learning, digital ethics, and real-world algorithm monitoring. In addition, Congress should authorize a specialized unit dedicated to AI/ML oversight, modeled on successful initiatives like the FDA’s Digital Health Center of Excellence but with a stronger enforcement mandate.
On the other hand, AI developers, from startups to tech giants, are primarily incentivized to minimize time-to-market and maximize scalability. Strict requirements like mandatory fairness testing or post-market audits can look like unnecessary costs when profit margins are thin or investor expectations are high. Many companies also view their models as proprietary trade secrets and will resist transparency or third-party scrutiny. Because they have strong lobbying power, they can be expected to push back against regulations that could slow AI adoption.
Hospitals and providers are stuck in a gray area. They want AI tools that are useful and accurate, but they have no direct control over how these tools are developed or whether they meet safety standards. They are eager to deploy AI to improve diagnostic speed and operational efficiency, especially amid current staffing shortages. However, most lack the in-house expertise to evaluate model performance, since tools arrive as “black boxes” from vendors. Providers also remain liable for clinical decisions made with the aid of AI, which raises legal and ethical challenges.
Patients have the most at stake in AI-driven healthcare but the least influence over the policy decisions that affect them. They typically do not know how AI tools work or how they might be biased. Vulnerable populations, such as low-income or racially marginalized groups, are most at risk of harm from unregulated or biased tools. They want stronger protections, but without mandatory rules they have little recourse when the tools do not work as expected.
Ultimately, meaningful regulation will require Congressional action, whether through new legislation granting the FDA expanded enforcement powers or through appropriations that strengthen oversight infrastructure.
Conclusion
AI and machine learning are no longer theoretical tools; they are already influencing life-or-death decisions in American healthcare. Yet the oversight mechanisms designed to protect patients have not kept up. The most vulnerable patients will suffer the most when biased or defective AI systems are allowed to run unchecked. The United States can no longer rely on a fragmented, voluntary approach to regulating AI in healthcare. A more proactive and coordinated system is urgently needed.
This memo has laid out a clear path forward: enforceable bias audits, mandatory post-market monitoring, and a national AI device registry. These are practical protections grounded in the reality of a quickly changing medical environment, not radical concepts. Similar protections are already being adopted abroad, and the United States risks falling behind in both patient protection and public trust.
The goal of AI regulation isn’t to slow innovation, but to make sure it works in the public’s best interest. These tools are advancing whether we’re ready or not. At this point, we can either build a system that makes safety and transparency the default, or risk letting untested algorithms shape critical medical decisions without oversight. The time to act is now to ensure public support and trust.
Works Cited
Fornell, D. (2025, January 10). FDA has now cleared more than 1,000 AI models, including many in cardiology. Cardiovascular Business. https://cardiovascularbusiness.com/topics/artificial-intelligence/fda-has-cleared-more-1000-ai-algorithms-many-cardiology
Grand View Research. (n.d.). AI in healthcare market size to reach $187.7Bn by 2030. https://www.grandviewresearch.com/press-release/global-artificial-intelligence-healthcare-market
Lenharo, M. (2024, August 21). The testing of AI in medicine is a mess. Here’s how it should be done. Nature News. https://www.nature.com/articles/d41586-024-02675-0
Manke, K. (2019, October 24). Widely used health care prediction algorithm biased against Black people. Berkeley News. https://news.berkeley.edu/2019/10/24/widely-used-health-care-prediction-algorithm-biased-against-black-people/
James, T. A. (2024, September 24). Confronting the mirror: Reflecting on our biases through AI in health care. HMS Postgraduate Education. https://postgraduateeducation.hms.harvard.edu/trends-medicine/confronting-mirror-reflecting-our-biases-through-ai-health-care
Muralidharan, V., Adewale, B. A., Huang, C. J., Nta, M. T., Ademiju, P. O., Pathmarajah, P., Hang, M. K., Adesanya, O., Abdullateef, R. O., Babatunde, A. O., Ajibade, A., Onyeka, S., Cai, Z. R., Daneshjou, R., & Olatunji, T. (2024). A scoping review of reporting gaps in FDA-approved AI medical devices. npj Digital Medicine, 7(1). https://doi.org/10.1038/s41746-024-01270-x
U.S. Food and Drug Administration, Office of the Commissioner. (n.d.). FDA releases Artificial Intelligence/Machine Learning Action Plan. https://www.fda.gov/news-events/press-announcements/fda-releases-artificial-intelligencemachine-learning-action-plan
Center for Devices and Radiological Health. (n.d.). PCCP guidance. U.S. Food and Drug Administration. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/predetermined-change-control-plans-medical-devices
Busch, F., Kather, J. N., Johner, C., Moser, M., Truhn, D., Adams, L. C., & Bressem, K. K. (2024, August 12). Navigating the European Union Artificial Intelligence Act for healthcare. npj Digital Medicine. https://www.nature.com/articles/s41746-024-01213-6
Schuett, J. (2022, December 3). Risk management in the Artificial Intelligence Act. arXiv.org. https://arxiv.org/abs/2212.03109
Article 10: Data and data governance. EU Artificial Intelligence Act. (n.d.). https://artificialintelligenceact.eu/article/10

