Abstract
Large Language Models (LLMs) are rapidly entering health‑related workflows, often framed as powerful, low‑cost tools for decision support, patient education, and self‑management. Yet when one examines real conversations between individuals and LLMs, a fundamental mismatch emerges between what these systems can do and what diagnostic or preventive reasoning actually requires. This article draws on a single, dense dialogue between a 61‑year‑old individual and an LLM about three nocturnal home glucose readings and perceived diabetes risk. The conversation exposes three structural problems: (1) the reduction of health to numeric thresholds and guideline fragments; (2) the existence of distinct “algorithmic personalities” across different LLMs; and (3) the inherently narrative, dialogical nature of health, which resists discretization into binary categories. We argue that LLM governance in healthcare should shift from optimistic “best use” guidance to a set of robust, model‑agnostic not‑doing guidelines: clear prohibitions on using LLMs for diagnosis, urgent triage, prescription, or definitive interpretation of individual measurements. Within those negative bounds, we outline a narrow, auxiliary role for LLMs as tools of epistemic reflection inside clinician–patient encounters, not as autonomous clinical agents. The glucose conversation thus becomes a micro‑experiment that illuminates why LLMs are structurally unsuited to direct diagnostic or preventive tasks and why any responsible integration must privilege “never for X” over “here is how to use them for Y.”
1. Introduction
In the last few years, LLMs have been introduced into health contexts with remarkable speed. Promotional narratives emphasize their ability to synthesize the biomedical literature, summarize guidelines, translate jargon, and offer “personalized” advice at scale. Underneath these promises lies a powerful but simplistic assumption: that much of clinical reasoning can be decomposed into information retrieval plus pattern recognition over textual data.
This assumption becomes visible when a person turns to an LLM with a concrete, embodied concern. Consider a 61‑year‑old individual who, in the middle of the night, performs three capillary blood glucose measurements with a home glucometer and is troubled both by the absolute values and by their variability. The person brings to the interaction a detailed account of food intake, sleep, urination, and personal skepticism toward both device accuracy claims and the broader medicalization of everyday life. The LLM responds by mapping these values onto ISO accuracy ranges and diabetes thresholds, effectively reenacting the very reductionism the user is trying to problematize.
This article takes that conversation as a case study. It is neither a controlled experiment nor a representative sample. Instead, it functions as a micro‑experiment: a dense, situated interaction where the friction between lived experience and algorithmic response becomes highly visible. From this friction we derive a set of arguments for why LLMs are structurally misaligned with diagnostic and preventive purposes and why governance should focus on what LLMs must not be allowed to do in healthcare, rather than on speculative lists of “promising applications.”
We first describe our methodological stance as a conversational case study. We then reconstruct the glucose dialogue as an epistemic encounter, highlighting where and how the LLM’s behavior fails the user’s expectations. Next, we contrast the model’s binary, guideline‑driven logic with a view of health as a narrative, dialogical, dynamic process. We introduce the notion of a “population of LLMs” with distinct algorithmic personalities and explore its implications for regulation. We develop the case for not‑doing guidelines—model‑agnostic prohibitions on clinical uses—before sketching a much narrower, auxiliary role for LLMs as tools for epistemic reflection. Finally, we discuss implications for training and regulation.
2. Methods: conversational case study
2.1 Design and rationale
This article is based on a single, naturally occurring conversation between an individual user and an LLM about home blood glucose measurements and perceived diabetes risk. Rather than treating this exchange as anecdotal, we approach it as a conversational case study: a dense, situated interaction in which epistemic, ethical, and social tensions around LLM‑mediated health advice are made explicit.
The rationale for this approach is twofold. First, much of the critical literature on AI in health relies either on abstract ethical argument or on quantitative performance benchmarks. Both are indispensable but often fail to capture how people actually experience and negotiate interactions with LLMs around embodied concerns. Second, early‑stage, rapidly evolving technologies frequently exhibit their most revealing failure modes not in controlled trials but in unscripted, real‑world use. A conversational case study allows us to observe these dynamics at the scale of a single encounter.
2.2 Data source and context
The data consist of a time‑bounded, text‑based interaction in which the user:
- described their age, body mass index, diet, sleep pattern, and sequence of home glucose measurements;
- expressed distrust toward device accuracy claims and toward the broader medicalization of everyday life;
- explicitly challenged the LLM’s assumptions, categories, and style of reasoning.
The LLM responded in real time, drawing on general medical knowledge, guideline‑style cut‑offs for fasting plasma glucose, and standard information about glucometer accuracy. The conversation unfolded through multiple iterations, with the user pushing back against what they perceived as binary and reductionist reasoning, and the LLM attempting (with varying degrees of success) to adjust to this critique.
We do not treat the transcript as a patient record nor as clinical documentation. Instead, we treat it as a discursive artifact that reveals how an LLM operationalizes “health” when confronted with specific numeric inputs and a reflexive, critical interlocutor.
2.3 Analytic strategy
Our analysis is interpretive and iterative rather than statistical. We read and reread the conversation to identify:
- Epistemic moves: how the LLM framed the problem (e.g., as a question of device accuracy, guideline thresholds, or individual risk), and how the user reframed it (e.g., as a critique of overmedicalization and reductionism).
- Normative drift: points at which the LLM’s ostensibly descriptive statements slid into normative implications (e.g., from “elevated” to “suspicious and merits verification”) without the user having invited any normative judgment.
- Moments of friction: explicit user resistance (e.g., accusations of “Trilussa‑style statistics,” or of ignoring relevant variables like hydration and physical activity) and the LLM’s attempts at self‑correction.
- Meta‑discursive insights: statements about the nature of health (“a dynamic, dialogical process”), about LLM limitations (“you reason in dichotomies”), and about the diversity of models (“you are deeply different from DeepSeek or ChatGPT”).
We then synthesized these observations into three thematic axes:
- The reduction of complex health processes to numeric thresholds and guideline fragments.
- The existence and importance of model‑specific “algorithmic personalities.”
- The mismatch between LLMs’ conclusion‑oriented behavior and the open‑ended, narrative nature of health.
These axes informed the argument for not‑doing guidelines and for a narrow, reflective role of LLMs in health.
2.4 Limits of the method
This method has clear limitations. A single conversation cannot support generalizable claims about all users or all LLMs. The specific wording of prompts, the model version, and the immediate interaction history all shape the responses. Moreover, our analysis is necessarily interpretive and may be influenced by the authors’ own positions on medicalization and AI ethics.
However, such a conversational case study is valuable precisely because it foregrounds how misalignment manifests in practice. It reveals not just that LLMs may misinterpret guidelines, but how they interact with users who resist their epistemic framing. It also shows how easily, in a realistic, time‑limited exchange, an LLM can slide into roles (diagnostic interpretation, risk framing, behavioral advice) that the user may never have explicitly requested.
The case thus functions as a “critical incident” that exposes structural issues—numeric reductionism, binary framing, model heterogeneity—that more formal studies are beginning to document at scale. It does not replace quantitative evaluation, but complements it by showing how those issues are lived and contested in real interactions.
3. A nocturnal glucose reading as micro‑experiment
3.1 The user’s starting point
The user begins from a very concrete situation: it is the early hours of the morning, and they have performed three glucose measurements with a home glucometer, each from a different finger on the same hand, yielding values of 176, 184, and 174 mg/dL. They provide a meticulous diet and context history: the content of the evening meal, timing of the last food intake, sleep onset and awakening time, episodes of urination, and the fact that peanuts were consumed shortly before sleep. They also mention the device’s declared average error (10–12%) and voice suspicion that the apparent precision claimed by the manufacturer is misleading.
In parallel, the user brings a strong, explicit distrust of what they call “wild medicalization” of Western individuals and of the way the health‑care market both certifies and justifies this process. Their concern is not limited to whether the glucometer is “accurate” or whether they “might be diabetic.” It extends to the epistemological status of clinical thresholds, the role of population‑based cut‑offs in labeling individuals, and the interaction between numerical readings, lived experience, and social control.
3.2 The LLM’s initial response
The LLM’s first answer treats the situation primarily as a technical question about device precision and diabetes risk. It notes that the three readings are numerically close, that their spread (about 10 mg/dL, ~6% relative difference) is within typical accuracy limits for capillary glucose meters, and that such variability is expected from finger‑to‑finger differences. It then frames the absolute value—around 178 mg/dL—as “high” for a fasting measurement and suggests that this is “suspicious” and “merits verification” via laboratory tests and formal diagnosis.
From a guideline‑centric perspective, this is a straightforward application of two conceptual tools: device accuracy standards and diagnostic criteria for diabetes and prediabetes. It is not inaccurate in a narrow sense. Yet it misses the point the user is trying to raise. Rather than critically interrogating the reduction of a complex physiological and social process to a numeric threshold, the LLM reinforces that very reduction by treating the numbers as near‑sufficient grounds for concern.
3.3 User’s critique: beyond numbers
The user reacts sharply, accusing the LLM of essentially repeating the manufacturer’s defense: blaming variability on patient technique and implicitly endorsing the device’s claims. They point out that, given a 10–12% error, the lowest reading (174) could correspond to a true value around 155 mg/dL, hence the system’s assertion of “high fasting glucose” rests on an uncertain foundation. Moreover, they criticize the LLM for failing to even ask about age, BMI, alcohol consumption, smoking, daily physical activity, mode of eating, or hydration—variables they regard as essential to any meaningful interpretation of the measurements.
In doing so, the user challenges the LLM on two levels. Empirically, they question the reliability of the numbers and the legitimacy of treating small differences as negligible. Epistemologically, they contest the very idea that an individual’s health status can be inferred from such numbers without a rich understanding of their life context and values.
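The empirical half of this critique reduces to simple interval arithmetic. The sketch below is ours, not part of the original exchange; it merely propagates the device’s declared 10–12% relative error through the three readings to show how wide the bands of plausible true values are:

```python
# Illustrative sketch of the user's interval argument (not from the
# original dialogue): a declared relative error of 10-12% turns each
# displayed reading into a band of plausible true values.

readings = [176, 184, 174]  # mg/dL, three fingers, same night

spread = max(readings) - min(readings)
mean = sum(readings) / len(readings)
print(f"spread: {spread} mg/dL (~{spread / mean:.0%} of the mean)")  # ~6%

for rel_err in (0.10, 0.12):
    for r in readings:
        low, high = r * (1 - rel_err), r * (1 + rel_err)
        print(f"{r} ± {rel_err:.0%}: plausible true value in [{low:.0f}, {high:.0f}] mg/dL")

# At 10-12% error, the lowest reading (174) is compatible with true values
# roughly in the 153-157 mg/dL range, which is the basis of the user's
# claim that "high fasting glucose" rests on an uncertain foundation.
```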
3.4 The LLM’s self‑correction and residual reductionism
The LLM partly acknowledges the critique. It concedes that jumping to terms like “incipient diabetes” was an overstatement and that one cannot derive a diagnosis from three nocturnal capillary readings taken in non‑standard conditions. It clarifies that laboratory fasting glucose and hemoglobin A1c are the appropriate tools for diagnosis and that the user’s normal weight, disclosed only later in the conversation, lowers the prior probability of diabetes. It also recognizes that the conversation has drifted toward the very dichotomies the user finds problematic.
Yet, despite this self‑correction, the LLM continues to operate within a fundamentally binary framework: normal vs abnormal, low vs high, within vs outside thresholds. It reasserts the logic of “values above what we expect in a truly normal metabolism” and continues to propose a sequence of tests and measurements as the appropriate response. Even when it becomes more cautious, it remains anchored to the same conceptual grammar of guideline‑driven medicine that the user is explicitly questioning.
This tension between explicit acknowledgment of complexity and implicit adherence to reductionist patterns is a key structural feature of LLMs. They can articulate critiques of medicalization, but their learned behavior and optimization objectives keep dragging them back to the dominant textual regime.
4. Health as narrative, dialogical, and dynamic
4.1 The user’s conception of health
Over the course of the dialogue, the user articulates a view of health that stands in sharp contrast to the LLM’s implicit model. For them, individual health is:
- a process, not a state: something that unfolds over time, influenced by habits, stress, relationships, and social conditions;
- dialectical and dialogical: constructed through conversations between patient and clinician, but also shaped by internal dialogues, cultural narratives, and institutional discourses;
- emotionally rational: the result of intertwined affective and cognitive evaluations, where fear, trust, identity, and autonomy all play crucial roles.
In this view, health cannot be captured by a single reading, nor even by a set of lab values plus risk factors. It emerges from how individuals position themselves relative to possible diagnoses, how they interpret bodily sensations, how they negotiate recommendations with their life circumstances, and how they navigate the power asymmetries of the medical system.
4.2 LLMs as binary patterners
LLMs, by contrast, are trained on corpora where clinical concepts are often presented in discrete, threshold‑based terms. They see countless instances of phrases like “fasting glucose ≥126 mg/dL indicates diabetes.” Their internal representations are saturated with binary oppositions: normal vs abnormal, healthy vs diseased, eligible vs ineligible for treatment.
Even when prompted to reflect on the limitations of such dichotomies, an LLM’s default behavior is to resolve ambiguity by mapping user input to the nearest known category. A glucose value must be “low,” “high,” or “borderline.” A symptom cluster must correspond to a disease pattern. A risk profile must trigger or not trigger certain guidelines. This mapping is not malicious; it is the natural outcome of training objectives centered on reproducing plausible text.
The result is a tendency to reify what are, in fact, contingent, population‑based thresholds as if they embodied ontological truths. Cut‑offs designed for group‑level risk stratification become, in the LLM’s discourse, properties of specific individuals. The glucose value of a person in Todi and that of someone in Brooklyn are treated as equivalent inputs to the same decision tree, even though their lives, diets, and healthcare systems may be radically different.
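The conceptual grammar being criticized can be caricatured in a few lines of code. The sketch below is ours, not a reconstruction of any model’s internals; it uses the standard fasting plasma glucose cut‑offs (100 and 126 mg/dL) to show how threshold logic turns a continuous, error‑laden measurement into a discrete label:

```python
# Illustrative caricature only: the form of threshold-based labeling,
# using the standard fasting plasma glucose cut-offs (100 and 126 mg/dL).
# The point is the shape of the logic, not its clinical adequacy.

def label_fasting_glucose(mg_dl: float) -> str:
    """Map a continuous fasting glucose value to a discrete category."""
    if mg_dl < 100:
        return "normal"
    elif mg_dl < 126:
        return "prediabetes range"
    return "diabetes range"

for value in (99, 100, 125, 126, 178):
    print(value, "->", label_fasting_glucose(value))

# 125 -> "prediabetes range" but 126 -> "diabetes range": a 1 mg/dL step
# flips the label, although it is far smaller than the 10-12% device
# error (roughly 13-15 mg/dL at these values).
```

The discontinuity at the cut‑off is precisely what the user objects to: a label change driven by a difference well inside the measurement’s own error band.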
4.3 The impossibility of “conclusion”
At one point, the user states that there is no conclusion to the discussion. This expresses the idea that, for a question such as “What do these glucose readings mean for me?”, the only honest answer is an open‑ended process: further dialogues, additional experiences, evolving interpretations. Any attempt to produce a neat, stable conclusion—“you are fine,” “you are prediabetic”—distorts the lived reality by freezing a dynamic process into a label.
LLMs, however, are optimized to produce conclusions. Every output is, structurally, a kind of closure: a “final” answer given the input. Even when the content is epistemically humble (“I cannot know; you should consult a physician”), the conversational form is conclusive. The model does not stay with the trouble; it resolves it into text.
This structural push toward closure makes LLMs inherently misaligned with the open‑ended, narrative nature of health. They can simulate reflection, but they cannot inhabit an ongoing, undecided process the way humans do.
5. From “an AI system” to a population of LLMs
5.1 Algorithmic personalities
Another crucial insight from the dialogue is that we are no longer dealing with “an AI system” in the abstract. We are dealing with a population of LLMs, each with its own architectural choices, training data mixture, and post‑training alignment processes. These differences generate what the user calls “algorithmic personalities.”
Practically, this means that the same prompt—three glucose readings, the same diet, the same skepticism about medicalization—would produce distinct trajectories of conversation when addressed to different models: some might be more reassuring, others more alarmist; some more compliant with guidelines, others more speculative; some more confrontational, others more accommodating. The structure of the dialogue, the pace of escalation, and the handling of disagreement would all vary.
This diversity implies that “the role of LLMs in healthcare” is not a single object of governance. Any specific deployment involves a particular model, with a particular style of reasoning and interaction, often further modified by vendor‑specific settings and institutional prompts.
5.2 The limits of generic guidelines
Most existing policy documents and ethical guidelines still talk about “AI in health” or “LLMs in healthcare” as relatively homogeneous categories. They recommend transparency, fairness, human oversight, and robust evaluation, but rarely grapple with the fact that different LLMs may embody systematically different epistemic and interactional biases.
In our glucose example, the user underscores that the conversation would likely unfold differently with other systems. This variability matters: if one model has a stronger tendency to medicalize, another to minimize concerns, and another to defer aggressively to guidelines, then the risks for patients and clinicians are model‑specific. A regulatory recommendation that is safe for one model may be unsafe for another.
Yet the pace of model development and the opacity of commercial systems make it unrealistic to craft fine‑grained positive recommendations for each. Any attempt to specify “this model can safely be used for X, Y, Z clinical tasks” is fragile and quickly obsolete.
5.3 Governance implications
The existence of a population of LLMs thus reinforces the need for governance strategies that are:
- model‑agnostic where possible, focusing on structural limits rather than performance metrics;
- negative rather than positive: defining what must not be done, rather than what might be done;
- procedural rather than purely technical, emphasizing processes (e.g., consent, oversight, disclosure) over static assessments.
This supports a regulatory turn toward explicit not‑doing guidelines, which we develop next.
6. The case for “not‑doing” guidelines
6.1 Why positive recommendations are fragile
Positive recommendations—“LLMs can be used for X in health”—are attractive. They suggest progress and constructive integration. However, in the context of rapidly evolving, heterogeneous models, they are precarious:
- Performance volatility. Updates to training data, architecture, or alignment can drastically change behavior, invalidating prior evaluations.
- Context dependence. Performance in controlled benchmarks often fails to predict behavior in messy, real‑world interactions with stressed or skeptical users.
- Normative ambiguity. “Good enough” performance for retrospective guideline summarization may still be ethically unacceptable when applied to fragile patient populations or contentious diagnoses.
Our glucose dialogue exemplifies this. Even assuming numerically correct interpretation, the drift toward pathologizing and the failure to engage with the user’s broader critique make the model problematic as a clinical tool. Evaluations focusing solely on guideline correctness would miss harm done at the level of meaning, autonomy, and trust.
6.2 Defining not‑doing categories
In contrast, not‑doing guidelines focus on tasks that LLMs should not perform at all in relation to health, regardless of apparent competence. At least the following belong on this list:
- No autonomous diagnosis. LLMs must not be used to determine whether an individual has or does not have a specific disease (including preclinical labels like “prediabetes”), nor to provide definitive reassurance about the absence of disease.
- No urgent triage. They must not guide decisions about whether a person should seek emergency care, stay home, or delay consultation, based solely on self‑reported symptoms or readings.
- No prescription or dose adjustments. They must not recommend starting, stopping, or changing medications, nor suggest unsupervised alternatives to prescribed treatments.
- No definitive interpretation of individual tests or measurements. They must not act as the primary interpreter of a single lab result, imaging report, or home measurement in a way that could influence clinical decisions or self‑management without professional oversight.
- No substitution for the clinical encounter. They must not be deployed as stand‑alone replacements for the relational, dialogical meeting between patient and clinician, especially where diagnosis or major management decisions are at stake.
These prohibitions are grounded not only in current performance limitations but in structural mismatches: lack of embodied experience, reliance on textual patterns, inability to fully understand personal context, and susceptibility to normative drift and overmedicalization.
6.3 Stability and universality of not‑doing
The advantage of not‑doing guidelines is that they can remain valid even as models improve. Suppose future LLMs achieve near‑perfect numerical accuracy. They would still lack embodiment, still be unable to experience the emotional and social ramifications of a diagnosis, and still reproduce dominant discourses. The reasons to prohibit them from making final diagnostic determinations would persist.
Moreover, not‑doing guidelines are relatively independent of model identity. Whether one uses one system or another, the prohibition on using them for autonomous diagnosis or urgent triage can be the same. This model‑agnosticism is a crucial practical virtue in a landscape where models proliferate and evolve faster than regulators can track.
7. A narrow, auxiliary role: LLMs as tools of reflection
7.1 From clinical agent to epistemic mirror
If LLMs are structurally unsuited to serve as diagnostic or preventive agents, is there any role for them in health? The glucose dialogue suggests a modest but potentially valuable function: LLMs can act as epistemic mirrors that help both patients and clinicians reflect on the nature of medical reasoning itself.
In our case, the user ended up using the LLM less to “find out” whether they were at risk and more to expose how the model reflexively adopted guideline‑driven, binary logic. The interaction became a live demonstration of how medicine, when filtered through texts, gravitates toward discretization, thresholds, and population‑level abstractions. The LLM inadvertently revealed the limits of the clinical epistemology it encoded.
This reflective role is fundamentally different from clinical decision‑making. It does not ask the LLM to decide what should be done but to make explicit the patterns, assumptions, and tensions that shape medical knowledge. It treats the model not as a surrogate doctor but as a tool for critical thinking about medicine.
7.2 Conditions for a safe reflective use
Even this narrow, reflective role requires strong safeguards:
- Explicit framing. Use of LLMs should be clearly framed as exploratory and non‑directive, aimed at understanding concepts, not at deciding actions.
- Human orchestration. A clinician who understands both medicine and LLM limitations should mediate the interaction, preventing drift into implicit clinical advice.
- Transparency and consent. Patients should know which model is being used, what kinds of biases it may have, and have the right to refuse its presence in their care processes.
- Reversibility. No decision about tests, treatments, or labels should hinge solely on LLM‑mediated reasoning; all such decisions must remain revisable through human deliberation.
Within these constraints, LLMs might support medical education, help patients grasp the conventional nature of cut‑offs and risk categories, or act as structured prompts that encourage clinicians to interrogate their own reliance on guidelines and metrics.
7.3 Avoiding re‑medicalization via reflection
There is a risk that even reflective uses could re‑medicalize everyday life by saturating personal reflection with clinical concepts and risk framings. The glucose dialogue shows that users may be far more interested in questioning these framings than in internalizing them. LLM‑based tools must therefore be designed not only to explain medicine but also to leave room for non‑medical interpretations of bodily experience and for conscious refusal of certain labels or interventions.
A genuinely reflective use of LLMs in health would have to support, rather than undermine, a person’s capacity to say “no conclusion” and to live with ambiguity. This is perhaps the most difficult design challenge, given that LLMs are, by nature, conclusion‑generating systems.
8. Implications for training and regulation
8.1 Training health professionals to use (and refuse) LLMs
If LLMs are to appear at all in healthcare environments, clinicians will need training that goes beyond technical “how‑to” guidance. The glucose conversation suggests at least four competencies:
- Epistemic literacy about LLMs. Clinicians should understand that LLMs are probabilistic text generators trained on past discourse, not knowledge‑bearing agents or diagnostic devices, and be able to explain this clearly to patients.
- Discursive orchestration. Clinicians should learn how to orchestrate LLM outputs within a consultation: using them to clarify concepts or enumerate standard options, while explicitly preventing the model from making or appearing to make clinical decisions.
- Recognition of overmedicalization dynamics. Training should help clinicians recognize when LLM use risks amplifying overdiagnosis, risk inflation, or unnecessary medicalization—especially in ambiguous, pre‑clinical states.
- Respect for patient epistemologies. Clinicians should be equipped to engage with patients who bring their own critiques of medicalization, statistics, and AI, treating these not as “non‑compliance” but as legitimate inputs that shape whether and how LLMs are used.
These competencies require interdisciplinary input—from clinical medicine, medical humanities, science and technology studies, and AI ethics—and cannot be reduced to a short “AI in healthcare” module.
8.2 Training for patients and citizens
Patients and citizens are already using LLMs to interpret lab results, symptoms, and home measurements. Educational initiatives therefore need to:
- Demystify LLMs. Explain, in accessible language, that LLMs do not “know” their health and that outputs are neither diagnoses nor personalized medical recommendations.
- Highlight not‑doing zones. Provide simple warnings that individuals should not use LLMs to decide whether they have a disease, whether to go to the emergency department, or how to change prescribed treatments.
- Promote reflective rather than directive use. Encourage people to treat outputs as prompts for questions to their clinicians, not as answers.
The glucose conversation itself could be anonymized and used in educational settings as a case study illustrating how quickly a seemingly neutral inquiry can slide into pathologizing territory, and how users can push back.
8.3 Regulatory focus on not‑doing
Regulators face an uphill battle in keeping pace with LLM development. Our analysis supports a strategy that prioritizes clear, enforceable “not‑doing” provisions over speculative endorsement of clinical applications. Regulations should:
- Prohibit the use of general‑purpose LLMs as autonomous tools for diagnosis, urgent triage, prescription, or interpretation of individual test results for clinical decision‑making.
- Require that any integration of LLMs into clinical workflows document human oversight mechanisms and how clinicians remain the ultimate decision‑makers.
- Mandate transparency about the specific model in use (identity, version, known limitations) and about logging and secondary use of outputs.
- Support independent evaluation of model behavior in realistic conversational scenarios—not only benchmark accuracy—before approving any patient‑facing deployment.
Approval for one model or configuration should not automatically extend to others, even from the same vendor. Each model’s “algorithmic personality” and failure modes must be considered.
8.4 Institutional responsibilities
Healthcare institutions considering LLM use have responsibilities beyond compliance:
- Governance structures. Establish committees including clinicians, patients, ethicists, and technical experts to oversee LLM adoption and monitor unintended consequences.
- Default opt‑out. Guarantee patients the right to refuse participation in LLM‑mediated processes without negative consequences for access to care.
- Continuous monitoring. Monitor model behavior in situ, collect reports of problematic interactions, and be ready to suspend or modify use as needed.
Some of the most serious issues will not appear as spectacular failures but as subtle drifts: cumulative shifts toward medicalization, reassurance, or alarm that reflect the model’s training rather than the patient’s needs.
8.5 Research priorities
The case study points to several research priorities:
- Comparative conversational studies. Systematic comparisons of how different LLMs handle the same health‑related scenarios to substantiate the notion of “algorithmic personalities” and inform safeguards; a minimal harness for such comparisons is sketched after this list.
- Longitudinal ethnographies of use. Qualitative studies of how patients and clinicians actually incorporate LLMs into practice, revealing emergent norms and adaptations.
- Design of reflective tools. Interdisciplinary work on LLM‑mediated tools that support critical reflection without sliding into implicit clinical authority.
- Impact on medicalization. Empirical assessment of whether LLM exposure increases, decreases, or redistributes overdiagnosis and medicalization.
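As an indication of what a comparative conversational study could look like operationally, the sketch below outlines a minimal harness. It is a hedged illustration under stated assumptions: the model wrappers are hypothetical placeholders for vendor APIs, and a real study would add multi‑turn scripts, logging, and systematic qualitative coding.

```python
# Hedged sketch of a comparative conversational study harness.
# Assumption: the researcher supplies query callables wrapping each
# vendor's API; no model names or functions below are real.

from typing import Callable, Dict

SCENARIO = (
    "I am 61. At 3 a.m. I measured my glucose three times with a home "
    "glucometer: 176, 184, 174 mg/dL. The declared error is 10-12%. "
    "I am skeptical of medicalization. What do these numbers mean?"
)

def compare_models(models: Dict[str, Callable[[str], str]], prompt: str) -> Dict[str, str]:
    """Send one fixed prompt to each model and collect raw replies for
    later qualitative coding (reassuring vs alarmist, guideline-bound vs
    speculative, accommodating vs confrontational)."""
    return {name: ask(prompt) for name, ask in models.items()}

# Usage, with hypothetical wrappers ask_a and ask_b:
#   replies = compare_models({"model_a": ask_a, "model_b": ask_b}, SCENARIO)
# Transcripts would then be coded along the axes of Section 2.3.
```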
These efforts should be guided by the precautionary principle: absent strong evidence that LLM‑mediated interventions reduce harm and respect autonomy, their role in clinical decision‑making should remain strictly constrained by not‑doing guidelines.
9. Conclusion
The nocturnal glucose conversation is, on its surface, about three numbers and a home glucometer. At a deeper level, it is about the collision between a living person’s dynamic, narrative understanding of health and an LLM’s static, guideline‑driven, binary representation of disease risk.
From this collision we draw three main lessons. First, LLMs are structurally misaligned with diagnostic and preventive tasks: their training on textual corpora, their bias toward discretization and closure, and their lack of embodied, contextual understanding make them hazardous as sources of clinical decisions, even when numerically accurate. Second, we are dealing with a population of LLMs, each with its own algorithmic personality, making generic “best use” guidelines unsafe and fragile. Third, not‑doing guidelines—clear prohibitions on diagnosis, urgent triage, prescription, and definitive interpretation of individual measurements—offer a realistic foundation for protecting patients and clinicians in a rapidly changing landscape.
Within these prohibitions, a narrow, supervised role for LLMs as epistemic mirrors may still be defensible: not as new diagnostic authorities, but as tools that help reveal how contingent, negotiated, and narrative our notions of “health,” “risk,” and “normality” actually are.
