
This AI Tool Turned 400 Informal Names Into Accurate OMOP Codes

2026-02-11 06:24:51

Table of Links

  1. Abstract and Introduction

  2. System Architecture

    2.1 Access via UI or HTTP

    2.1.1 GUI

    2.2 Input

    2.3 Natural Language Processing Pipeline — The Llettuce API

    2.3.1 Vector search

    2.3.2 LLM

    2.3.3 Concept Matches

    2.4 Output

  3. Case Study: Medication Dataset

    3.1 Data Description

    3.2 Experimental Design

    3.3 Results

    3.3.1 Comparison between vector search and Usagi

    3.3.2 Comparison with GPT-3

    3.4 Conclusions & Acknowledgement

    3.5 References


3. Case Study: Medication Dataset

Medication data were obtained from the Health for Life in Singapore (HELIOS) study (IRB approval by Nanyang Technological University: IRB-2016-11-030), a phenotyped longitudinal population cohort study comprising 10,004 participants from the multi-ethnic Asian population of Singapore, aged 30-85 years (Wang et al., 2024). Participants were recruited from the Singapore general population between 2018 and 2022 and underwent extensive clinical, behavioural, molecular and genetic characterisation. With rich baseline data and long-term follow-up through linkage to national health data, the HELIOS study provides a unique, world-class resource for biomedical researchers across a wide range of disciplines to understand the aetiology and pathogenesis of diverse disease outcomes in Asia, with the potential to improve health and advance healthcare for Asian populations.

To facilitate scalable and collaborative research, the HELIOS study implements the OMOP-CDM. However, mapping medication data to OMOP concepts poses significant challenges, primarily because of the complexities involved in standardising medication names. In the HELIOS study, medication data were self-reported and manually entered via nurse-administered questionnaires; as a result, medications could be recorded under brand names, abbreviations, typographic misspellings, phonetic errors, or as combination products. All of these sources of imprecision make mapping to a controlled medical vocabulary more difficult and require significant manual data cleaning.

3.1 Data Description

The first 400 examples from the medication dataset were selected for our experiments and comparison. For each instance, the best OMOP concept, as well as a broader set of concepts that could match the informal name, were compiled by human annotation.

For example, for “Memantine HCl”, the best OMOP concept is “memantine hydrochloride”, although “memantine” is another acceptable answer. For a branded medication, the concept representing the branded product is the most appropriate OMOP concept. The generic ingredient names can be included in a broader set of acceptable concepts, provided all the ingredients are listed within the concept. For example, for “cocodamol capsule”, “Acetaminophen / Codeine Oral Capsule [Co-codamol]” would be the best match, but “acetaminophen/codeine” would be accepted as a broader definition. This further illustrates the challenges of mapping and the uncertainties the problem presents.

\ Of the 400 examples, 25 were graded as “Not Parsable”. These were either formulations containing several ingredients where the formulation has no concept in the OMOP CDM, e.g. “lipesco”, which contains lipoic acid and four vitamins and is not in the OMOP CDM; or where the name could not be resolved, e.g. “Hollister (gout)”.

3.2 Experimental Design

The data instances were run through the vector search and LLM portions of the pipeline and compared with the human annotations. The top 5 results from the vector search were used. Responses were assessed by:

  1. Whether the input is an exact match to an OMOP concept

  2. Whether the correct OMOP concept is in the results of the vector search

  3. Whether the LLM provides the correct answer

  4. If the answer was incorrect, whether it is a relevant OMOP concept

The same examples were used as input for Usagi and the vector search. For each example and both methods, the top 5 results were taken, and each response was classified by whether the correct mapping or a relevant mapping was found.
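A minimal sketch of this grading against the human annotations, assuming simple data structures (not the authors' code; criterion 1, exact input match, is checked upstream):

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    best: str             # the single best OMOP concept for the informal name
    acceptable: set[str]  # the broader set of acceptable concepts

def grade(top5: list[str], llm_answer: str, ann: Annotation) -> dict[str, bool]:
    """Classify one response against the human annotation, per criteria 2-4 above."""
    return {
        "correct_in_top5": ann.best in top5,                         # criterion 2
        "relevant_in_top5": any(c in ann.acceptable for c in top5),
        "llm_correct": llm_answer == ann.best,                       # criterion 3
        "llm_relevant": llm_answer in ann.acceptable,                # criterion 4
    }
```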

3.3 Results

3.3.1 Comparison between vector search and Usagi

Table 1 describes the results of comparing Usagi with Llettuce’s vector search. The proportion of results with at least one relevant concept in the top 5 was very similar between the methods (68% for both). However, Llettuce outperformed Usagi in returning the correct concept in the top 5 (54% for Llettuce versus 44% for Usagi).

Table 1: Comparison of Usagi and Llettuce results

Usagi performs well when used to find concepts where the input has a typographical error. Its shortcomings can be illustrated by how it responds to various descriptions of the mometasone furoate nasal spray, “nasonex”. In the examples, an input carrying dosage information, such as “Nasonex (for each nostril)”, produces the output shown in Table 2 for the top five results.

Table 2: The top five results searching Usagi for “Nasonex (for each nostril)”

3.3.2 Comparison with GPT-3

Of the 336 examples where the input was parsable into an OMOP concept and was not an exact match to one, Llettuce correctly identified 193 (48.25%); GPT-3 correctly identified 57.75%. Both provided inexact but matching concepts: 44 (11%) for Llettuce and 67 (16.75%) for GPT-3. The top 5 vector matches retrieved the correct concept for 21 of the 99 inputs Llettuce answered incorrectly. 232 informal names could be directly mapped onto the best available OMOP concept (if exact matches are included). Of the remaining concepts, 78 had an output that neither included the correct concept nor produced a relevant OMOP concept. Llettuce’s pipeline does not perform as well as GPT-3, which is absolutely incorrect on only 38 names. However, Llettuce achieves its results running locally on consumer hardware, using a much smaller model and preserving confidentiality.

Figure 2: Sankey diagram of outputs from the Llettuce NLP pipeline

Table 3: Outputs from the Llettuce NLP pipeline

The time taken to run the Llettuce pipeline on 400 concepts was 55 minutes, 15 seconds, using a 2.8GHz quad-core Intel i7 CPU with 16 GB of RAM. The median time to run inference was 8.7 seconds.

Figure 3: Comparison of results between GPT-3 and Llettuce

Figure 4: Inference times (run on macOS, 2.8GHz quad-core Intel i7, 16 GB RAM)

3.4 Conclusions

Llettuce demonstrates the possibilities of using deep-learning approaches to map data to OMOP concepts. Combining vector search with a large language model yields performance comparable to the larger GPT-3 model. This shows that the advantages of neural-network-based natural language processing can be leveraged to produce medical encodings, even in settings where confidentiality is essential.

The comparison with string matching methods is also informative. String matching cannot learn the salience of different parts of the string. In the example above, the part of the string "(for each nostril)" is treated as more important because it is longer; the algorithm does not know to ignore it. By contrast, Llettuce’s vector search correctly includes Nasonex in almost all of its results, and correctly identifies the active ingredient. It should be noted that in this version of Llettuce only the RxNorm vocabulary was vectorised, whereas Usagi also used the RxNorm Extension. This dataset is also one on which Usagi performs relatively well, as the task mostly involves extracting a single word or correcting typographical errors. Anecdotally, Usagi performs worse on other tasks, where the input is longer and semantics are more important; this is where vector search is likely to perform far better. Crucially, an embedding model is trainable, where string comparison is not.

Optimisations will be possible in later versions. The models used for both embeddings and text generation are general-purpose models (bge-small-en-v1.5 and Llama-3.1-8B respectively). Existing specialist models, either fine-tuned or trained ab initio on biomedical literature (Gu et al., 2020), will be tested for performance on Llettuce tasks. Further development will come from fine-tuned models developed in-house. Our local deployment of Llettuce will implement data collection and record prompts and responses, alongside the final mapping made. These data will be used to fine-tune the models used. It is important to emphasise that this data collection will be strictly limited to our specific local deployment of the tool. The publicly available version will not collect any user data or interactions, maintaining the confidentiality and privacy of health information processed by other users.

Funding

This research was funded by the NIHR Nottingham Biomedical Research Centre.

Data Availability

Data access requests can be submitted to the HELIOS Data Access Committee by emailing [email protected].

Acknowledgments

The authors thank the people and institutions who helped in the preparation of this manuscript.

3.5 References

Appleby, P., Masood, E., Milligan, G., Macdonald, C., Quinlan, P., & Cole, C. (2023, September). Carrot-CDM: An open-source tool for transforming data for federated discovery in health research. Research Software Engineering Conference 2023 (RSECON23), 4-7 September 2023. https://doi.org/10.5281/zenodo.10707025

Bayer, M. (2012). SQLAlchemy (A. Brown & G. Wilson, Eds.). http://aosabook.org/en/sqlalchemy.html

Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research, 3.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., . . . Amodei, D. (2020, July 22). Language Models are Few-Shot Learners. arXiv: 2005.14165 [cs]. https://doi.org/10.48550/arXiv.2005.14165

Cholan, R. A., Pappas, G., Rehwoldt, G., Sills, A. K., Korte, E. D., Appleton, I. K., Scott, N. M., Rubinstein, W. S., Brenner, S. A., Merrick, R., Hadden, W. C., Campbell, K. E., & Waters, M. S. (2022). Encoding laboratory testing data: Case studies of the national implementation of HHS requirements and related standards in five laboratories. Journal of the American Medical Informatics Association, 29(8), 1372–1380. https://doi.org/10.1093/jamia/ocac072

Cox, S., Masood, E., Panagi, V., Macdonald, C., Milligan, G., Horban, S., Santos, R., Hall, C., Lea, D., Tarr, S., Mumtaz, S., Akashili, E., Rae, A., Cole, C., Sheikh, A., Jefferson, E., & Quinlan, P. R. (2024). Improving the quality, speed and transparency of curating data to the Observational Medical Outcomes Partnership (OMOP) common data model using the Carrot tool. JMIR Preprints. https://doi.org/10.2196/preprints.60917

deepset GmbH. (2024). Haystack: Neural question answering at scale [Accessed 16-08-2024].

Deng, H., Zhou, Q., Zhang, Z., Zhou, T., Lin, X., Xia, Y., Fan, L., & Liu, S. (2024). The current status and prospects of large language models in medical application and research. Chinese Journal of Academic Radiology. https://doi.org/10.1007/s42058-024-00164-x

Dettmers, T., & Zettlemoyer, L. (2023, February 27). The case for 4-bit precision: K-bit Inference Scaling Laws. arXiv: 2212.09720 [cs]. https://doi.org/10.48550/arXiv.2212.09720

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019, May 24). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv: 1810.04805 [cs]. https://doi.org/10.48550/arXiv.1810.04805

Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., Goyal, A., Hartshorn, A., Yang, A., Mitra, A., Sravankumar, A., Korenev, A., Hinsvark, A., Rao, A., Zhang, A., . . . Zhao, Z. (2024, August 15). The Llama 3 Herd of Models. arXiv: 2407.21783 [cs]. https://doi.org/10.48550/arXiv.2407.21783

F., H., et al. (2022). Data consistency in the English Hospital Episodes Statistics database. BMJ Health Care Inform, 29(1), e100633. https://doi.org/10.1136/bmjhci-2022-100633

Gerganov, G., et al. (2024). llama.cpp [Accessed 19-08-2024]. https://github.com/ggerganov/llama.cpp

Gu, Y., Tinn, R., Cheng, H., Lucas, M., Usuyama, N., Liu, X., Naumann, T., Gao, J., & Poon, H. (2020). Domain-specific language model pretraining for biomedical natural language processing.

Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., & Kalenichenko, D. (2017, December 15). Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. arXiv: 1712.05877 [cs, stat]. https://doi.org/10.48550/arXiv.1712.05877

Kong, A., Zhao, S., Chen, H., Li, Q., Qin, Y., Sun, R., Zhou, X., Wang, E., & Dong, X. (2024, March). Better Zero-Shot Reasoning with Role-Play Prompting.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., & Kiela, D. (2021, April 12). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv: 2005.11401 [cs]. https://doi.org/10.48550/arXiv.2005.11401

Meta-llama/llama-recipes. (2024, August 19). Retrieved August 19, 2024, from https://github.com/meta-llama/llama-recipes

Nazi, Z. A., & Peng, W. (2024). Large language models in healthcare and medical domain: A review. Informatics, 11(3), 57. https://doi.org/10.3390/informatics11030057

OHDSI (Observational Health Data Sciences and Informatics). (2021). Usagi documentation [Accessed 13-08-2024]. https://ohdsi.github.io/Usagi/

OHDSI. (2024a). Athena: Standardized vocabularies [Accessed 16-08-2024]. https://athena.ohdsi.org/search-terms/start

OHDSI. (2024b). Data standardization [Accessed September 2024]. https://www.ohdsi.org/data-standardization/

OpenAI. (2024). ChatGPT: Language model [Accessed 16-08-2024]. https://chat.openai.com/

Qdrant/fastembed. (2024, August 19). Retrieved August 19, 2024, from https://github.com/qdrant/fastembed

Ramírez, S. (2024). FastAPI [Accessed 19-08-2024]. https://fastapi.tiangolo.com

Reimers, N., & Gurevych, I. (2019, August 27). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv: 1908.10084 [cs]. https://doi.org/10.48550/arXiv.1908.10084

Streamlit. (2024). Streamlit: The fastest way to build and share data apps [Accessed 16-08-2024]. https://streamlit.io

Wang, X., Mina, T., Sadhu, N., Jain, P. R., Ng, H. K., Low, D. Y., Tay, D., Tong, T. Y. Y., Choo, W.-L., Kerk, S. K., Low, G. L., Team, T. H. S., Lam, B. C. C., Dalan, R., Wanseicheong, G., Yew, Y. W., Leow, E.-J., Brage, S., Michelotti, G. A., . . . Chambers, J. C. (2024, May 24). The Health for Life in Singapore (HELIOS) Study: Delivering Precision Medicine research for Asian populations. https://doi.org/10.1101/2024.05.14.24307259

Wilkinson, M., Dumontier, M., Aalbersberg, I., et al. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18

:::info Authors:

(1) James Mitchell-White, Centre for Health Informatics, School of Medicine, The University of Nottingham, Digital Research Service, The University of Nottingham, and NIHR Nottingham Biomedical Research Centre;

(2) Reza Omdivar, Digital Research Service, The University of Nottingham, and NIHR Nottingham Biomedical Research Centre;

(3) Esmond Urwin, Centre for Health Informatics, School of Medicine, The University of Nottingham and NIHR Nottingham Biomedical Research Centre;

(4) Karthikeyan Sivakumar, Digital Research Service, The University of Nottingham;

(5) Ruizhe Li, NIHR Nottingham Biomedical Research Centre and School of Computer Science, The University of Nottingham;

(6) Andy Rae, Centre for Health Informatics, School of Medicine, The University of Nottingham;

(7) Xiaoyan Wang, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore;

(8) Theresia Mina, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore;

(9) John Chambers, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore and Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, United Kingdom;

(10) Grazziela Figueredo, Centre for Health Informatics, School of Medicine, The University of Nottingham and NIHR Nottingham Biomedical Research Centre;

(11) Philip R Quinlan, Centre for Health Informatics, School of Medicine, The University of Nottingham.

:::

:::info This paper is available on arXiv under a CC BY 4.0 license.

:::


Llettuce: An AI Tool That Maps Messy Medical Records to Standard Codes

2026-02-11 06:20:25

Table of Links

  1. Abstract and Introduction

  2. System Architecture

    2.1 Access via UI or HTTP

    2.1.1 GUI

    2.2 Input

    2.3 Natural Language Processing Pipeline — The Llettuce API

    2.3.1 Vector search

    2.3.2 LLM

    2.3.3 Concept Matches

    2.4 Output

  3. Case Study: Medication Dataset

    3.1 Data Description

    3.2 Experimental Design

    3.3 Results

    3.3.1 Comparison between vector search and Usagi

    3.3.2 Comparison with GPT-3

    3.4 Conclusions & Acknowledgement

    3.5 References


2. System Architecture

Llettuce is written in Python, with both a back-end and a user interface. The back-end comprises a vector store for semantic search, a local LLM to suggest mappings, and queries against a connected OMOP CDM database that provide the details of a suggested mapping to the user.

Figure 1: Natural language processing architecture pipeline

2.1 Access via UI or HTTP

All interactions with Llettuce are made via HTTP requests to the Llettuce API. A POST request is made to the Llettuce server containing JSON with the following format:

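The request body itself was not preserved in this extract; a minimal sketch, assuming a `names` list (matching the output example in Section 2.4) and an illustrative options object whose field names are assumptions, not Llettuce's actual schema:

```json
{
  "names": ["Betnovate Scalp Application", "paracetamol and caffeine"],
  "pipeline_options": {
    "vocabulary_id": ["RxNorm"]
  }
}
```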

The pipeline options are optional, so a request can be just a list of informal names.

2.1.1 GUI

For users less comfortable with the command line, a graphical user interface (GUI) is provided. This is built using the Streamlit (Streamlit, 2024) Python framework, and presents the user with two options. The first shows a text box where a user can type a comma-separated list of informal names to run through the pipeline. The second allows a user to upload a .csv file and choose a column containing names to run through the pipeline. For either option, the results of the pipeline are shown below the input.
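A minimal sketch of this two-option interface, assuming a hypothetical `run_pipeline` helper standing in for a call to the Llettuce API (not the authors' code):

```python
import pandas as pd
import streamlit as st

def run_pipeline(names: list[str]) -> list[dict]:
    # Hypothetical stand-in for a POST request to the Llettuce server.
    return [{"informal_name": n} for n in names]

mode = st.radio("Input method", ["Type names", "Upload CSV"])
names: list[str] = []
if mode == "Type names":
    raw = st.text_input("Comma-separated informal names")
    names = [n.strip() for n in raw.split(",") if n.strip()]
else:
    uploaded = st.file_uploader("CSV file", type="csv")
    if uploaded is not None:
        df = pd.read_csv(uploaded)
        column = st.selectbox("Column containing names", df.columns)
        names = df[column].astype(str).tolist()

if names:
    st.write(run_pipeline(names))  # results shown below the input, as in Llettuce
```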

2.2 Input

Llettuce uses FastAPI (Ramírez, 2024) to serve API endpoints. These endpoints allow different combinations of Llettuce modules to be used to find concept names.
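A minimal sketch of an endpoint in this style; the route path and option field are illustrative assumptions, not Llettuce's actual API:

```python
from fastapi import FastAPI
from pydantic import BaseModel

class PipelineRequest(BaseModel):
    names: list[str]                        # informal names to map
    vocabulary_id: list[str] | None = None  # illustrative optional pipeline option

app = FastAPI()

@app.post("/run")  # hypothetical path
def run_pipeline(request: PipelineRequest) -> dict:
    # The real pipeline would dispatch to vector search, the LLM, and the OMOP query here.
    return {"received": request.names}
```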

2.3 Natural Language Processing Pipeline — The Llettuce API

2.3.1 Vector search

LLMs are good generalists for text generation. However, the OMOP CDM has a specialist vocabulary. To bridge this gap, Llettuce uses embeddings of OMOP concepts for semantic search. It has been demonstrated that encoder-only transformers produce representations of language such that semantically related concepts are close in the space of their embeddings (Bengio et al., 2003; Devlin et al., 2019). Llettuce uses sentence embeddings (Reimers & Gurevych, 2019), generated with FastEmbed (‘Qdrant/Fastembed’, 2024), and stores them locally in a vector database. The embedded concept is compared with the stored vectors, and the k embeddings with the highest dot product with the query are retrieved from the database. Models used for embeddings are much smaller than LLMs, so generating an embedding demands fewer computational resources. Retrieval from the vector database also provides a score for how close each embedding is to the query vector, such that a perfect match has a score of 1.
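A minimal sketch of the retrieval step, following FastEmbed's documented usage; the concept list and in-memory scoring loop are illustrative, standing in for Llettuce's vector database:

```python
import numpy as np
from fastembed import TextEmbedding  # the embedding library cited in the paper

model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")

# In Llettuce the concept embeddings live in a vector database; a plain array stands in here.
concepts = ["memantine hydrochloride", "mometasone furoate", "acetaminophen / codeine"]
concept_vectors = np.array(list(model.embed(concepts)))

def top_k(query: str, k: int = 5) -> list[tuple[str, float]]:
    """Return the k concept names with the highest dot product against the query."""
    q = next(iter(model.embed([query])))
    scores = concept_vectors @ q  # embeddings are normalised, so an exact match scores ~1
    order = np.argsort(scores)[::-1][:k]
    return [(concepts[i], float(scores[i])) for i in order]

print(top_k("memantine HCl"))
```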

The top k embeddings are used for retrieval-augmented generation (RAG) (Lewis et al., 2021). In RAG mode, the pipeline first queries the vector database. A threshold on embedding similarity has been set such that scores above it indicate exact matches. If an embedding scores above this threshold, it is provided to the user. If there is no very close match, the retrieved embeddings are inserted into a prompt, which serves as input to the LLM, as discussed next. The rationale is that close embeddings may either contain the answer, which the LLM can select, or provide hints as to what the right answer might be.
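Sketching this routing logic, reusing `top_k` from the previous snippet (the threshold value is an assumption; the paper does not state it):

```python
EXACT_MATCH_THRESHOLD = 0.95  # assumption: illustrative value only

def rag_route(query: str) -> dict:
    matches = top_k(query)
    if matches and matches[0][1] >= EXACT_MATCH_THRESHOLD:
        # A near-exact embedding match: return it directly and skip the LLM.
        return {"matches": [m for m in matches if m[1] >= EXACT_MATCH_THRESHOLD]}
    # No very close match: insert the retrieved names into the prompt as hints.
    hints = ", ".join(name for name, _ in matches)
    prompt = f"Medication: {query}. Possibly related RxNorm terms: {hints}."
    return {"prompt_for_llm": prompt}
```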

2.3.2 LLM

Llettuce uses Llama.cpp (Gerganov et al., 2024) to run the latest Llama LLM from Meta (Dubey et al., 2024). Llama.cpp provides an API for running transformer inference. It detects the available hardware and uses optimisations for efficient inference running on central processing units (CPUs).

The version of Llama tested in our case study has 8 billion parameters (Llama 3.1 8B). Models are trained with each parameter as a 32-bit floating point number. Quantisation was employed to reduce the size of the model, with little loss of accuracy (Dettmers & Zettlemoyer, 2023; Jacob et al., 2017). The full-precision Llama 3.1 8B requires over 32 GB of RAM to run, whereas the 4-bit quantised model requires less than 5 GB. Most consumer laptops are therefore able to keep the model in memory.
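The arithmetic behind those figures is direct: 8 × 10⁹ parameters × 4 bytes (FP32) ≈ 32 GB, while 4-bit weights need 8 × 10⁹ × 0.5 bytes ≈ 4 GB; the rest of the quoted ~5 GB footprint plausibly covers activations and runtime overhead (that breakdown is an inference, not stated in the paper).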

Llettuce uses Haystack (deepset GmbH, 2024) to orchestrate its LLM pipelines. For LLM-only pipelines there is a component that takes an informal name and inserts it into a prompt template, and another that delivers this prompt to the LLM. The prompt used in these cases follows techniques recommended for Llama models (‘Meta-llama/llama-recipes’, 2024).

For example, the prompt for RAG contains detailed, explicit instructions:

You are an assistant that suggests formal RxNorm names for a medication. You will be given the name of a medication, along with some possibly related RxNorm terms. If you do not think these terms are related, ignore them when making your suggestion.

Respond only with the formal name of the medication, without any extra explanation.

This part of the prompt also gives the LLM a role, which has been shown to improve consistency in responses (Kong et al., 2024). Importantly, the prompt also includes examples of informal name/formal name pairs, an effective tactic for LLM prompting (Brown et al., 2020):

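The example pairs themselves are not preserved in this extract; a hypothetical pair of the kind described (illustrative only, not from the paper) might read:

Informal name: “panadol” → Formal name: “acetaminophen”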

Once the LLM has inferred a formal name for the informal name provided, this formal name is used as a concept name in a parameterised OMOP CDM query.

2.3.3 Concept Matches

To retrieve the details of any concept through a Llettuce pipeline, an OMOP CDM database is queried. OMOP CDM queries are generated using an object-relational mapping through SQLAlchemy (Bayer, 2012). The string used for the concept name field is first preprocessed by removing punctuation and stop words and joining the remaining words with the pipe character for compatibility with PostgreSQL in-database text search. For example, the string “paracetamol and caffeine” has the stop word “and” removed, and the remaining words are used to build the string “paracetamol | caffeine”. The words of this search term are used for a text search query against the concept names of the selected OMOP vocabularies. Optionally, concept synonyms can be included in this query. The retrieved concept names are then compared with the input by fuzzy string matching, and any names above the threshold are presented to the user.
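A minimal sketch of the preprocessing step just described, with an illustrative stop-word list (Llettuce's actual list is not given here):

```python
import re

STOP_WORDS = {"and", "or", "the", "of", "with"}  # illustrative subset

def to_text_search_term(name: str) -> str:
    """Strip punctuation and stop words, then join words with '|' for PostgreSQL text search."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " | ".join(w for w in words if w not in STOP_WORDS)

assert to_text_search_term("paracetamol and caffeine") == "paracetamol | caffeine"
```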


2.4 Output

Llettuce pipelines emit JSON containing the results of the pipeline. For example, a pipeline running the LLM and OMOP query takes an input request as follows:

\ {"names": ["Betnovate Scalp Application"]}

\ returns JSON output for the “events” llmoutput and omopoutput for each name sent to Llettuce.

\ The llmoutput contains a reply of the LLM’s response, the informalname supplied in the request, and meta describing metadata about the LLM’s run.

\ \

\ \ The omopoutput event contains the searchterm sent to the OMOP-CDM, then a CONCEPT array, where each item is a match meeting the threshold set on fuzzy string matching. Each item describes the conceptname, conceptid, vocabularyid, and conceptcode from the OMOP-CDM’s concept table, followed by the conceptnamesimilarity_score calculated in Llettuce. Llettuce also has the option to fetch further information from the OMOP-CDM, not enabled in the default configuration, which is included in the entry for each concept.

\ \
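The verbatim payload is not preserved in this extract; a plausible shape for one name, using the field spellings printed above (the extraction may have dropped underscores from the original names) and entirely illustrative values:

```json
{
  "llmoutput": {
    "reply": "betamethasone valerate",
    "informalname": "Betnovate Scalp Application",
    "meta": {"model": "Llama-3.1-8B"}
  },
  "omopoutput": {
    "searchterm": "betamethasone | valerate",
    "CONCEPT": [
      {
        "conceptname": "betamethasone valerate",
        "conceptid": 123456,
        "vocabularyid": "RxNorm",
        "conceptcode": "1547",
        "conceptnamesimilarity_score": 100
      }
    ]
  }
}
```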

The GUI parses this and displays it in a more user-friendly format.


:::info Authors:

(1) James Mitchell-White, Centre for Health Informatics, School of Medicine, The University of Nottingham, Digital Research Service, The University of Nottingham, and NIHR Nottingham Biomedical Research Centre;

(2) Reza Omdivar, Digital Research Service, The University of Nottingham, and NIHR Nottingham Biomedical Research Centre;

(3) Esmond Urwin, Centre for Health Informatics, School of Medicine, The University of Nottingham and NIHR Nottingham Biomedical Research Centre;

(4) Karthikeyan Sivakumar, Digital Research Service, The University of Nottingham;

(5) Ruizhe Li, NIHR Nottingham Biomedical Research Centre and School of Computer Science, The University of Nottingham;

(6) Andy Rae, Centre for Health Informatics, School of Medicine, The University of Nottingham;

(7) Xiaoyan Wang, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore;

(8) Theresia Mina, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore;

(9) John Chambers, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore and Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, United Kingdom;

(10) Grazziela Figueredo, Centre for Health Informatics, School of Medicine, The University of Nottingham and NIHR Nottingham Biomedical Research Centre;

(11) Philip R Quinlan, Centre for Health Informatics, School of Medicine, The University of Nottingham.

:::


:::info This paper is available on arXiv under a CC BY 4.0 license.

:::


How This Open-Source AI Simplifies Medical Data Mapping

2026-02-11 06:05:11

:::info Authors:

(1) James Mitchell-White, Centre for Health Informatics, School of Medicine, The University of Nottingham, Digital Research Service, The University of Nottingham, and NIHR Nottingham Biomedical Research Centre;

(2) Reza Omdivar, Digital Research Service, The University of Nottingham, and NIHR Nottingham Biomedical Research Centre;

(3) Esmond Urwin, Centre for Health Informatics, School of Medicine, The University of Nottingham and NIHR Nottingham Biomedical Research Centre;

(4) Karthikeyan Sivakumar, Digital Research Service, The University of Nottingham;

(5) Ruizhe Li, NIHR Nottingham Biomedical Research Centre and School of Computer Science, The University of Nottingham;

(6) Andy Rae, Centre for Health Informatics, School of Medicine, The University of Nottingham;

(7) Xiaoyan Wang, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore;

(8) Theresia Mina, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore;

(9) John Chambers, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore and Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, United Kingdom;

(10) Grazziela Figueredo, Centre for Health Informatics, School of Medicine, The University of Nottingham and NIHR Nottingham Biomedical Research Centre;

(11) Philip R Quinlan, Centre for Health Informatics, School of Medicine, The University of Nottingham.

:::

Table of Links

  1. Abstract and Introduction

  2. System Architecture

    2.1 Access via UI or HTTP

    2.1.1 GUI

    2.2 Input

    2.3 Natural Language Processing Pipeline — The Llettuce API

    2.3.1 Vector search

    2.3.2 LLM

    2.3.3 Concept Matches

    2.4 Output

  3. Case Study: Medication Dataset

    3.1 Data Description

    3.2 Experimental Design

    3.3 Results

    3.3.1 Comparison between vector search and Usagi

    3.3.2 Comparison with GPT-3

    3.4 Conclusions & Acknowledgement

    3.5 References


Abstract

This paper introduces Llettuce, an open-source tool designed to address the complexities of converting medical terms into OMOP standard concepts. Unlike existing solutions such as the Athena database search and Usagi, which struggle with semantic nuances and require substantial manual input, Llettuce leverages advanced natural language processing, including large language models and fuzzy matching, to automate and enhance the mapping process. Developed with a focus on GDPR compliance, Llettuce can be deployed locally, ensuring data protection while maintaining high performance in converting informal medical terms to standardised concepts.

Keywords: OMOP mapping, LLMs, healthcare data mapping, natural language processing in healthcare data

1. Introduction

The conversion of medical terms to Observational Medical Outcomes Partnership (OMOP) (OHDSI, 2024b) standard concepts is an important part of making data findable, accessible, interoperable, and reusable (FAIR) (Wilkinson et al., 2016). Unified data standards are often applied inconsistently across healthcare systems (Cholan et al., 2022; F. et al., 2022), and standardising to a common data model (CDM), such as OMOP, is fundamental to enabling robust research pipelines for cohort discovery and ensuring reliable and reproducible evidence. The process of converting data to OMOP, however, is complex: it requires not only knowledge of the specific domain of the data, but often collaboration between data engineers, software engineers, and healthcare professionals.

In previous work, we developed Carrot Mapper (Cox et al., 2024) and Carrot-CDM (Appleby et al., 2023) to support the OMOP conversion process. Tooling within this space still requires manual intervention to approve or create mappings, where a data engineer needs to find the most suitable codification for a term. Solutions that help find codifications include searches in the Observational Health Data Sciences and Informatics (OHDSI) Athena database (OHDSI, 2024a), or string matching using tools such as Usagi (OHDSI, 2021).

The Athena website is a platform for searching and exploring medical terminologies, vocabularies, and concepts in healthcare research. Users can search for specific terms, view their relationships, and explore detailed metadata. Using Athena search at scale, however, is complicated. When conducting extensive searches, researchers face challenges including the complexity and overlap of medical vocabularies, the overwhelming volume of search results, and technical constraints such as system performance and data handling capabilities. Additionally, the standardisation of diverse healthcare datasets presents difficulties in ensuring consistency across different terminologies.

Usagi was developed by OHDSI to facilitate the mapping of source codes to standard concepts within the OMOP CDM. It supports the integration and harmonisation of diverse healthcare data sources. Usagi employs semi-automated string-matching algorithms to suggest potential mappings between local vocabularies and standardised terminologies such as SNOMED CT, LOINC, and RxNorm. It is a valuable tool for mapping, but it has a few limitations. While it automates part of the mapping process, it requires significant manual review, which is time-consuming and prone to human error and uncertainty. String matching can lead to inaccurate mappings, particularly when dealing with ambiguous or complex terminologies. The effectiveness of Usagi depends on the quality of the standardised vocabularies it uses, and there is a learning curve for new users. As a standalone tool, Usagi does not yet integrate seamlessly with other data processing workflows, requiring additional steps to configure both input and output and thus ensure proper data standardisation. By contrast, newer tools can provide an application programming interface (API) for integration into mapping tools.

Both Athena and Usagi work well when dealing with data with typographical errors. But informal terms for medications or conditions may not closely match the string of the formal concept we wish to map them to. For example, “Now Foods omega-3” is a supplement found in a self-reported patient questionnaire dataset. This supplement is produced by Now Foods and is an omega-3 product derived from fish oil. In this case, the brand of the drug was given as input. Before obtaining the OMOP concept, we need to map the reported brand to “omega-3 fatty acids”, for which an exact OMOP match is found. Using the Athena search engine, for example, string matching suggests concepts like “Calcium ascorbate 550 MG Oral Tablet by Now Foods”, “Ubiquinone MG Oral Capsule [Now Coq10] by Now Foods” or “Calcium ascorbate 1000 MG Oral Tablet [Now Ester-C] by Now Foods”. This matching process loses the semantic information associated with the input data.

Large language models (LLMs) offer a relatively novel alternative for supporting OMOP mapping. They automate portions of the mapping process while suggesting more semantically relevant mappings. The use of proprietary tools such as OpenAI’s ChatGPT (OpenAI, 2024) in healthcare, however, raises significant concerns, particularly regarding GDPR compliance, data protection and reproducibility of results (Deng et al., 2024; Nazi & Peng, 2024). The handling of sensitive patient data poses risks, as inadvertent data leaks or misuse of information could occur. Ensuring that interactions with OpenAI and other cloud-hosted LLM APIs remain within the bounds of GDPR is challenging, especially when dealing with identifiable health information.

In this paper we introduce Llettuce[1], a tool created to address these gaps. It is a standalone, open-source, adaptable natural language processing tool based on large language models, querying systems and fuzzy matching for the conversion of medical terms into the OMOP standard vocabulary. This first version is released under the MIT Licence. Medical terms can be extracted from Electronic Health Records (EHRs), self-reported patient questionnaires and other structured datasets to serve as input for Llettuce. For the example above, the Llettuce match output for “Now Foods omega-3” is “Fish oil”.

Llettuce has the following modules and functionalities:

• Vector search for concept(s)

• LLM prompting with informal name(s)

• OMOP CDM database search

• A graphical user interface

We demonstrate how Llettuce works and compare its performance to Usagi and ChatGPT on a case study of converting self-reported informal medication names into OMOP concepts. Llettuce’s performance is comparable to OpenAI models, and it was developed to run locally to support healthcare data governance requirements.


:::info This paper is available on arXiv under a CC BY 4.0 license.

:::

*Corresponding author: [email protected]

[1] https://github.com/Health-Informatics-UoN/lettuce

How Often Do Agents Misinterpret User Intent, and Can It Be Corrected?

2026-02-11 06:04:39

There’s a moment in every AI interaction that rarely gets examined, yet it determines whether the agent feels intelligent or quietly off. It happens before retrieval, before reasoning, before any answer appears.

It’s the moment the agent decides: “This is what you meant.”

And the diagnostic question that exposes this moment is deceptively simple:

How often does the agent misinterpret the user’s intent — and does the user get a chance to correct it?

This is the User Trust Probe. It doesn’t measure accuracy. It measures something earlier and more fragile:

Where Misalignment Actually Begins

Every agent rewrites user queries internally. It has to — people speak in shorthand, fragments, and layered intentions.

But this normalization step is also where the agent quietly takes control of the conversation. If it gets the interpretation wrong, everything downstream is wrong, even if the retrieval is perfect.

And most systems never show this step to the user.

Three Patterns That Reveal the Problem

1. The Agent Answers a Narrower Question Than the User Asked

A user says, “Can you help me understand our Q4 results?”

The agent silently normalizes it into: “Summarize Q4 revenue.”

But maybe the user meant margins, expenses, customer churn, or simply “I’m confused by the earnings call.”

The agent answers confidently — but to the wrong question.

2. The Agent Assumes Context the User Never Provided

A user asks: “What’s the status of the launch?”

The agent assumes which product, which region, which team.

It fills in gaps the user didn’t specify. The answer is polished, structured, and completely misaligned.

3. The User Has No Space to Say “That’s Not What I Meant.”

This is the real failure.

The agent never reveals its interpretation. The user never sees the rewrite. The misunderstanding stays hidden.

The user feels it immediately: “That’s not what I asked.”

Trust erodes not because the answer is wrong, but because the interpretation was never visible or correctable.

A Useful Contrast

If the glossary question was about alignment latency, this question is about interpretation recoverability.

One asks: How long does the agent stay wrong?

The other asks: Can the user pull the agent back when it drifts?

Together, they expose two different layers of trust.

Why Correctability Matters More Than Correctness

Most teams obsess over whether the agent’s answer is correct. But correctness is irrelevant if the agent answered the wrong question.

The real success criterion is:

Does the agent make its interpretation visible enough for the user to correct it?

If the agent shows its understanding, the user can say “Yes, that’s what I meant,” or “No, that’s not it — here’s what I actually need.”

This tiny moment of correction prevents entire chains of misaligned reasoning.

It restores agency. It builds trust. It turns the agent into a collaborator, not a guesser.

Correctability becomes the emotional center of the interaction — the difference between feeling understood and feeling dismissed.
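One way to make that visible-and-correctable loop concrete (a minimal sketch in Python; `interpret` and `answer` are hypothetical stand-ins for an agent's internal stages, not any specific framework):

```python
def interpret(query: str) -> str:
    # Hypothetical stand-in for the agent's internal query rewrite.
    return f"Summarize Q4 revenue (from: {query!r})"

def answer(interpretation: str) -> str:
    # Hypothetical stand-in for retrieval + generation.
    return f"[answer to {interpretation!r}]"

def handle(query: str) -> str:
    interpretation = interpret(query)
    # Surface the rewrite *before* answering, so the user can correct it.
    print(f"I understood this as: {interpretation}")
    correction = input("Press Enter to accept, or restate what you meant: ").strip()
    if correction:
        interpretation = correction  # the user pulls the agent back before any reasoning happens
    return answer(interpretation)
```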

The Pattern Emerging Across Teams

Across agentic systems, one insight keeps resurfacing:

Users don’t just want correct answers — they want to be understood.

And the only way to guarantee that is to make the agent’s interpretation visible, correct, and correctable.

When that happens, trust grows naturally. When it doesn’t, even the best answers feel strangely hollow.

How to Bridge the Gap Between Technical Specs and Agents: MLOps Coding Skills

2026-02-11 05:53:27

The beauty of an Agent Skill lies in its simplicity. It is essentially a markdown file (SKILL.md) that functions as a context injection module. It gives the agent “muscle memory” for a specific topic.
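A minimal sketch of what such a file might contain; the frontmatter fields follow the common SKILL.md convention, while the skill name, description, and steps are illustrative assumptions:

```markdown
---
name: mlops-deploy
description: Conventions this repo uses to package and deploy models
---

# MLOps deployment skill

When asked to deploy a model:

1. Build the serving image from the repo's Dockerfile.
2. Run the smoke tests before promoting the image to staging.
3. Record the model version and image digest in the deployment log.
```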

Ethereum’s Sticky Dominance: Why DeFi Still Runs on Vitalik’s Rails

2026-02-11 05:42:02

The gas fees aren’t what they used to be. Remember 2021, when a Uniswap swap cost more than a decent dinner in Bangkok? Now Ethereum hums along, cheap and steady, processing $15 billion daily in DeFi volume while pretenders fade into the noise. It’s not sexy, nobody’s writing odes to PeerDAS or blobspace, but the numbers whisper a truth: Ethereum isn’t just surviving the scaling wars. It’s winning them, quietly folding rivals into its orbit.

Walk into any Mumbai crypto co-working space or Singapore fund office, and the screens tell the story. Aave, Uniswap, Maker, $180 billion TVL, 62% of DeFi’s total, still live on Ethereum or its rollups. Solana pumps memes. Sui chases gaming. But when real money needs settlement, battle-tested security, and composability that doesn’t evaporate in a bear market? They come home to ETH.

The Modular Masterstroke

Ethereum’s secret isn’t one breakthrough. It’s the stack. Base layer slims down to consensus and data availability, PeerDAS live, blobs carrying 80% of rollup posts. Execution? Shoved to 50+ L2s, where fees dip under a penny and TPS rivals Visa’s peak hour. AggLayer dreams and shared sequencing loom, but even without them, the machine works.

Take last month’s $40 million Blast exploit. Blast went dark. Ethereum L1 chugged on. Aave V3 on Arbitrum absorbed the fallout without blinking. Rollups inherit the mothership’s security, 7+ years of attack surface seasoning, while running circles around monolithic L1s. Solana halts on spam. Ethereum’s ecosystem just routes around.

Vitalik’s been here before. His “amateur hour is over” riff from Devcon rings truer now than ever. ZK-EVMs mature (Taiko, zkSync hit production parity). Optimistic rollups shed training wheels. The trilemma doesn’t haunt anymore; it’s engineered away, layer by layer.

Composability: The Moat That Matters

Here’s the real magic, the bit TradFi envies and L1 tourists overlook. Ethereum’s DeFi isn’t silos. It’s Lego. Uniswap liquidity fuels Aave flash loans, which compound into Yearn strategies, which feed Pendle fixed yields, all settled on shared state, zero trust assumptions beyond the base layer.

Try that on Solana. DEXes work great until a network stutter strands your position mid-arbitrage. Cross-chain bridges? $2 billion lost since 2022. Ethereum’s rollups talk natively, intent-based bridging via Across, shared liquidity via Socket. Even Polygon’s $250 million Coinme bet leans on AggLayer to knit UPI ramps into this web.

The data dazzles. DeFiLlama pegs Ethereum ecosystem volume at 3.2x Solana’s monthly average. Stablecoin swaps? 78% ETH-denominated. Institutional flows (BlackRock BUIDL, Barclays Ubyx pilots)? They settle on Arbitrum or Optimism, not some unproven L1.

Network Effects Trump Hype Cycles

Ethereum’s other edge lives in people. 4,500 active devs monthly, double the field. Flashbots, Lido, the ERC-4337 bundle army, they iterate in public, Discord war rooms, turning theory into mainnet. Solana’s Rust wizards chase throughput. Ethereum’s army builds infrastructure.

Culture compounds it. “Ethereum-aligned” became shorthand for serious. Rollup teams beg for EF grants. VCs triage by blob compatibility. Even Saga’s chainlet halt, a $7 million DeFi drainer, pushed survivors toward Ethereum’s light-client proofs.

Bear markets test this. 2025’s macro wobbles saw $60 billion flee risk assets. Solana TVL halved. Ethereum’s core, AAVE, UNI, and MKR dipped 20%, then stabilized. Why? Battle scars. Ronin, Poly, Wormhole, Ethereum ate those Ls, patched, iterated. New L1s? Still proving the basics.

TradFi’s Grudging Nod

Wall Street notices. JPM’s Kinexys swaps settle on Arbitrum testnets. Fidelity tokenizes real estate on Base. Polygon’s U.S. ramps (Coinme, Sequence) feed USDC into Ethereum pools. Stablecoin “war”? More like convergence, Circle’s USDC processes 85% of its volume on Ethereum L2s.

India’s clampdown plays here too. Foreign CEX blocks push flows to compliant ramps, Polygon’s UPI bridges, now Ethereum-native. Delhi’s FIU portal will track wallets, sure. But the settlement layer? Ethereum’s blobspace, rollup daisy-chain, unstoppable.

The Quiet Empire

Zoom out from Mumbai’s flickering terminals, and Ethereum looks less like a chain, more like an OS. L2s as apps. Blobs as a filesystem. Account abstraction as the login flow. Competitors chase features, faster blocks, sexier VMs. Ethereum owns the standards.

Nobody cheers for plumbing. But when DeFi settles $10 trillion yearly, when tokenized T-bills yield 5.2% onchain, when your Mumbai freelancer swaps UPI rupees to USDC yield in 12 seconds flat, plumbing wins wars.

The screens glow steadily green. Gas charts flatline at 3 gwei. In Discord war rooms and Bangalore whiteboards, the mantra holds: “Ethereum solves coordination.” DeFi grows not because it’s flashy. It grows because Ethereum makes hard things possible, secure, composable, and forever settles at the bottom of the stack.
