Overview

The concept library is a system for storing, managing, sharing, and documenting clinical code lists in health research. The specific goals of this work are:

Store code lists along with metadata that captures important information about quality, author, etc.
Store version history and provide a way to unambiguously reference a particular version of a code list.
Allow programmatic interaction with code lists via an API, so that they can be directly used in queries, statistical scripts, etc.
Provide a mechanism for sharing code lists between projects and organizations.

Why use this tool?

A significant aspect of research using routinely collected health records is defining how concepts of interest (conditions, treatments, symptoms, etc.) will be measured. This typically involves identifying sets of clinical codes that map to a variable the researcher wants to measure, and sometimes a set of rules as well (e.g. a person with a disease may be defined as someone who has a diagnosis code from list A and a medication from list B, excluding anyone who has a code from list C). A large part of the analysis work may involve consulting clinicians, investigating the data, and creating and testing definitions of the clinical concepts to be used.
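As a rough illustration of the list A / list B / list C rule described above, the logic can be written as set operations. This is only a sketch: the codes are made-up placeholders rather than real clinical codes, and the function name `meets_definition` is hypothetical, not part of the concept library.

```python
# Placeholder code lists (hypothetical values, not real clinical codes).
LIST_A_DIAGNOSES = {"DX001", "DX002"}
LIST_B_MEDICATIONS = {"RX010", "RX011"}
LIST_C_EXCLUSIONS = {"EX900"}


def meets_definition(patient_codes):
    """True if the patient has a code from list A and a code from list B,
    and no code from list C."""
    codes = set(patient_codes)
    has_diagnosis = bool(codes & LIST_A_DIAGNOSES)
    has_medication = bool(codes & LIST_B_MEDICATIONS)
    is_excluded = bool(codes & LIST_C_EXCLUSIONS)
    return has_diagnosis and has_medication and not is_excluded


# Toy patient records: patient id -> codes found in that patient's history.
patients = {
    "p1": ["DX001", "RX010"],           # diagnosis and medication
    "p2": ["DX002"],                    # diagnosis only
    "p3": ["DX001", "RX011", "EX900"],  # matches, but has an exclusion code
}

cases = sorted(pid for pid, codes in patients.items() if meets_definition(codes))
print(cases)
```

Only the first patient satisfies all three conditions. Even a small rule like this is easy to lose inside a study-specific script, which is the sharing problem discussed next.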

The definitions that are created are often of interest to researchers across many studies, but there are barriers to sharing them easily. The definitions may be embedded within study-specific scripts, so that it is not easy to extract the part that may be of general interest. Researchers also often do not fully document how a concept was created, its precise meaning, its limitations, etc. Crucial information may then be lost when a definition is passed to other researchers, resulting in mistakes. Frequently there is simply no mechanism to discover and share work that has been done previously, leading researchers to waste time and resources reinventing the wheel. In theory, published research should include information on the precise methods used, but in practice this information is often inadequate.

Our goal is to create a system that describes research study designs in a machine-readable format, in order to facilitate rapid study development, higher-quality research, easier replication, and sharing of methods between researchers, institutions, and countries.

Concepts vs. Working Sets

A "concept" is the definition of a single entity that will be used in a research project. It may be a disease ("type 2 diabetes"), a treatment ("metformin"), a test result ("HbA1c"), or anything else that may be defined within the data. The definition is typically tied to the data source, so different coding systems need different definitions (diabetes in primary care and diabetes in a hospital setting would be two separate concepts, defined in Read codes and ICD-10 codes, respectively). In addition, different concepts may be created for different purposes. One researcher may want to define diabetes tightly, with high specificity, while another might want to capture everyone with possible diabetes; these would be represented as two different concepts. Different users may also have different definitions because the correct definition is a matter of opinion. Creating multiple concepts to define the same thing is not a problem (though of course it is best to use a single, shared definition unless there is a good reason not to). Currently, the system stores simple concepts, each of which is defined as a set of clinical codes. It will be developed further in the future to allow more complex definitions, such as rule-based algorithms, to be stored and shared.

Within the process of data preparation and analysis, researchers will often want to use related concepts together (for example, a list of comorbidities, or a set of medication categories of interest). An entity called a working set has been created to allow users to group related concepts and use or reference them together. To define a study's comorbidities within this system, a user would first either identify an existing concept or create a new concept for each relevant comorbidity. Once this is done, a working set called "[My study name] comorbidities" can be created and each relevant concept added to it. Working sets can also have attributes: arbitrary fields that can be added, for which a value can be entered for each concept in the set. This allows some information about each concept within the set to be captured flexibly. Some example use cases for attributes:

A database column name.
Category weights (for a working set that stores Charlson comorbidity categories).
Minimum and maximum valid values (for a working set of lab test results).
Categorisation (for example, within an injury working set, a flag for whether each concept is a major or minor injury).

All concepts within a working set can be referenced as a unit, and included within analysis as a unit via the API.
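To make the relationship between concepts, working sets, and attributes more concrete, here is a minimal in-memory sketch. It is not the concept library's data model or API: the class names `Concept` and `WorkingSet`, their methods, the codes, and the attribute values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A simple concept: a named set of clinical codes (hypothetical model)."""
    name: str
    codes: frozenset


@dataclass
class WorkingSet:
    """A named group of concepts with one value per attribute per concept."""
    name: str
    attribute_names: tuple
    members: list = field(default_factory=list)

    def add(self, concept, **attributes):
        # Every concept must supply exactly the attributes the set declares.
        if set(attributes) != set(self.attribute_names):
            raise ValueError(f"expected attributes {self.attribute_names}")
        self.members.append((concept, attributes))

    def all_codes(self):
        """Union of the codes of every concept in the set."""
        codes = set()
        for concept, _ in self.members:
            codes |= concept.codes
        return codes

    def attribute_values(self, attribute):
        """Map each concept name to its value for one attribute."""
        return {concept.name: attrs[attribute] for concept, attrs in self.members}


# Placeholder codes and illustrative attribute values only.
mi = Concept("myocardial infarction", frozenset({"MI001", "MI002"}))
renal = Concept("renal disease", frozenset({"RD001"}))

comorbidities = WorkingSet("My study comorbidities", ("column_name", "weight"))
comorbidities.add(mi, column_name="mi_flag", weight=1)
comorbidities.add(renal, column_name="renal_flag", weight=2)

print(sorted(comorbidities.all_codes()))
print(comorbidities.attribute_values("column_name"))
```

In this sketch the attributes are declared once on the working set and filled in per concept, mirroring the description above: the same concept could appear in another working set with entirely different attributes.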

Concept Library and the SAIL Gateway

The concept library is available both inside and outside the SAIL secure environment. The version inside the SAIL gateway is read-only, to comply with SAIL governance, which requires all outputs from the gateway to be manually reviewed. (If a user could create or edit content inside the gateway and then view it outside, this would bypass the normal manual review process.) All changes made outside are instantly available within the gateway.
