Introduction

Overview

The concept library is a system for storing, managing, sharing, and documenting clinical code lists in health research. The specific goals of this work are:

Store code lists along with metadata that captures important information about quality, author, etc.
Store version history and provide a way to unambiguously reference a particular version of a code list.
Allow programmatic interaction with code lists via an API, so that they can be directly used in queries, statistical scripts, etc.
Provide a mechanism for sharing code lists between projects and organizations.

Why use this tool?

A significant aspect of research using routinely collected health records is defining how concepts of interest (including conditions, treatments, symptoms, etc.) will be measured. This typically involves identifying sets of clinical codes that map to a variable that the researcher wants to measure, and sometimes a set of rules as well (e.g. a person with a disease may be defined as someone who has a diagnosis code from list A and a medication from list B, but excluding anyone who has a code from list C). A large part of the analysis work may involve consulting clinicians, investigating the data, and creating and testing definitions of clinical concepts to be used.
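The kind of rule described above can be sketched with plain set operations. This is an illustration only: the code values, the list names, and the `has_condition` function are hypothetical and are not part of the concept library.

```python
# Hypothetical code lists; real lists would hold clinical codes from a coding system.
LIST_A_DIAGNOSES = {"DX001", "DX002"}
LIST_B_MEDICATIONS = {"RX010"}
LIST_C_EXCLUSIONS = {"DX900"}


def has_condition(patient_codes: set[str]) -> bool:
    """True if the patient has a code from list A and list B, and none from list C."""
    has_diagnosis = bool(patient_codes & LIST_A_DIAGNOSES)
    has_medication = bool(patient_codes & LIST_B_MEDICATIONS)
    is_excluded = bool(patient_codes & LIST_C_EXCLUSIONS)
    return has_diagnosis and has_medication and not is_excluded


print(has_condition({"DX001", "RX010"}))           # True
print(has_condition({"DX001", "RX010", "DX900"}))  # False
```

Embedding a rule like this in a study-specific script is exactly what makes it hard to reuse, which is the problem the following paragraphs describe.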

Often the definitions that are created are of interest to researchers for many studies, but there are barriers to easily sharing them. The definitions may be embedded within study-specific scripts, such that it is not easy to extract the part that may be of general interest. Also, often researchers do not fully document how a concept was created, its precise meaning, limitations, etc. Crucial information may be lost when passing it to other researchers, resulting in mistakes. Often there simply is no mechanism to discover and share work that has been done previously, leading researchers to waste time and resources reinventing the wheel. In theory, when research is published, information on the precise methods used should be included, but in reality this is often inadequate.

Our goal is to create a system that describes research study designs in a machine-readable format to facilitate rapid study development; higher quality research; easier replication; and sharing of methods between researchers, institutions, and countries.

Getting Started: Using Concepts

Concepts vs. Working Sets

A "concept" is the definition of a single entity that will be used in a research project. It may be a disease ("type 2 diabetes"), a treatment ("metformin"), a test result ("HbA1c"), or anything else that may be defined within the data. The definition is typically tied to the data source, so different coding systems would need different definitions (diabetes in primary care and diabetes in a hospital setting would be two separate concepts, defined in Read codes and ICD-10 codes, respectively). In addition, different concepts may be created for different purposes. One researcher may want to tightly define diabetes with high specificity, while another might want to capture everyone with possible diabetes; these would be represented as two different concepts. Different users may have different definitions, simply because the correct definition is a matter of opinion. Creating multiple concepts to define the same thing is not a problem (though of course it is best to use a single, shared definition, unless there is a good reason not to). Currently, the system stores simple concepts, each defined as a set of clinical codes. It will be further developed in the future to allow more complex definitions, such as rule-based algorithms, to be stored and shared.

Within the process of data preparation and analysis, researchers will often want to use related concepts together (for example, a list of comorbidities, or a set of medication categories of interest). An entity called a working set has been created to allow users to group related concepts and use or reference them together. To define a study's comorbidities within this system, a user would first either identify an existing concept or create a new concept for each relevant comorbidity. Once this is done, a working set called "[My study name] comorbidities" can be created, and each relevant concept added. Working sets can also have attributes: arbitrary fields that can be added, for which a value can be entered for each concept in the set. This allows some information about each concept within the set to be captured flexibly. Some example use cases for attributes:

A database column name.
Category weights (for a working set that stored Charlson comorbidity categories).
Minimum and maximum valid values (for a working set of lab test results).
Categorisation (for example, within an injury working set, a flag for whether each concept is a major or minor injury).

All concepts within a working set can be referenced as a unit, and included within analysis as a unit via the API.
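One way to picture a working set is as a small table with one row per concept and one column per attribute. The sketch below is a hypothetical in-memory model, not the library's actual data model or API; the `WorkingSet` class, the concept IDs, and the attribute values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingSet:
    """A named group of concepts with one value per attribute per concept."""
    name: str
    attribute_names: list[str] = field(default_factory=list)
    # Maps a (hypothetical) concept ID to its attribute values, keyed by attribute name.
    rows: dict[int, dict[str, object]] = field(default_factory=dict)

    def add_concept(self, concept_id: int, **values: object) -> None:
        unknown = set(values) - set(self.attribute_names)
        if unknown:
            raise ValueError(f"Unknown attributes: {sorted(unknown)}")
        # Attributes without a supplied value are left blank (None).
        self.rows[concept_id] = {a: values.get(a) for a in self.attribute_names}


comorbidities = WorkingSet("My study comorbidities", ["column_name", "weight"])
comorbidities.add_concept(101, column_name="diabetes", weight=1)
comorbidities.add_concept(102, column_name="metastatic_cancer", weight=6)
comorbidities.add_concept(103, column_name="dementia")  # weight left blank
```

Keeping the attributes alongside the concepts means an analysis script can loop over the rows and, for example, create one database column per concept without hard-coding the names.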

Search for concepts

Click on "concepts" to view the main list of concepts. A standard search interface is provided. You can also look for concepts that have specific tags associated with them, using the tags box (which has autocomplete to help discover tags). A tick box restricts the results to concepts that you have created.

Content is never permanently deleted from the system. The "delete" function makes a concept invisible by default, but it is still present in the system. Deleted concepts can be included in the search by clicking the appropriate tick box.

You can only see content that has been shared with you (either because a concept has been made public, or because you are part of a group within which a concept has been shared).

View a concept

Clicking on the magnifying glass icon to the far right of a concept in the results allows you to view it. The top portion of the page shows documentation and permission information.

The "Components" section is where the clinical codes that make up a concept are actually defined. Each component matches a certain set of codes, and there are several types of components which implement several ways of adding codes:

Individually selected codes - This type allows the user to manually select which codes will be added. This is done either by a search for codes, and then selecting which codes from the search results to include; or by uploading a CSV.
Expression match - This type also searches for codes, but as a simple rule: all codes that match an expression are included in the concept.
Query builder - A more complex set of rules can be defined, using multiple logical conditions and referring to multiple fields within the coding system lookup table.
Concept - A concept can include codes for one or more other concepts, allowing hierarchical relationships to be defined (for example, an "any diabetes" concept could be defined which includes concepts for type 1 diabetes, type 2 diabetes, etc.).

Components can be defined as "inclusion" or "exclusion". The codes matched by an "exclusion" component will be excluded from the concept, even if they appear in other "inclusion" components.
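The inclusion and exclusion behaviour can be sketched as a set difference: take the union of all inclusion components, then remove everything matched by any exclusion component. The `resolve_codes` function and the codes below are hypothetical and only illustrate the rule stated above; the library's own implementation may differ.

```python
def resolve_codes(components: list[tuple[str, set[str]]]) -> set[str]:
    """Combine components into a final code set.

    Each component is a (logical_type, codes) pair, where logical_type is
    "inclusion" or "exclusion". Exclusions win regardless of component order.
    """
    included: set[str] = set()
    excluded: set[str] = set()
    for logical_type, codes in components:
        if logical_type == "inclusion":
            included |= codes
        elif logical_type == "exclusion":
            excluded |= codes
        else:
            raise ValueError(f"Unknown component type: {logical_type!r}")
    return included - excluded


components = [
    ("inclusion", {"X100", "X101", "X102"}),
    ("exclusion", {"X101"}),
    ("inclusion", {"X101", "X200"}),  # X101 is still excluded
]
print(sorted(resolve_codes(components)))  # ['X100', 'X102', 'X200']
```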

A full history of all changes made to a concept is stored, and that history is shown at the bottom of the concept page. The old versions can be viewed using the magnifying glass symbol.

Getting Started: Creating and Editing Concepts

To create a new concept, click the "Add new concept" button. To edit an existing concept that you have permission to edit, click the pencil button in its row, or click the edit button after viewing the concept.

Documentation

The first section is documentation describing the meaning, authorship, validation, etc. of the concept. This section must be completed and saved, with at least the mandatory fields filled in, before any codes can be added to the concept.

Permissions

The permissions section will show who created the concept and, therefore, owns it. Only the owner can modify permissions. There can be only one owner of a concept, but the owner has the ability to assign ownership to someone else. This may be useful if they are no longer actively involved in working on it.

A concept can also be associated with a group, if you are a member of any groups. Groups can be set up for a project that wants to work together and share concepts within their team, but does not want to share the work more widely yet. Groups are currently set up manually by the Concept Library administrators. Please contact us via the SAIL Helpdesk if you would like a concept library group set up.

The owner always has full access to the concept, and the owner can also choose to assign view or edit permissions to the group (if applicable), as well as to everyone. Note that "everyone" means everyone with a login to the Concept Library tool. It doesn't make the concept publicly available on the web. In the future, we plan to create a way to publish concepts on the web, but it won't be via this "everyone" permission setting. It will be a new feature, and all users will have the chance to decide what material to publish.

Changes to permissions apply to all versions of a concept. If you previously shared your concept, but then remove this access, all previous versions are no longer visible within the library. (Of course, someone may have already downloaded and used the concept, something that the tool cannot prevent.)

Components: Adding Codes

Once the basic documentation is saved, one or more components can be added.

Select codes individually + import codes - This type allows the user to manually select which codes will be added. This is done either by a search for codes, and then selecting which codes from the search results to include; or by uploading a CSV.
- Search existing codes - this allows searching either the code or the description. Two types of search are allowed.
  - Simple search implements a syntax like SQL LIKE. % matches any sequence of characters, while _ matches any one character. To search for a string containing a particular word, use an expression such as %diabetes%. Simply searching for "diabetes" will only match a field whose exact content is just the word diabetes. More information on SQL LIKE can be found here (the section on "Like" only).
  - POSIX regex implements a full (POSIX extended) regular expression syntax, allowing matching by powerful, flexible rules. There are many documentation and tutorial resources online for regexes. Wikipedia's summary is a useful starting point.
  Once the search returns the codes you are interested in, select which codes you want to include by ticking the boxes, and save the component to include them.
- Upload codes from CSV - Click on the upload CSV tab, and you can select and load a CSV file with two columns (code and description). You can also select whether the file has column headings (which will exclude the first row).
Expression match - This type also allows finding codes by a search, as described in the previous section, but all codes that match the search will be included. If you do want to include all codes that match, using this type is preferable, as it will result in a simpler rule defining your concept.
Query builder - With this type, multiple rules that define a concept can be entered. The rules can be logically combined with AND and OR, as well as negated with NOT. The drop-down box allows you to select any column in the lookup table to apply rules to. Future development will provide more information on what the lookup table contains, in order to make this more useful. "Add group" adds another level in the logic (similar to adding parentheses in a logical statement). "Get SQL" shows the SQL WHERE clause equivalent to the rules that have been defined, while "get codes" applies the rules and searches for matching codes.
Concept - A concept can include codes for one or more other concepts, allowing hierarchical relationships to be defined (for example, an "any diabetes" concept could be defined which includes concepts for type 1 diabetes, type 2 diabetes, etc.). This type provides a search field to find existing concepts to add, with autocomplete. However, it may be useful to first browse the concepts to find the one you want (noting the ID if there are several similar concepts), before adding it here. Note that if a concept that you have included here as a component is updated, a new version of your concept that includes the new version of that child concept is automatically generated. If you create a concept that refers to someone else's concept, and they revoke permission, your concept will no longer be usable (though you will be able to access and edit it).
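The difference between the two search types described above can be illustrated outside the tool. The sketch below translates a LIKE-style pattern into a regular expression to show why "diabetes" and "%diabetes%" behave differently. It uses Python's `re` module as a stand-in, which is not identical to POSIX extended syntax, and it matches case-sensitively, whereas the tool's behaviour may depend on the underlying database. The helper names and sample descriptions are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE-style pattern into an equivalent regular expression."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")  # % matches any run of characters, including none
        elif char == "_":
            parts.append(".")   # _ matches exactly one character
        else:
            parts.append(re.escape(char))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like_match(pattern: str, text: str) -> bool:
    # fullmatch mirrors LIKE, which must match the whole field.
    return like_to_regex(pattern).fullmatch(text) is not None


descriptions = ["diabetes", "Type 2 diabetes mellitus", "Diabetic retinopathy"]

print([d for d in descriptions if like_match("diabetes", d)])
# ['diabetes']
print([d for d in descriptions if like_match("%diabetes%", d)])
# ['diabetes', 'Type 2 diabetes mellitus']

# A regex search can express rules that LIKE cannot, such as alternatives.
print([d for d in descriptions if re.search(r"^Type [12] diabetes", d)])
# ['Type 2 diabetes mellitus']
```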

A version of the concept is saved automatically each time a component is added; it's not necessary to click the save concept button again after creating components.

Change History

Every time you make a change and save a component or the overall concept, a new version is generated in the change history. View versions by clicking on the magnifying glass in the history. It's possible to view previous versions of a concept and download and use the codes.

It's also possible to return to an older version of the concept by using the "revert" button in the historical view. This creates a new, latest version that is identical to the older version; it doesn't wipe out any of the history of what you have done. It's not possible to edit an older version: first you must revert to make it the latest, and then you can work on it again.
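The revert behaviour can be thought of as an append-only history: reverting copies an old version to the top rather than discarding anything. The `ConceptHistory` class below is a hypothetical model of that idea, not the library's implementation, and the codes are made up.

```python
class ConceptHistory:
    """Append-only list of versions; each version is a frozen set of codes."""

    def __init__(self) -> None:
        self._versions: list[frozenset[str]] = []

    def save(self, codes: set[str]) -> int:
        """Store a new latest version and return its 1-based version number."""
        self._versions.append(frozenset(codes))
        return len(self._versions)

    def get(self, version: int) -> frozenset[str]:
        if not 1 <= version <= len(self._versions):
            raise IndexError(f"No such version: {version}")
        return self._versions[version - 1]

    def revert(self, version: int) -> int:
        """Copy an older version to the top of the history; nothing is removed."""
        return self.save(set(self.get(version)))


history = ConceptHistory()
history.save({"X100"})          # version 1
history.save({"X100", "X101"})  # version 2
latest = history.revert(1)      # version 3, identical to version 1
```

Because version 2 is still in the history after the revert, anyone who referenced it earlier can still retrieve exactly the same codes.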

Fork

Any user who has permission to view a concept can copy it to a new, identical concept using the "fork" button (this can be done either to the latest version or an older version of a concept).

Creating and editing working sets

Working sets can be created in much the same way as concepts; they have a similar documentation section, and permissions, history, etc. work in the same way. See the above section on creating and editing concepts for more information.

To add concepts to a working set, click on "add concept", which adds a new row. You can then enter a concept in the field that is created; there is autocomplete functionality to help you find the concept you are looking for. To remove the concept and replace it with a different one, click the circular arrow icon. To delete a row completely, click the red X icon.

You can also click "add attribute" to add a new column that allows setting a value for each concept included in the working set.

You must enter a name for each attribute added. Requirements for names follow typical database naming conventions (case insensitive, must start with a letter, can be followed by any combination of letters, numbers, and underscores). Attribute names must be unique.
For each attribute, you can specify int, float, or string as the type. The values entered in a field will be validated against this type (blank is allowed for any type of attribute).
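The naming and type rules above can be sketched as two small checks. This is a hypothetical illustration, not the tool's validation code; in particular, treating names that differ only in case as duplicates is an assumption based on the "case insensitive" rule, and the function names are made up.

```python
import re

# Starts with a letter, then any mix of letters, digits, and underscores.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
CASTS = {"int": int, "float": float, "string": str}


def validate_names(names: list[str]) -> None:
    """Raise ValueError if any name is malformed or duplicated."""
    seen = set()
    for name in names:
        if NAME_PATTERN.fullmatch(name) is None:
            raise ValueError(f"Invalid attribute name: {name!r}")
        key = name.lower()  # assumption: uniqueness is checked case-insensitively
        if key in seen:
            raise ValueError(f"Duplicate attribute name: {name!r}")
        seen.add(key)


def validate_value(raw: str, type_name: str):
    """Return the typed value, or None for a blank entry."""
    if raw.strip() == "":
        return None  # blank is allowed for every attribute type
    return CASTS[type_name](raw)  # raises ValueError if the text does not fit the type


validate_names(["column_name", "weight"])  # passes silently
print(validate_value("6", "int"))          # 6
print(validate_value("", "float"))         # None
```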

Any number of attributes is allowed, but currently the user interface does not work well for more than 3-4 attributes. This is a known issue and will be resolved in the near future.

Added concepts and attributes are not saved to the working set until you click the Save button.

Concept Library and the SAIL Gateway

The concept library is available both inside and outside the SAIL secure environment. The version found inside the SAIL gateway is read-only, in order to comply with SAIL governance that requires all outputs from the gateway to be manually reviewed. (If a user could create or edit content inside the gateway, then view it outside, this would bypass the normal manual review process.) All changes made outside are instantly available within the gateway.