


Introduction

Overview

The concept library is a system for storing, managing, sharing, and documenting clinical code lists in health research. The specific goals of this work are:

  • Store code lists along with metadata that captures important information about quality, authorship, etc.
  • Store version history and provide a way to unambiguously reference a particular version of a code list.
  • Allow programmatic interaction with code lists via an API, so that they can be directly used in queries, statistical scripts, etc.
  • Provide a mechanism for sharing code lists between projects and organizations.

Why use this tool?

A significant aspect of research using routinely collected health records is defining how concepts of interest (including conditions, treatments, symptoms, etc.) will be measured. This typically involves identifying sets of clinical codes that map to a variable the researcher wants to measure, and sometimes a set of rules as well (e.g. a person with a disease may be defined as someone who has a diagnosis code from list A and a medication code from list B, excluding anyone who has a code from list C). A large part of the analysis work may involve consulting clinicians, investigating the data, and creating and testing definitions of the clinical concepts to be used.
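The example rule above (a code from list A and a code from list B, but no code from list C) can be expressed as set operations over patients. This minimal sketch uses placeholder codes and a hypothetical list of (patient, code) events in place of real data; it illustrates the logic only and is not part of the Concept Library.

```python
# Hypothetical code lists (placeholders, not real clinical codes).
LIST_A = {"A1", "A2"}  # diagnosis codes
LIST_B = {"B1"}        # medication codes
LIST_C = {"C1"}        # exclusion codes

# Hypothetical coded events: (patient_id, code) pairs.
EVENTS = [
    (1, "A1"), (1, "B1"),              # diagnosis and medication
    (2, "A2"),                         # diagnosis only
    (3, "A1"), (3, "B1"), (3, "C1"),   # meets A and B, but excluded by C
]

def patients_with(codes, events):
    """Patients with at least one event whose code is in `codes`."""
    return {pid for pid, code in events if code in codes}

def cases(events):
    """Apply 'A AND B AND NOT C' per patient."""
    has_a_and_b = patients_with(LIST_A, events) & patients_with(LIST_B, events)
    return has_a_and_b - patients_with(LIST_C, events)

print(cases(EVENTS))  # {1}
```

Keeping the code lists separate from the rule is what lets the same lists be reused in other studies with different rules.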

Often the definitions that are created are of interest to researchers for many studies, but there are barriers to sharing them easily. The definitions may be embedded within study-specific scripts, making it hard to extract the part that may be of general interest. Researchers also often do not fully document how a concept was created, its precise meaning, its limitations, etc., so crucial information may be lost when it is passed to other researchers, resulting in mistakes. Often there is simply no mechanism to discover and share work that has been done previously, so researchers waste time and resources reinventing the wheel. In theory, published research should include information on the precise methods used, but in practice this information is often inadequate.

Our goal is to create a system that describes research study designs in a machine-readable format to facilitate rapid study development; higher quality research; easier replication; and sharing of methods between researchers, institutions, and countries.

Concepts vs. Working Sets

A "concept" is the definition of a single entity that will be used in a research project. It may be a disease ("type 2 diabetes"), a treatment ("metformin"), a test result ("HbA1c"), or anything else that can be defined within the data. The definition is typically tied to the data source, so different coding systems need different definitions (diabetes in primary care and diabetes in a hospital setting would be two separate concepts, defined in Read codes and ICD-10 codes, respectively). Different concepts may also be created for different purposes: one researcher may want to define diabetes tightly, with high specificity, while another might want to capture everyone with possible diabetes; these would be represented as two different concepts. Different users may also have different definitions simply because the correct definition is a matter of opinion. Creating multiple concepts to define the same thing is not a problem (though it is best to use a single, shared definition unless there is a good reason not to). Currently, the system stores simple concepts, each defined as a set of clinical codes. It will be further developed in the future to allow more complex definitions, such as rule-based algorithms, to be stored and shared.

Within the process of data preparation and analysis, researchers will often want to use related concepts together (for example, a list of comorbidities, or a set of medication categories of interest). An entity called a working set allows users to group related concepts and use or reference them together. To define a study's comorbidities within this system, a user would first identify an existing concept, or create a new one, for each relevant comorbidity. A working set called "[My study name] comorbidities" could then be created and each relevant concept added to it. Working sets can also have attributes: arbitrary fields for which a value can be entered for each concept in the set. This provides a flexible way to capture extra information about each concept within the set. Some example use cases for attributes:

  • A database column name.
  • Category weights (for a working set that stores Charlson comorbidity categories).
  • Minimum and maximum valid values (for a working set of lab test results).
  • Categorisation (for example, within an injury working set, a flag for whether each concept is a major or minor injury).

All concepts within a working set can be referenced as a unit, and included within analysis as a unit via the API.
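To make the relationship between concepts, working sets, and attributes concrete, here is a minimal sketch of one possible data model. It assumes a concept is just a named set of codes (matching the simple concepts described above) and that attribute values are stored per concept; the class names, fields, codes, and weights are hypothetical illustrations, not the Concept Library's actual schema or API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:
    """A simple concept: a named set of clinical codes."""
    concept_id: int
    name: str
    codes: frozenset

@dataclass
class WorkingSet:
    """A named group of concepts, with free-form attribute values per concept."""
    name: str
    attribute_names: tuple
    # concept_id -> (Concept, {attribute name: value})
    members: dict = field(default_factory=dict)

    def add(self, concept, **values):
        """Add a concept with values for any of the set's declared attributes."""
        unknown = set(values) - set(self.attribute_names)
        if unknown:
            raise ValueError(f"Unknown attribute(s): {sorted(unknown)}")
        self.members[concept.concept_id] = (concept, values)

    def attribute(self, concept_id, attribute_name):
        """Value of one attribute for one concept (None if not entered)."""
        return self.members[concept_id][1].get(attribute_name)

    def all_codes(self):
        """Every code from every concept, so the set can be used as a unit."""
        codes = set()
        for concept, _ in self.members.values():
            codes |= concept.codes
        return codes

# Placeholder codes and illustrative weights.
mi = Concept(101, "Myocardial infarction", frozenset({"MI01", "MI02"}))
liver = Concept(102, "Severe liver disease", frozenset({"LV01"}))
comorbidities = WorkingSet("My study comorbidities", ("column_name", "weight"))
comorbidities.add(mi, column_name="mi", weight=1)
comorbidities.add(liver, column_name="liver_severe", weight=3)
print(sorted(comorbidities.all_codes()))  # ['LV01', 'MI01', 'MI02']
```

Declaring the attribute names on the working set, rather than on each concept, reflects the idea that an attribute is a column shared by every concept in the set.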

Getting Started: Creating and Editing Concepts

To create a new concept, click the "Add new concept" button. To edit an existing concept (one that you have permission to edit), click the pencil button in its row, or view the concept and click the edit button.

Documentation

The first section is documentation describing the meaning, authorship, validation, etc. of the concept. This section must be completed and saved, with at least the mandatory fields filled in, before any codes can be added to the concept.

Permissions

The permissions section shows who created, and therefore owns, the concept. Only the owner can modify permissions. A concept can have only one owner, but the owner can transfer ownership to someone else, which may be useful if they are no longer actively involved in working on it.

A concept can also be associated with a group, if you are a member of one. Groups can be set up for a project team that wants to work together and share concepts within the team, but does not yet want to share the work more widely. Groups are currently set up manually by the Concept Library administrators; please contact us via the SAIL Helpdesk if you would like a Concept Library group set up.

The owner always has full access to the concept, and can also choose to give view or edit permission to the group (if applicable) and to everyone. Note that "everyone" means everyone with a login to the Concept Library tool; it doesn't make the concept publicly available on the web. We plan to create a way to publish concepts on the web in the future, but it won't be via this "everyone" permission setting. It will be a new feature, and all users will be able to decide what material to publish.

Changes to permissions apply to all versions of a concept. If you share your concept and later remove that access, none of its versions, including earlier ones, will remain visible within the library to the users who lost access. (Of course, someone may already have downloaded and used the concept, which the tool cannot prevent.)
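The access rules above can be summarised as a small decision function. This is a sketch under stated assumptions: three access levels (none, view, edit), one optional group per concept, and a single permission record shared by every version of the concept. All names are hypothetical, and the tool's real permission logic may differ in detail.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Set

class Access(IntEnum):
    NONE = 0
    VIEW = 1
    EDIT = 2

@dataclass
class ConceptPermissions:
    """One record per concept, so it applies to all of the concept's versions."""
    owner: str
    group: Optional[str] = None
    group_access: Access = Access.NONE
    everyone_access: Access = Access.NONE  # "everyone" = all logged-in Concept Library users

def effective_access(user: str, user_groups: Set[str], perms: ConceptPermissions) -> Access:
    """Highest access level the user holds for the concept."""
    if user == perms.owner:
        return Access.EDIT  # the owner always has full access
    access = perms.everyone_access
    if perms.group is not None and perms.group in user_groups:
        access = max(access, perms.group_access)
    return access

def transfer_ownership(user: str, new_owner: str, perms: ConceptPermissions) -> None:
    """Only the current owner may change permissions, including ownership."""
    if user != perms.owner:
        raise PermissionError("Only the owner can modify permissions")
    perms.owner = new_owner

perms = ConceptPermissions(owner="alice", group="diabetes_project",
                           group_access=Access.EDIT, everyone_access=Access.VIEW)
print(effective_access("bob", {"diabetes_project"}, perms).name)  # EDIT
print(effective_access("carol", set(), perms).name)               # VIEW
perms.everyone_access = Access.NONE  # revoking affects every version at once
print(effective_access("carol", set(), perms).name)               # NONE
```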

Components: Adding Codes

Once the basic documentation is saved, one or more components can be added.

  • Select codes individually + import codes - This type allows the user to manually select which codes will be added, either by searching for codes and choosing which of the results to include, or by uploading a CSV file. 
    • Search existing codes - This allows searching either the code or the description. Two types of search are available: 
      • Simple search uses the syntax of the SQL LIKE operator: % matches any sequence of characters (including none), while _ matches exactly one character. To find fields containing a particular word, use an expression such as %diabetes%. Searching for "diabetes" alone will only match a field whose entire contents are exactly the word "diabetes". More information on SQL LIKE can be found here (the section on "Like" only).
      • POSIX regex implements full POSIX extended regular expression syntax, allowing matching by powerful, flexible rules. Many documentation and tutorial resources for regular expressions are available online; Wikipedia's summary is a useful starting point. 
      Once the search returns the codes you are interested in, tick the boxes next to the codes you want to include, then save the component.
    • Upload codes from CSV - Click on the upload CSV tab, and you can select and load a CSV file with two columns (code and description). You can also select whether the file has column headings (which will exclude the first row).
  • Expression match - This type also finds codes by a search, as described above, but includes every code that matches the search. If you do want to include all matching codes, this type is preferable, as it results in a simpler rule defining your concept.
  • Query builder - With this type, multiple rules defining a concept can be entered. The rules can be logically combined with AND and OR, and negated with NOT. The drop-down box lets you select any column in the lookup table to apply a rule to. Future development will provide more information on what the lookup table contains, to make this more useful. "Add group" adds another level to the logic (similar to adding parentheses in a logical statement). "Get SQL" shows the SQL WHERE clause equivalent to the rules that have been defined, while "Get codes" runs the search and returns the matching codes.
  • Concept - A concept can include codes for one or more other concepts, allowing hierarchical relationships to be defined (for example, an "any diabetes" concept could be defined which includes concepts for type 1 diabetes, type 2 diabetes, etc.). This type provides a search field to find existing concepts to add, with autocomplete. However, it may be useful to first browse the concepts to find the one you want (noting the ID if there are several similar concepts), before adding it here. Note that if a concept that you have included here as a component is updated, a new version of your concept that includes the new version of that child concept is automatically generated. If you create a concept that refers to someone else's concept, and they revoke permission, your concept will no longer be usable (though you will be able to access and edit it).
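The difference between the two search types, and the whole-field matching of simple search, can be illustrated with Python's standard `re` module. This sketch translates `%` and `_` into regex equivalents and matches against a hypothetical lookup table. It assumes case-sensitive matching and ignores LIKE escape characters; Python's `re` syntax is also close to, but not identical with, POSIX extended regular expressions (for example, it lacks POSIX classes such as `[[:digit:]]`), so this only approximates what the tool runs.

```python
import re

# Hypothetical lookup table of (code, description) rows.
LOOKUP = [
    ("D001", "diabetes"),
    ("D002", "type 2 diabetes"),
    ("D003", "diabetes insipidus"),
    ("H001", "hypertension"),
]

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an equivalent Python regex.

    % matches any sequence of characters (including none) and _ matches
    exactly one; everything else is matched literally. LIKE escape
    characters are not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def simple_search(pattern, rows=LOOKUP, column=1):
    """LIKE-style search; the pattern must match the whole field (column 0 = code, 1 = description)."""
    rx = re.compile(like_to_regex(pattern), re.DOTALL)
    return [row for row in rows if rx.fullmatch(row[column])]

def regex_search(pattern, rows=LOOKUP, column=1):
    """Regex search; the pattern may match anywhere in the field unless anchored with ^ or $."""
    rx = re.compile(pattern)
    return [row for row in rows if rx.search(row[column])]

print(simple_search("diabetes"))               # only the exact description "diabetes"
print(simple_search("%diabetes%"))             # all three diabetes rows, including "diabetes insipidus"
print(simple_search("D00_", column=0))         # codes D001, D002 and D003
print(regex_search(r"^type [12] diabetes$"))   # only "type 2 diabetes"
```

In these terms, an Expression match component keeps every row a search returns, whereas Select codes individually keeps only the rows you tick.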

A version of the concept is saved automatically each time a component is added; it's not necessary to click the save concept button again after creating components.
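A Concept component makes one concept include the codes of another, so resolving a parent such as "any diabetes" means collecting codes recursively through its children. The sketch below shows one way to do this; the dictionary layout and placeholder codes are hypothetical, and rejecting circular references is an assumption about sensible behaviour rather than documented behaviour of the tool.

```python
# Hypothetical concept store: concept_id -> own codes plus child concept IDs.
CONCEPTS = {
    1: {"name": "type 1 diabetes", "codes": {"T1A", "T1B"}, "children": []},
    2: {"name": "type 2 diabetes", "codes": {"T2A"}, "children": []},
    3: {"name": "any diabetes", "codes": {"DX1"}, "children": [1, 2]},
}

def resolve_codes(concept_id, concepts=CONCEPTS, _visiting=None):
    """Return a concept's own codes plus, recursively, those of every child concept."""
    if _visiting is None:
        _visiting = set()
    if concept_id in _visiting:
        raise ValueError(f"Circular reference involving concept {concept_id}")
    _visiting.add(concept_id)
    concept = concepts[concept_id]
    codes = set(concept["codes"])
    for child_id in concept["children"]:
        codes |= resolve_codes(child_id, concepts, _visiting)
    # Leave the current path, so a child reached by two routes is not mistaken for a cycle.
    _visiting.discard(concept_id)
    return codes

print(sorted(resolve_codes(3)))  # ['DX1', 'T1A', 'T1B', 'T2A']
```

Because the parent only stores references to its children, resolving it again after a child changes picks up the child's new codes, which mirrors why an update to a child concept produces a new version of the parent.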

Change History

Every time you make a change and save a component or the overall concept, a new version is generated in the change history. View versions by clicking on the magnifying glass in the history. It's possible to view previous versions of a concept and download and use the codes.

It's also possible to return to an older version of the concept by using the "revert" button in the historical view. This creates a new, latest version that is identical to the older version; it doesn't wipe out any of the history of what you have done. You can't edit an older version directly: first you must revert to make it the latest, and then you can work on it again.
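The history behaviour described above amounts to an append-only list of versions, in which a revert copies an old version forward instead of deleting anything. The sketch below models it that way; the class and method names are hypothetical and are not the Concept Library's internal implementation or API.

```python
import copy

class ConceptHistory:
    """Hypothetical append-only history of one concept's versions, numbered from 1."""

    def __init__(self, initial_state):
        self._versions = [copy.deepcopy(initial_state)]

    @property
    def latest(self):
        """Number of the latest version."""
        return len(self._versions)

    def get(self, version):
        """Return a copy of any version, so stored history cannot be altered."""
        if not 1 <= version <= self.latest:
            raise IndexError(f"No version {version}")
        return copy.deepcopy(self._versions[version - 1])

    def save(self, new_state):
        """Saving a change always appends a new latest version."""
        self._versions.append(copy.deepcopy(new_state))
        return self.latest

    def revert(self, version):
        """Copy an older version forward as a new latest version; nothing is deleted."""
        return self.save(self.get(version))

history = ConceptHistory({"codes": {"A1"}})
history.save({"codes": {"A1", "A2"}})  # version 2
history.revert(1)                      # version 3, identical to version 1
print(history.latest)                  # 3
```

In this model an older version cannot be edited in place, because the only way to change anything is `save`, which always appends.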

Fork

Any user who has permission to view a concept can copy it to a new, identical concept using the "fork" button (this can be done either to the latest version or an older version of a concept).

Concept Library and the SAIL Gateway

The concept library is available both inside and outside the SAIL secure environment. The version inside the SAIL Gateway is read-only, to comply with SAIL governance rules that require all outputs from the gateway to be manually reviewed. (If a user could create or edit content inside the gateway and then view it outside, this would bypass the normal manual review process.) All changes made outside the gateway are instantly available within it.
