December 16, 2014

Health IT's journey from messages... to documents... to APIs

The latest JASON report, "Data for Individual Health", released last month, continues to make the case for adopting open APIs to better support the interoperability of healthcare data.  When I think back on the past ~25 years in Health IT, the story it paints feels like the gradual evolution of electronic health data from messages, to documents, to APIs.

Clinical messages have always offered an incomplete "soda straw" view of a very small amount of a patient's clinical information.  Messages are designed for communicating current information about a patient from one software system to another.  HL7 v2 is the dominant industry standard used for expressing clinical messages.

HL7 v2 can express different types of messages, including Admission, Discharge, and Transfer (ADT), Observation Result (ORU), and Order Message (ORM).  HL7 v2 is also a very old data standard that uses a "pipe and hat" format to encode the patient's clinical data.  To parse an HL7 v2 message, you need to know the position of each piece of data within the message, where fields are separated by "|" characters and components within a field by "^" characters.  For example, below is an HL7 v2 message for a glucose reading.

MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4
PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F|||153 FERNWOOD DR.^^STATESVILLE^OH^35292||(206)3345232|(206)752-121||||AC555444444||67-A4335^OH^20030520
OBR|1|845439^GHH OE|1045813^GHH LAB|15545^GLUCOSE|||200202150730|||||||||555-55-5555^PRIMARY^PATRICIA P^^^^MD^^|||||||||F||||||444-44-4444^HIPPOCRATES^HOWARD H^^^^MD
OBX|1|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^182|mg/dl|70_105|H|||F
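
To make the position-based parsing concrete, here is a minimal Python sketch that pulls the glucose result out of an abbreviated copy of the message above.  It assumes the default delimiters, one segment per carriage return, and no escape sequences or repeating fields, so treat it as an illustration rather than a real HL7 v2 parser.  The function names parse_segments and glucose_result are my own, not part of any HL7 library.

```python
# Abbreviated copy of the glucose message above: MSH, a PID truncated after
# the sex field, and OBX. HL7 v2 segments are terminated by carriage returns.
MESSAGE = "\r".join([
    r"MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4",
    r"PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F",
    r"OBX|1|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^182|mg/dl|70_105|H|||F",
])


def parse_segments(message):
    """Map each segment ID (MSH, PID, ...) to a list of its field lists."""
    segments = {}
    for line in message.split("\r"):
        if line:
            fields = line.split("|")
            segments.setdefault(fields[0], []).append(fields)
    return segments


def glucose_result(message):
    """Pick values out of the message purely by their position."""
    segments = parse_segments(message)
    msh = segments["MSH"][0]
    pid = segments["PID"][0]
    obx = segments["OBX"][0]
    family, given = pid[5].split("^")[:2]
    return {
        # MSH-1 is the field separator itself, so MSH-9 sits at list index 8.
        "message_type": msh[8],
        "patient": f"{given} {family}",
        "test": obx[3].split("^")[1],
        "value": float(obx[5].split("^")[1]),
        "units": obx[6],
        "reference_range": obx[7],
        "abnormal_flag": obx[8],
    }


if __name__ == "__main__":
    print(glucose_result(MESSAGE))
```

Every value is found only by counting "|" and "^" characters, which is exactly why a sender that shifts a field by one position silently breaks the receiver.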

This makes for a very brittle standard that is machine interpretable but not very easy to work with.  By design, you probably would not want to persist HL7 v2 messages as the authoritative source of the clinical information about a patient.  A message is never going to be anything more than a soda straw view of a few data points.  To this end, moving forward to the turn of the 21st century... clinical documents could help... a little.

Clinical documents are designed to express a more complete picture of a patient's healthcare information at a specific point in time.  Notable clinical document standards have included the ASTM CCR and HL7's CCD, CCDA, and QRDA Category I.  MITRE also developed a standard for both representing and exchanging clinical documents called hData.  Clinical documents differ from clinical messages in that documents are designed to contain more information, extracted from an EHR system as of a specific date and possibly a specific time.

An example of an old CCR XML document representing a patient with hypertension and height/weight vitals follows:

<?xml version="1.0" encoding="UTF-8"?>
<ContinuityOfCareRecord xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:astm-org:CCR CCR_20051109.xsd http://www.w3.org/2001/XMLSchema xmldsig-core-schema.xsd" xmlns="urn:astm-org:CCR">
    <CCRDocumentObjectID>94461c3f-7dbf-4af1-aba9-ed4aac26bea4</CCRDocumentObjectID>
    <Language>
        <Text>English</Text>
    </Language>
    <Version>V1.0</Version>
    <DateTime>
        <ExactDateTime>2010-02-01T15:52:04Z</ExactDateTime>
    </DateTime>
    <Body>
        <Problems>
            <Problem>
                <CCRDataObjectID>BB0001</CCRDataObjectID>
                <DateTime>
                    <Type>
                        <Text>Start date</Text>
                    </Type>
                    <ExactDateTime>1990-08-07T06:00:00Z</ExactDateTime>
                </DateTime>
                <Type>
                    <Text>Diagnosis</Text>
                </Type>
                <Description>
                    <Text>Hypertension</Text>
                    <Code>
                        <Value>403.10</Value>
                        <CodingSystem>ICD-9 CM</CodingSystem>
                        <Version>2005</Version>
                    </Code>
                </Description>
                <Status>
                    <Text>Active</Text>
                </Status>
            </Problem>
        </Problems>
        <VitalSigns>
            <Result>
                <CCRDataObjectID>BB0009</CCRDataObjectID>
                <DateTime>
                    <Type>
                        <Text>Start date</Text>
                    </Type>
                    <ExactDateTime>2005-09-24T04:00:00Z</ExactDateTime>
                </DateTime>
                <Description>
                    <Text>Height &amp; Weight</Text>
                </Description>
                <Source>
                    <Actor>
                        <ActorID>AA0002</ActorID>
                    </Actor>
                </Source>
                <Test>
                    <CCRDataObjectID>BB0010</CCRDataObjectID>
                    <Type>
                        <Text>Observation</Text>
                    </Type>
                    <Description>
                        <Text>Height</Text>
                        <Code>
                            <Value>50373000</Value>
                            <CodingSystem>SNOMED</CodingSystem>
                            <Version>2005</Version>
                        </Code>
                    </Description>
                    <Source>
                        <Actor>
                            <ActorID>AA0002</ActorID>
                        </Actor>
                    </Source>
                    <TestResult>
                        <Value>155</Value>
                        <Units>
                            <Unit>cm</Unit>
                        </Units>
                    </TestResult>
                </Test>
                <Test>
                    <CCRDataObjectID>BB0011</CCRDataObjectID>
                    <Type>
                        <Text>Observation</Text>
                    </Type>
                    <Description>
                        <Text>Weight</Text>
                        <Code>
                            <Value>363808001</Value>
                            <CodingSystem>SNOMED</CodingSystem>
                            <Version>2005</Version>
                        </Code>
                    </Description>
                    <Source>
                        <Actor>
                            <ActorID>AA0002</ActorID>
                        </Actor>
                    </Source>
                    <TestResult>
                        <Value>55</Value>
                        <Units>
                            <Unit>kg</Unit>
                        </Units>
                    </TestResult>
                </Test>
            </Result>
        </VitalSigns>
    </Body>
    <Actors>
        <Actor>
            <ActorObjectID>AA0001</ActorObjectID>
            <Person>
                <Name>
                    <CurrentName>
                        <Given>John</Given>
                        <Middle>N</Middle>
                        <Family>Doe</Family>
                    </CurrentName>
                </Name>
                <DateOfBirth>
                    <ExactDateTime>1960-08-23T06:00:00Z</ExactDateTime>
                </DateOfBirth>
                <Gender>
                    <Text>Male</Text>
                </Gender>
            </Person>
            <IDs>
                <Type>
                    <Text>SSN</Text>
                </Type>
                <ID>555-55-5555</ID>
                <Source>
                    <Actor>
                        <ActorID></ActorID>
                    </Actor>
                </Source>
            </IDs>
            <Address>
                <Type>
                    <Text>Home</Text>
                </Type>
                <Line1>Main Street</Line1>
                <City>Fort Lauderdale</City>
                <State>FL</State>
                <PostalCode>33011</PostalCode>
            </Address>
            <Source>
                <Actor>
                    <ActorID>AA0002</ActorID>
                </Actor>
            </Source>
        </Actor>
    </Actors>
</ContinuityOfCareRecord>

This allows for additional data to express a more complete (but admittedly still incomplete) picture of that patient's health.

Fast forward to today... as laid out in the latest JASON report... the trend is less towards messages or documents, and more towards open Application Programming Interfaces (APIs) to access and exchange a patient's data.

One such API is HL7's Fast Healthcare Interoperability Resources (FHIR).  In the FHIR RESTful framework, transactions are performed on the server using HyperText Transfer Protocol (HTTP) requests and responses.  FHIR's RESTful framework allows services to create, read, update, and delete information exchanged from one service to another over a network.

There are levels of authentication that can be built into FHIR to gate access at various levels of a patient's record.  So, if you wanted to request clinical information about a patient from a service, you would simply perform a "GET" request using that patient's identifier.  Similarly, you could request subsets of that information at finer levels of granularity, such as that patient's conditions, medications, allergies, etc.

There are even some sexy things that you can do with FHIR's search capabilities over clinical data.  For instance, you could search for any patient whose gender has the code "male" with a query like:

GET [base-url]/Patient?gender=male
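
As a rough sketch of how a client might assemble these requests, the Python below builds the URL for a read of a single resource and for a search like the one above.  The base URL and the patient identifier are made-up placeholders, the helper names are my own, and nothing is actually sent over the network; a real client would also need to handle authentication and the response formats.

```python
from urllib.parse import quote, urlencode

# Hypothetical server address, used only for illustration.
BASE_URL = "https://fhir.example.org/base"


def read_url(base, resource_type, resource_id):
    """URL for reading one resource: GET [base-url]/[type]/[id]."""
    return f"{base.rstrip('/')}/{resource_type}/{quote(resource_id, safe='')}"


def search_url(base, resource_type, **params):
    """URL for a search: GET [base-url]/[type]?name=value&..."""
    return f"{base.rstrip('/')}/{resource_type}?{urlencode(params)}"


if __name__ == "__main__":
    print("GET", read_url(BASE_URL, "Patient", "123"))
    print("GET", search_url(BASE_URL, "Patient", gender="male"))
```

The second call reproduces the gender search shown above against the placeholder server.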

The data exchanged with FHIR still amounts to clinical documentation, but there is now a shift away from the CDA-based HL7 documents, which I have never been a fan of, toward FHIR's own approach of expressing clinical documentation as FHIR resources within the FHIR RESTful API.

Ultimately, I think this is where we should be going: focusing on how data will be accessed and exchanged, while simultaneously ensuring that the clinical data used between different services is designed to be simpler and stricter.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. © Rob McCready, 2014.

October 14, 2014

Bonnie: An Open Source Clinical Quality Measure Testing Tool

Bonnie is a new open source software tool, developed by MITRE and released in April 2014, that allows Meaningful Use (MU) Clinical Quality Measure (CQM) developers to test and verify the behavior of their CQM logic.  The goal of Bonnie is to reduce the number of defects in CQMs by providing a robust and automated testing framework. Bonnie allows measure developers to independently load measures that they have constructed using the Measure Authoring Tool (MAT). Loading the measures into Bonnie converts the measures from their Extensible Markup Language (XML) eSpecifications into executable artifacts and measure metadata.

Bonnie Dashboard Page
The measure eSpecification format that Bonnie loads is Health Quality Measure Format (HQMF) XML. The HQMF specification provides the metadata and logic that describe the specifics of calculating a CQM. Bonnie can load the HQMF describing a measure and programmatically convert the HQMF specification into an executable format that allows calculating the measure directly from the specification. 

The measure metadata loaded into Bonnie is then used to allow developers to rapidly build a synthetic patient test deck for the measure using the clinical elements defined during the measure construction process. By using measure metadata as a basis for building synthetic patients, developers can rapidly and efficiently create a test deck for a measure. 

Once a CQM has been loaded into Bonnie, a user can inspect the measure logic and then build synthetic test records and set expectations on how those test records will calculate against a measure. This capability to build synthetic test patient records, set expectations against those records, and calculate the measures using those patient records provides an automated and efficient testing framework for CQMs. 

Using the Bonnie-supported CQM testing framework allows measure developers to more clearly understand the behavior of the measure logic, validate that the measure logic encodes their intent, and allows for multiple iterations of measure updates to be validated against a test deck. 
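
The idea of a test deck with expectations can be illustrated with a small Python sketch.  This is not Bonnie's code or API: Bonnie derives its logic from HQMF, whereas the toy measure, the patient fields, and the population names below are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SyntheticPatient:
    name: str
    age: int
    has_hypertension: bool
    systolic_bp: int
    expected: dict  # population name -> expected membership (True/False)


def calculate_measure(patient):
    """Toy measure: adults with hypertension whose systolic BP is under 140."""
    ipp = patient.age >= 18
    denominator = ipp and patient.has_hypertension
    numerator = denominator and patient.systolic_bp < 140
    return {"IPP": ipp, "DENOM": denominator, "NUMER": numerator}


def run_test_deck(deck):
    """Return (patient, population, expected, actual) for every mismatch."""
    failures = []
    for patient in deck:
        actual = calculate_measure(patient)
        for population, expected in patient.expected.items():
            if actual[population] != expected:
                failures.append(
                    (patient.name, population, expected, actual[population])
                )
    return failures


TEST_DECK = [
    SyntheticPatient("Controlled Adult", 54, True, 128,
                     {"IPP": True, "DENOM": True, "NUMER": True}),
    SyntheticPatient("Uncontrolled Adult", 61, True, 152,
                     {"IPP": True, "DENOM": True, "NUMER": False}),
    SyntheticPatient("Teenager", 16, True, 120,
                     {"IPP": False, "DENOM": False, "NUMER": False}),
]

if __name__ == "__main__":
    failures = run_test_deck(TEST_DECK)
    print("all expectations met" if not failures else failures)
```

If the measure logic changes between iterations, rerunning the same deck shows immediately which patients no longer calculate as the developer intended.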

Bonnie Measure Page
Additionally, the development of a test deck as part of measure development provides benefits after the measures are finalized. The test deck built as part of measure development can be used to demonstrate the intent of the measure through the patient examples included in the test deck. Furthermore, the test deck provides systems that implement the measures with a means to validate their implementations. This is provided in the form of a base set of synthetic patient records with known expectations for calculating against the implemented measures. Finally, the test deck could serve as a basis for the test deck used in the Meaningful Use certification program.

Bonnie has been designed to integrate with the nationally recognized data standards used by the Meaningful Use program for expressing CQM logic for machine-to-machine interoperability. This provides enormous value to the CQM program and federal policy leaders and stakeholders: this software tool verifies that the new and evolving standards for the Meaningful Use CQM program are tractable and can be implemented in software.   

Additionally, Bonnie was designed to provide an intuitive and easy-to-use interface based on feedback from the broader measure developer community. A key goal of the Bonnie application is to deliver a user experience that provides an efficient and intuitive method for constructing synthetic patient records for testing and validating CQMs. 

The Bonnie software is freely available via an Apache 2.0 open source license. The Meaningful Use program makes all or parts of the Bonnie software available for inspection, verification, and even reuse by other government programs or federal contractors. 


October 8, 2014

Example of CDA limitations to interoperability: time intervals

The HL7 Clinical Document Architecture (CDA) is an XML-based markup standard intended to specify the encoding, structure and semantics of clinical documents for exchange. CDA is an ANSI-certified standard from Health Level Seven (HL7).  The CDA is highlighted as a flexible framework that can contain any type of clinical content.  Additionally, the details of the encoding of clinical data and associated aspects of that data are intentionally designed to be flexible.

That flexibility gives different systems freedom in how they export clinical data.  However, that same flexibility is an increasing barrier to the interoperability of data as systems need to import that same data.  My biggest pet peeve with the CDA and this problem is the flexibility that the CDA provides in the encoding of time intervals.

For clinical reasons, the CDA provides the freedom to encode time intervals in eight (8) (VIII) different representations:

<low>
<width>
<high>
<low> <width>
<low> <high>
<width> <high>
<center>
<center> <width>

This amount of flexibility in expressing something as simple as a time interval is an obstacle for any receiving system hoping to import and parse an HL7 CDA-based XML document without knowing the way that the generating system is going to express something as basic as a time interval. This permissive nature of the CDA's artifacts is common beyond this one basic example.
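
To show what this costs a receiving system, here is a hedged Python sketch that normalizes several of the element combinations listed above into a single (low, high) pair.  It makes simplifying assumptions: the fragments carry no XML namespace, timestamps use only the YYYYMMDD, YYYYMMDDHHMM, or YYYYMMDDHHMMSS forms, and widths use one of a handful of units.  The function names are mine, not part of any CDA toolkit.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Assumed subset of width units, expressed in seconds.
UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600, "d": 86400, "wk": 604800}
TS_FORMATS = {8: "%Y%m%d", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}


def parse_ts(value):
    """Parse a timestamp such as 20140101 or 201401011530."""
    return datetime.strptime(value, TS_FORMATS[len(value)])


def normalize_interval(xml_fragment):
    """Reduce an interval element to (low, high); None where unknown."""
    root = ET.fromstring(xml_fragment)

    def ts(tag):
        element = root.find(tag)
        return parse_ts(element.get("value")) if element is not None else None

    low, high, center = ts("low"), ts("high"), ts("center")
    width_el = root.find("width")
    if width_el is not None:
        seconds = float(width_el.get("value")) * UNIT_SECONDS[width_el.get("unit")]
        width = timedelta(seconds=seconds)
        if low is not None and high is None:
            high = low + width
        elif high is not None and low is None:
            low = high - width
        elif center is not None:
            low, high = center - width / 2, center + width / 2
    # A lone <center> or lone <width> cannot be resolved to bounds.
    return low, high


if __name__ == "__main__":
    examples = [
        '<effectiveTime><low value="20140101"/><high value="20140103"/></effectiveTime>',
        '<effectiveTime><low value="20140101"/><width value="2" unit="d"/></effectiveTime>',
        '<effectiveTime><center value="20140102"/><width value="2" unit="d"/></effectiveTime>',
    ]
    for fragment in examples:
        print(normalize_interval(fragment))
```

All three example fragments describe the same two-day interval, yet the importer needs a separate branch for each, and some representations cannot be resolved to bounds at all.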

What is needed, and hopefully addressed in the emerging FHIR specification, is a more constrained approach to the foundational aspects of clinical data, such as how to encode time intervals.  To reach a point with more interoperability of healthcare data, we need to analyze which types of structured clinical data concepts, and which associated clinical codes, are actually used operationally.

I feel that the healthcare standards community ultimately needs to identify a strict and simple constrained set of ways of expressing clinical concepts that healthcare Standards Development Organizations (SDOs) like HL7 should use to constrain existing permissive and complex standards.  This could also be done to guide a stricter and simpler implementation to support interoperability via FHIR.

This could introduce significant and radical improvements in the interoperability of patient data in the US healthcare industry.  This will better enable disparate healthcare software systems to work together without requiring point-to-point coordination.  This could reduce, and eventually eliminate, these problems of point-to-point coordination that result in islands of automation.

This "loose coupler" approach will encourage HL7, or possibly new healthcare SDOs, to embrace a core set of strict and simple required attributes, over the current state of the practice using permissive and complex attributes.


August 5, 2014

ICD 10 vs ICD 9 Code Format Structural Differences

ICD-10 is the tenth revision of the World Health Organization's (WHO's) International Classification of Diseases and Related Health Problems (ICD), the international standard diagnostic classification system.  The ICD is the coding system that physicians and other healthcare providers currently use to code all diagnoses, symptoms, and procedures recorded in hospitals and physician practices.

It is a big deal.

Today, the Department of Health and Human Services (HHS) just published an updated rule for the adoption of ICD-10 code sets.  HHS is requiring that all HIPAA covered entities must be ICD-10 compliant by October 1st, 2015.  This newly updated compliance date is meant to be firm and not subject to any change.  This is the third time that ICD-10 has been delayed, so I strongly suspect that this new deadline will be met by next fall.

Last year, I put together a visualization analyzing the ICD-10 coding landscape.  Below is a simple primer explaining the distinction between ICD-9 and ICD-10, demonstrating the structural difference between the two coding systems.  In particular, the ICD-10 code set has been expanded from five positions (first one alphanumeric, others numeric) to up to seven positions. The codes use alphanumeric characters in all positions, not just the first position as in ICD-9.

ICD-9 vs ICD-10 coding
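
The structural difference can also be sketched as two regular expressions in Python.  These patterns are my own approximation of the diagnosis code formats only (ICD-9-CM and ICD-10-CM, written with the decimal point after the third character); they check shape, not whether a code actually exists, and they do not cover procedure codes.

```python
import re

# ICD-9-CM diagnosis shape: three digits or V plus two digits, optionally
# followed by one or two decimal digits; or E plus three digits and one
# optional decimal digit.
ICD9_CM = re.compile(r"(\d{3}|V\d{2})(\.\d{1,2})?|E\d{3}(\.\d)?")

# ICD-10-CM diagnosis shape: a letter, a digit, one alphanumeric character,
# then optionally a decimal point and up to four more alphanumeric characters.
ICD10_CM = re.compile(r"[A-Z]\d[0-9A-Z](\.[0-9A-Z]{1,4})?")


def plausible_systems(code):
    """Return the set of coding systems whose format the code matches."""
    code = code.strip().upper()
    systems = set()
    if ICD9_CM.fullmatch(code):
        systems.add("ICD-9-CM")
    if ICD10_CM.fullmatch(code):
        systems.add("ICD-10-CM")
    return systems


if __name__ == "__main__":
    for example in ["403.10", "E11.9", "S52.521A", "V10"]:
        print(example, sorted(plausible_systems(example)))
```

Note that a short code such as V10 fits both shapes, so format alone cannot always tell you which code set a value came from.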

Some other interesting artifacts of ICD-9 vs ICD-10 that I discovered today:
  • As of the latest version, there are ~68,000 codes in ICD-10, as opposed to the ~13,000 in ICD-9.  More specifically, there are nearly 5 times as many diagnosis codes in ICD-10 as in ICD-9, and nearly 19 times as many procedure codes.  Yikes!
  • The new code set provides a significant increase in the specificity of the reporting, allowing more information to be conveyed in a code.  To support this, the terminology has been modernized and has been made consistent throughout the code set.  There are codes that are a combination of diagnoses and symptoms, with a claim that fewer codes need to be reported to fully describe a condition.

July 23, 2014

Open Source Licenses for Healthcare Information Technology

Noticing that I have not been blogging as much, I want to share some perspectives on open source distribution licenses within the domain of healthcare information technology.

If well-designed, healthcare information technology solutions can improve the patient-clinician relationship, the accuracy of the patient’s health data, the diagnosis and management of the patient’s health, and the efficiency and job satisfaction of clinicians.


Open source software (OSS) can play a foundational role in realizing digital healthcare delivery.  Open source software communities are intrinsically better positioned to support collaborative, community-driven demonstration of novel concepts.

Additionally, open source software lowers the barrier of entry for individuals and organizations to contribute and adopt vendor-neutral solutions in healthcare information technology.

Some of the most popular open source licenses that are used in industry are:
  • Apache License 2.0 - A permissive license that provides an express grant of patent rights from contributors to users.  The Apache Software Foundation (ASF) developed the license prose and adopted the Apache License version 2.0 in January 2004.  Similar to the MIT license, the Apache 2.0 license is compatible with version 3 of the GNU General Public License (GPL), which is also detailed in this list.
  • GNU General Public License (GPL) v3 - A “copyleft” license that requires anyone who distributes the software source code or a derivative work to make the source available under the same terms.  The Free Software Foundation (FSF) formally introduced the GPL v3 in 2007 as an upgrade to the GPL v2.  The most important changes introduced were in relation to software patents, free software license compatibility, the definition of "source code", and hardware restrictions on software modification.  Some for-profit organizations consider it “viral” and view it negatively.  I am not a fan of the GPL license because I feel it is too opinionated and tends to scare for-profit organizations away from open source.
  • MIT License - Another permissive license that is similar to the Apache 2.0 license, and very short and loose regarding requirements.  The MIT license allows users to use, copy, and modify the software source code.  As the name would imply... this distribution license originates at the Massachusetts Institute of Technology... duh.  The MIT license is GPL-compatible, meaning that it can be combined with a program under the GPL license without conflict.  The MIT license is very similar to the BSD license.  The primary difference from the BSD license is that the BSD license contains a notice prohibiting the use of the name of the copyright holder in promotion.
  • BSD License 2.0 - A permissive, free software license imposing minimal restrictions on the redistribution of covered software.  The BSD license allows proprietary use and allows the software released under the license to be incorporated into proprietary products.  It is similar to Apache 2.0 but lacks a patent grant, which means that the authors of the code are not explicitly granting users rights to any of the authors' patents that might happen to cover the code being used.
My preferred open source license for use in the domain of healthcare information technology is the Apache 2.0 license.

For 7 years, I have successfully used the Apache 2.0 license for numerous healthcare projects that I have led.  The Apache 2.0 license is arguably the most commercial-friendly of all of these options due to its wide adoption by industry, its permissive nature, which avoids “viral” requirements upon redistribution of derivative works, and the broad adoption of the associated Apache web server software, which is used by most of commercial industry.

From my experience, one of the most important aspects of the Apache 2.0 license is the Apache brand.

Whenever I am telling a healthcare CIO about one of our open source projects licensed under the Apache 2.0 open source license, and they do not know the details of open source, I can usually talk them away from the ledge with Apache.  When I ask whether they use an Apache web server, they usually say "yes, I use an Apache web server".  At that point it is easier to explain that other software made available under an Apache 2.0 open source distribution license would represent no greater risk of viral release of an enterprise's intellectual property than using an Apache web server.

With the big disclaimer that I am not a lawyer... 

I also feel that the Apache 2.0 license is the superior distribution license to use in healthcare information technology because it allows for software that is free to download, use, re-purpose, re-distribute, or even sell.  Yes, you are even allowed to sell someone else's software that is distributed via an Apache 2.0 license.  The only really hard requirements are attribution back to the copyright owner, and that you cannot sue the original author if something bad happens.  That responsibility is on you, the user of the software.


By only requiring attribution, there is flexibility in the way that anyone would like to use a derivative work.  If a healthcare open source project were to be better positioned as a paid commercial product, the Apache license provides for an immediate technology transfer mechanism to that market with no barriers... none.  Such a decision could even be made with or without agreement from the open source project community that created the original project.

While I am not endorsing that open source projects be "poached" and turned into commercial for-profit services, I do like that freedom that the Apache 2.0 license provides.

I hope this is helpful.

