November 2, 2013

Visualization of ICD-10 Code Counts

This past week I have been working in the bowels of the QRDA Category 1 XML for the popHealth project that we are deploying for the Veterans Health Administration (VHA).  In the process of working with the QRDA Category 1, I had to resuscitate some of my Ruby and REXML skills that had atrophied in the past year.

This weekend, I wanted to shake out some of my technical skills in a cleaner environment, so I downloaded the XML for the full set of ICD-10 codes from the CMS site.

Why ICD-10?  It is the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD) by the World Health Organization (WHO).  ICD-10 provides a hierarchy of structured codes for diseases, symptoms, findings, complaints, social circumstances, and external causes of injury/diseases.  The big national issue related to ICD-10 is that it will be required for expressing claims data to the Centers for Medicare & Medicaid Services (CMS) starting in October 2014.

The current state-of-the-practice for capturing this coded data in Electronic Health Record systems is (IMHO) still ICD-9, the predecessor to ICD-10.  One of the biggest differences between ICD-9 and ICD-10 is the fidelity of data that can be captured in ICD-10.  In particular, there are over 68,000 distinct codes in ICD-10 as opposed to the roughly 13,000 in ICD-9.

Working with the XML file provided on the CMS site that details the ICD-10 code hierarchy, I wanted to see if I could convert the data into a format that would allow me to visualize the code counts with a D3.js example.  I figured it was good to exercise some XML knowledge outside of the complexity of the QRDA Category 1 XML.  Further, I wanted to learn a little more about the structure of the ICD-10 codes.

It is worth noting that the CMS ICD-10 XML is surprisingly easy to understand for the purposes of enumerating the full set of codes and the hierarchy.  The QRDA Category 1 XML… not so easy to understand.

What I did was load the ICD-10 XML hierarchy into a simple Ruby program via REXML.  By traversing the XML file, I built an aggregate count, stored in a hash table, for each of the second-level codes in the ICD-10 hierarchy.  I had to do this only at the second level of the ICD-10 hierarchy because the sheer number of third-level ICD-10 codes broke the D3.js visualization examples.  To explain this a little more, the hierarchy of an example diabetes code down to the fourth level in ICD-10 follows:

E00-E89: Endocrine, nutritional and metabolic diseases
  |-> E08 Diabetes mellitus due to underlying condition
    |-> E08.2 Diabetes mellitus due to underlying condition with kidney complications
      |-> E08.22 Diabetes mellitus due to underlying condition with diabetic chronic kidney disease

So for the illustration of ICD-10 code counts, I aggregated at the second level of the hierarchy: the codes from the third and fourth levels are rolled up into the count of the second-level code they belong to.  Each tiny square in the illustration below represents the count for one second-level code out of the ICD-10 space of roughly 68,000 total codes.
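The roll-up described above can be sketched in a few lines of Ruby with REXML.  This is only a minimal sketch, not my actual program: the trimmed sample document and its element names (chapter, diag, name, desc) are assumptions for illustration rather than the verified layout of the CMS file, the method names are made up, and I am assuming that each second-level code counts itself plus everything nested beneath it.

```ruby
require 'rexml/document'

# Hypothetical, heavily trimmed stand-in for the CMS file. The element names
# (chapter, diag, name, desc) are assumptions, not the verified CMS schema.
SAMPLE_XML = <<~XML
  <tabular>
    <chapter>
      <desc>Endocrine, nutritional and metabolic diseases (E00-E89)</desc>
      <diag>
        <name>E08</name>
        <diag>
          <name>E08.2</name>
          <diag><name>E08.21</name></diag>
          <diag><name>E08.22</name></diag>
        </diag>
      </diag>
      <diag>
        <name>E09</name>
      </diag>
    </chapter>
  </tabular>
XML

# Count a code plus every code nested beneath it (assumed counting rule).
def count_codes(diag)
  1 + diag.get_elements('diag').sum { |child| count_codes(child) }
end

# Build { chapter description => { second-level code => rolled-up count } }.
def second_level_counts(xml)
  doc = REXML::Document.new(xml)
  counts = {}
  doc.elements.each('//chapter') do |chapter|
    title = chapter.elements['desc'].text
    counts[title] = {}
    # Only the diag elements directly under a chapter are "second level" here.
    chapter.get_elements('diag').each do |diag|
      counts[title][diag.elements['name'].text] = count_codes(diag)
    end
  end
  counts
end

counts = second_level_counts(SAMPLE_XML)
counts.each_value do |codes|
  codes.each { |code, total| puts "#{code}: #{total}" }
end
```

Recursing with a plain Ruby method keeps the counting rule easy to change, for example to count only the leaf codes instead of every level.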

Once I had the counts of individual ICD-10 codes aggregated at the second level of the ICD-10 hierarchy, I exported a JSON file that could work with the D3.js example that I picked.  Below is a thumbnail (admittedly... illegible) of the ICD-10 code counts transformed with the D3.js treemap example.

Visualization of second level ICD-10 code counts
If you want to download a higher resolution image of the ICD-10 codes and actually read more of the details, click here.  HEADS UP… it is ginormous.

Reading the illustration from left to right and then top to bottom, the sections of the ICD-10 data set that correspond to the colors are listed below.  The only confusing item is that the last "chapter" from ICD-10, "Factors influencing health status and contact with health services", is the gray box in the bottom left.  I think the D3.js code had to squeeze that section into the illustration wherever it would fit.
  • A00-B99: Certain infectious and parasitic diseases
  • C00-D49: Neoplasms
  • D50-D89: Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism
  • E00-E89: Endocrine, nutritional and metabolic diseases
  • F01-F99: Mental, Behavioral and Neurodevelopmental disorders
  • G00-G99: Diseases of the nervous system
  • H00-H59: Diseases of the eye and adnexa
  • H60-H95: Diseases of the ear and mastoid process
  • I00-I99: Diseases of the circulatory system
  • J00-J99: Diseases of the respiratory system
  • K00-K95: Diseases of the digestive system
  • L00-L99: Diseases of the skin and subcutaneous tissue
  • M00-M99: Diseases of the musculoskeletal system and connective tissue
  • N00-N99: Diseases of the genitourinary system
  • O00-O9A: Pregnancy, childbirth and the puerperium
  • P00-P96: Certain conditions originating in the perinatal period
  • Q00-Q99: Congenital malformations, deformations and chromosomal abnormalities
  • R00-R99: Symptoms, signs and abnormal clinical/laboratory findings, not elsewhere classified
  • S00-T88: Injury, poisoning and certain other consequences of external causes
  • V00-Y99: External causes of morbidity
  • Z00-Z99: Factors influencing health status and contact with health services
If you are interested, you can access the JSON file with the second-level code counts from the GitHub repository that I set up.

Further, you could use this JSON with several other data hierarchy examples off the D3.js site if you are interested.  They use the same JSON format for representing the data, so you should be able to just drop the JSON that I created into that HTML if you tweak the name of the file in the examples.  You will also need to set the width and height of the demo to about one thousand times greater than what is provided, since the amount of data is so large.
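For reference, here is a rough Ruby sketch of the nested shape those hierarchy examples read, as far as I recall from the sample data file that ships with them: a "name" and "children" at each branch and a "size" at each leaf.  The sample counts, the method name, and the "ICD-10" root label are made up for illustration, and the key names are my assumption about the example data rather than something guaranteed by D3.js itself.

```ruby
require 'json'

# Illustrative counts only; the real numbers come from the CMS file.
SAMPLE_COUNTS = {
  'E00-E89: Endocrine, nutritional and metabolic diseases' => {
    'E08' => 4,
    'E09' => 1
  }
}.freeze

# Turn { chapter => { code => count } } into a name/children/size tree.
def to_hierarchy_json(counts, root_name = 'ICD-10')
  chapters = counts.map do |chapter, codes|
    leaves = codes.map { |code, total| { 'name' => code, 'size' => total } }
    { 'name' => chapter, 'children' => leaves }
  end
  JSON.pretty_generate('name' => root_name, 'children' => chapters)
end

puts to_hierarchy_json(SAMPLE_COUNTS)
```

Writing the returned string to a file with File.write gives you something you can point the examples at.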

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. © Rob McCready, 2013.

1 comment:

  1. Just came across this blog. Really interesting stuff. I am a surgeon with clinical knowledge, but lack depth of knowledge when it comes to programs which parse the hierarchical data. Would really like to connect to discuss my project.

    Chris Wixon