EHR Data (HealthFlow background info)

 

This description was written for RetroGuide, the retrospective component of the HealthFlow system, but it remains valid for the overall HealthFlow system, which includes a retrospective mode (RetroGuide) as well as a prospective mode (FlowGuide).

For full info, see this book: http://www.amazon.com/dp/3639100999

Data used to be integrated by a separate tool (RetroGuide), but the newer system (HealthFlow) assumes the data is already in the native HealthFlow event model.

1 Sources of data

1.1.1.1 Data sources

With RGExtractor, the goal was to assemble a reasonably complete EHR that would support a large number of possible analyses, while keeping the extracted data relatively simple for nonexpert requestors to understand and still reflecting, to some degree, the computer view of the available EHR data. As stated above, certain data sources within IHC’s EDW are not fully integrated (although they can be integrated), and extra effort by the analyst is necessary to achieve this integration.

In order to use IHC’s data warehouse, approval from the Institutional Review Boards at IHC and the University of Utah was sought and obtained. As a result of the IRB approval process, RGExtractor’s data extraction rules were designed to comply with legal regulations (Health Insurance Portability and Accountability Act) and policies governing the research use of healthcare data [10]. The extracted datasets were used only internally within IHC and were not shared with any outside entity. RGExtractor specifically removed patient identifying elements enumerated in the safe harbor deidentification method [11].

The following sections describe the sources of coded data at IHC that were found useful to include in RGExtractor.

Clinical data repository data. The clinical data repository (CDR) is a central database for storing lifetime EHR data, and it is a crucial part of IHC’s HELP2 system [12]. It contains coded, numerical, and textual clinical data from a wide range of sources [13]. It includes data from certain interfaced sources (e.g., laboratory results) and, most importantly, it stores any data entered through HELP2’s clinical user applications, Clinical Workstation and Clinical Desktop (e.g., problem list entries, data entered through structured EHR forms, and drug prescription data). The CDR uses IHC’s internal Healthcare Data Dictionary (HDD) coding scheme [14].

The CDR uses an event-based model [15], and each event has two mandatory event-table attributes (apart from the time and patient dimensions): event type and event subtype codes. Depending on these two attributes, additional dimensions for different types and subtypes can be contained in additional tables, which are highly normalized and hierarchical. For example: (1) a laboratory event would contain additional attributes in several laboratory tables; (2) a coded clinical observation would contain a different set of additional attributes in clinical observation tables; or (3) a pharmacy order would use pharmacy tables to store several drug-related attributes. To extract even the basic additional event details, a different, and often fairly complex, table join must be used.

RGExtractor flattens and simplifies those additional attributes for all events into four basic attributes: event type (level 1), event subtype (level 2), exam code (level 3), and coded value code (level 4). Codes stored at levels 3 and 4 (exam and coded value) are defined very loosely and are used flexibly depending on the type of the EHR event. Table 3.2 contains several examples of EHR events output by RGExtractor. In addition to the four basic attributes, a numerical value attribute and a result flag attribute are used. For example, a urine sediment result shown in Table 3.2 has an event type code “Standard Lab Data,” subtype code “Urine Microscopics,” exam code “Epithelial Cells, Urine Sediment” and coded value result of “Occasional.” Other events shown in the table are a birth event, a care encounter start event (“length of stay” event type), a clinical report filing event, and several administrative and legacy events.
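As a rough illustration of the flattened record described above, the following Python sketch models one event with the four levels plus the flag and numeric value. The class name, field names, and the keys of the input row are assumptions made for this example; they are not RGExtractor's actual identifiers or the CDR's column names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FlatEvent:
    """One flattened EHR event (illustrative field names, not RGExtractor's)."""
    ev_time: datetime                  # required for every event
    ev_type: str                       # level 1, required for every event
    subtype: Optional[str] = None      # level 2
    exam: Optional[str] = None         # level 3
    coded_value: Optional[str] = None  # level 4
    flag: Optional[str] = None         # result flag
    val_num: Optional[float] = None    # numerical value


def flatten_lab_row(row: dict) -> FlatEvent:
    """Map a hypothetical, already-joined laboratory row onto the flat schema."""
    return FlatEvent(
        ev_time=datetime.fromisoformat(row["observed_at"]),
        ev_type="Standard Lab Data",
        subtype=row["panel"],
        exam=row["test"],
        coded_value=row.get("coded_result"),
        flag=row.get("abnormal_flag"),
        val_num=row.get("numeric_result"),
    )


# The cholesterol result from Table 3.2, expressed as a hypothetical source row.
cholesterol = flatten_lab_row({
    "observed_at": "2046-04-23 15:23:00",
    "panel": "Lipid Profile",
    "test": "Cholesterol, Plasma Quant.",
    "abnormal_flag": "Higher Than Normal",
    "numeric_result": 327,
})
```

Keeping every attribute except time and type optional mirrors the statement that levels 3 and 4 are used flexibly depending on the event type.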

Administrative data. Administrative data refers to coded data created by trained human coders for billing purposes. Every patient encounter record is evaluated by a trained coder after discharge, and several coded entries are created for reimbursement purposes. Three main categories of administrative data are included in the RG extraction process.

ICD9 Diagnoses are extracted from the CM.ICD9_DG_ENC billing system table. The assignment of diagnostic codes happens after the patient is discharged, usually within 1-3 weeks. The exact date of the billing diagnostic code assignment is not available in the EDW table; however, all diagnoses are clearly linked to the pertinent hospitalization. There is a single primary discharge diagnosis and several secondary discharge diagnoses assigned to each visit. RGExtractor uses a different event type to distinguish these two categories. RG does not use the assigned admission diagnosis because it may not be the most accurate main diagnosis for the patient visit (it may change later during the hospitalization). For the RG-generated time-ordered chart, a specific date for these diagnostic events must be chosen. The two possible dates were the admission date and the discharge date. RGExtractor currently uses the admission date. This choice has the advantage that in a chronological chart-parsing strategy, it is simpler to evaluate relevant patient diagnoses for the encounter since there is no need to search for another discharge event. Current billing system practices do not enable distinguishing diagnoses which the patient had at admission from diagnoses assigned later during the hospitalization. Clinician-entered EHR problem list entries are more accurate; however, unlike the billing system diagnoses, the EHR’s problem list is currently not used consistently for every patient. Many existing established QI measures rely on billing data for diagnoses.
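The dating rule for billing diagnoses can be sketched as follows. This is only an illustration: the two event type labels are hypothetical (the text says only that primary and secondary diagnoses get different event types), and the secondary code in the usage line is an arbitrary example value.

```python
from datetime import date
from typing import List, NamedTuple


class DiagnosisEvent(NamedTuple):
    ev_date: date
    ev_type: str
    term2_code: str


# Hypothetical labels; the real RGExtractor event type names are not given here.
PRIMARY_DX = "ICD-9-CM Diseases (primary)"
SECONDARY_DX = "ICD-9-CM Diseases (secondary)"


def diagnosis_events(admission_date: date,
                     primary_code: str,
                     secondary_codes: List[str]) -> List[DiagnosisEvent]:
    """Place every discharge diagnosis of an encounter on its admission date."""
    events = [DiagnosisEvent(admission_date, PRIMARY_DX, primary_code)]
    events += [DiagnosisEvent(admission_date, SECONDARY_DX, code)
               for code in secondary_codes]
    return events


# "72610" is the diagnosis code shown in Table 3.2; "2724" is illustrative only.
events = diagnosis_events(date(2046, 4, 23), "72610", ["2724"])
```

Because all diagnoses share the admission date, a chronological parser sees them at the start of the encounter and does not need to look ahead to the discharge.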

ICD9 Procedures are extracted from the CM.ICD9_PROC_ENC billing system table. For procedures, the valid date of actual procedures within a hospitalization is available. The timestamp granularity is at date level only; it is not possible to know, for example, whether the percutaneous transluminal coronary angioplasty procedure was done in the morning or afternoon.

CPT procedural data are extracted from the CM.ENCNTR_CPT_PRCDR billing system table. They represent billable items covered by the terminology developed by the American Medical Association [16]. The data covers coded events in outpatient as well as inpatient settings.

Admission data are extracted from the CM.CASEMIX_SMRY table of the administrative monitoring system, showing each hospital admission and length of stay. A special event is added at the beginning of each hospitalization. The event type designated for this event is “length of stay” and the numeric value stores how many days the hospitalization lasted.
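A minimal sketch of the “length of stay” event follows. It assumes the stay is counted as whole days between the admission and discharge dates; the source does not say how the number of days is derived, and the dictionary keys are illustrative.

```python
from datetime import date


def length_of_stay_event(admitted: date, discharged: date) -> dict:
    """Build the special event placed at the beginning of a hospitalization."""
    if discharged < admitted:
        raise ValueError("discharge date precedes admission date")
    return {
        "ev_time": admitted,                     # event sits at the admission
        "type": "Length of Stay",
        "val_num": (discharged - admitted).days,  # assumed day-counting rule
    }


los = length_of_stay_event(date(2046, 4, 23), date(2046, 4, 26))
```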

Within-hospital transfer data are extracted from the CDRDM.ENCOUNTER_LOS_ROOM administrative table, which shows the different wards where the patient stayed during each hospitalization. A special event is added for each transfer indicating that a transfer has occurred, the length of stay at that ward, and the type of the target unit where the patient was transferred.

Legacy system data. At IHC, the HELP1 legacy inpatient system is still used at many facilities. RGExtractor can integrate selected additional data from HELP1 that is not currently interfaced to the CDR. Two EDW source tables are used. The HELP1.DATA table is used to extract selected nurse charting data and legacy decision support system (DSS) audit trail data (e.g., adverse drug monitoring system). The HELP1.DRUG table is used to extract inpatient medication data.

1.1.1.2 Data integration

As was mentioned in section 3.2.1.1, RGExtractor integrates several data sources into a simplified event-based schema. It does not try to capture all available event data. For example, it does not extract the units for lab measurements or the route for drug orders. The CDR data integration utilizes the type, subtype, exam, coded value, numeric value and flag columns shown in Table 3.2. The administrative and legacy data integration uses the event type column and two additional attributes called term2_code and term2_textual_explanation. “Term2” stands for “additional terminology” rather than the implied meaning of “second terminology.” Each administrative or legacy system uses a different coding scheme. Each category of administrative and legacy system event has a special RGExtractor event type assigned (e.g., “nurse note” or “inpatient drug”), and the pertinent terminology code and textual explanation of this code are stored in the term2_code and term2_textual_explanation attributes, respectively. For example, Table 3.2 shows a diagnostic billing item which has “ICD-9-CM Diseases” in the event type column, and “72610” and “ROTATOR CUFF SYND NOS” in the two term2 columns. With this strategy, where event type dictates which coding scheme is expected in the term2 attributes, RGExtractor’s target data structure can handle any number of additional terminologies.
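The term2 strategy, where the event type selects the coding scheme, might look like the sketch below. The lookup dictionaries and the function name are assumptions for illustration; the codes and texts are the ones shown in Table 3.2.

```python
from typing import Dict, Tuple

# Hypothetical in-memory lookup tables, one per coding scheme, keyed by code.
TERMINOLOGIES: Dict[str, Dict[str, str]] = {
    "ICD-9-CM Diseases": {"72610": "ROTATOR CUFF SYND NOS"},
    "ICD-9-CM Procedures": {"8363": "ROTATOR CUFF REPAIR"},
    "Inpatient Drug": {"3513816": "ELECTROLYTES (NUTRILYTE) 42.9 ML, VIAL"},
}


def annotate(event_type: str, term2_code: str) -> Tuple[str, str]:
    """Return (term2_code, term2_textual_explanation) for an event.

    The event type decides which terminology the code is looked up in.
    """
    scheme = TERMINOLOGIES.get(event_type)
    if scheme is None:
        raise KeyError(f"no terminology registered for event type {event_type!r}")
    return term2_code, scheme.get(term2_code, "")


code, text = annotate("ICD-9-CM Diseases", "72610")
```

Supporting a further terminology in this sketch only requires adding one more entry to the dictionary, which is the property the paragraph above attributes to the target data structure.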

It is important to emphasize again that the data extraction was seen only as a necessary prerequisite for other parts of this project and the goal was not to create a representation and integration strategy which would be complete. Instead, a highly pragmatic approach was adopted. In fact, in an operational environment it is expected that IHC’s EDW team would provide this data integration view rather than relying on any internal development efforts of this project.

Table 3.2 Sample EHR with selected example events

| EV_TIME | TYPE | SUBTYPE | EXAM | CODED VALUE | FLAG | VAL_NUM | TERM2_CD | TERM2_TXT |
|---|---|---|---|---|---|---|---|---|
| 1990-01-01 00:00:00.0 | Birth event | | | | | | | |
| 2046-04-23 00:00:00.0 | Length of Stay | | | | | 3 | | |
| 2046-04-23 00:00:00.0 | ICD-9-CM Diseases | | | | | | 72610 | ROTATOR CUFF SYND NOS |
| 2046-04-23 00:00:00.0 | ICD-9-CM Procedures | | | | | | 8363 | ROTATOR CUFF REPAIR |
| 2046-04-23 00:00:00.0 | CPT-4 | | | | | 2 | J3010 | Inj, fentanyl citrate |
| 2046-04-23 00:00:00.0 | CPT-4 | | | | | | 29999 | ARTHROSCOPY OF JOINT |
| 2046-04-23 15:01:00.0 | Clinical Text Data | Operative Report | | | | | | |
| 2046-04-23 15:23:00.0 | Standard Lab Data | Lipid Profile | Cholesterol, Plasma Quant. | | Higher Than Normal | 327 | | |
| 2046-04-23 15:21:00.0 | Standard Lab Data | Urine Microscopics | Epithelial Cells, Urine | Occasional | | | | |
| 2046-05-11 13:21:50.0 | Problem Event | Diagnosis | Diagnosis | Hyperlipidemia | | | | |
| 2046-08-12 11:12:13.0 | Patient Order | Pharmacy order | | Meperidine Hcl, 50Mg/ Ml, Ampul | | | | |
| 2047-01-18 10:55:01.0 | Nurse Note | | | | | | 203.1.10.3.1.10.1.0 | PURPOSEFUL MOVEMENT |
| 2047-01-18 15:23:30.0 | Inpatient Drug | | | | | | 3513816 | ELECTROLYTES (NUTRILYTE) 42.9 ML, VIAL |
| 2047-01-19 11:02:02.0 | Discharged | | | | | | 43 | ICU |

Only selected sample events are shown. Dates are fictional and only textual meanings of codes are shown. The “ev_time” column shows the event time, the next four columns contain the basic four attributes (type, subtype, exam, coded value), followed by the flag and numeric value columns. Only the time and event type attributes are required for all events. The last two “Term2” columns used for legacy and administrative data are explained in section 3.2.1.2.

1.1.1.3 Additional data extraction processing

RGExtractor produces two sets of output files, which are used in two later phases. The first of these, the binary, compact data files (.dat files), are used during the execution phase for parsing EHR events during the analysis. Binary files contain only timestamps and the terminology codes, are smaller in size, and cannot be viewed by a human user.

The second set of output files, EHR files (.xml files), are used during the review phase to support the individual patient view (explained later in section 3.2.4). In addition to the data items in the binary files, these files contain textual explanations of the terminology codes (e.g., code “6457946” = “Extubation Procedure”). RGExtractor uses the HDD and other terminology lookup tables (ICD9 Diagnoses, ICD9 Procedures, CPT codes, and the HELP1 legacy system coding scheme called PTXT [17]) to pregenerate these code annotations. This strategy was adopted because of speed and file portability requirements. In general, a file-based storage structure was selected because of the EDW’s non-24/7 availability and the available expertise with manipulating file-based data. A file-based structure was also used because of the adoption of both the XSLT-based transformation technology for displaying EHR files and the single patient execution model. Disadvantages of the file-based structure are data access speed and scalability. A database storage target would address some of the limitations, but the necessary technical infrastructure was not available at the time of the extraction process development. Considering the feasibility-testing character of the overall project, the selected file-based solution was not of major concern.
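The difference between the two output formats can be illustrated with a short sketch. The binary record layout (an 8-byte timestamp followed by four 4-byte codes), the XML element names, and the type, subtype, and coded value numbers are all assumptions for this example; the actual .dat and .xml formats are not specified here. Only the code 6457946 and its explanation come from the text above.

```python
import struct
import xml.etree.ElementTree as ET

# Assumed layout: little-endian epoch seconds (8 bytes) plus four 4-byte codes.
RECORD = struct.Struct("<q4i")


def pack_event(epoch_seconds, type_cd, subtype_cd, exam_cd, value_cd):
    """Compact binary form: timestamp and terminology codes only."""
    return RECORD.pack(epoch_seconds, type_cd, subtype_cd, exam_cd, value_cd)


def event_to_xml(epoch_seconds, codes, labels):
    """Reviewable form: the same codes plus pregenerated text annotations.

    codes maps an attribute name to its code; labels maps a code to its text.
    """
    elem = ET.Element("event", time=str(epoch_seconds))
    for name, code in codes.items():
        child = ET.SubElement(elem, name, code=str(code))
        child.text = labels.get(code, "")
    return ET.tostring(elem, encoding="unicode")


# Codes 1, 2 and 0 are placeholders; 6457946 is the example code from the text.
record = pack_event(2_400_000_000, 1, 2, 6457946, 0)
xml_text = event_to_xml(
    2_400_000_000,
    {"exam": 6457946},
    {6457946: "Extubation Procedure"},
)
```

The binary record stays small because it carries no text, while the XML form repeats the explanation for every event so that it can be displayed without any lookup.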

An important part of the extraction process is the replacement of real patient identifiers with meaningless research identifiers, as well as the removal of any event parameters which refer to the real patient identifier. The deidentification was mandated by patient privacy and confidentiality concerns and the existing research IRB policies. Although the extracted files do not contain any patient identifying information, because of the actual timestamp information for each event, they were only stored on either a password-protected storage medium or IHC’s internal storage network, which is governed by strict security policies. The extracted datasets used strictly research identifiers, and the mapping files to real patient identifiers were kept in a separate location on IHC’s internal data storage network.
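The identifier replacement step could be sketched as below. The research identifier format, the function names, and the list of identifying field names are assumptions for illustration, and the sketch does not by itself implement the full safe harbor method.

```python
import secrets


def assign_research_ids(patient_ids):
    """Give each real patient identifier a random, meaningless research ID.

    The returned mapping is the sensitive part and would be stored separately
    from the extracted data.
    """
    mapping = {}
    for pid in patient_ids:
        if pid not in mapping:
            mapping[pid] = "R" + secrets.token_hex(8)
    return mapping


def deidentify(events, mapping, identifying_fields=("patient_id", "mrn", "name")):
    """Drop identifying fields (hypothetical names) and attach the research ID."""
    cleaned = []
    for event in events:
        clean = {k: v for k, v in event.items() if k not in identifying_fields}
        clean["research_id"] = mapping[event["patient_id"]]
        cleaned.append(clean)
    return cleaned


raw = [
    {"patient_id": "P1", "mrn": "123", "type": "Nurse Note"},
    {"patient_id": "P1", "type": "Inpatient Drug"},
]
mapping = assign_research_ids(e["patient_id"] for e in raw)
extracted = deidentify(raw, mapping)
```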

The extraction phase is normally performed by the analyst. It can be done once for a large cohort of patients, e.g., all diabetic patients who were treated in 2006, and then reused for several different subsequent scenarios which analyze different aspects of diabetes care. After the extraction is finished, each extracted population is usually given a name (e.g., DMCOHORT06) and a textual annotation file is used to store cohort metadata, such as the cohort selection code, details of the query used in the cohort specification step, included data sources, the size of the extracted population, and the size of the extracted data. This metadata enables reuse of previously extracted cohorts for follow-up or secondary analyses.

Code usage statistics. Knowledge of which codes are available and how often they are used is crucial for the next phase, scenario creation. RG uses only terminology-coded information. The very existence of a pertinent EHR coded field within the clinical information system (CIS) and its real-life use is crucial for being able to answer any analytical questions. For example, a subanalysis of different types of pneumonias (any pneumonia, community-acquired pneumonia, and hospital-acquired pneumonia) is not possible when appropriate direct codes or other indirect coded-indicators are not present in the current EHR.

RGExtractor has, for this purpose, built-in functionality to create multiple code usage statistics reports. The following coding schemes or views are supported: (1) Administrative codes: ICD Diagnosis codes, ICD Procedural codes, CPT codes; (2) CDR codes: view ordered by event type, view ordered by event type, subtype, and exam code, view ordered by absolute usage; (3) Codes of prescribed drugs (entered within HELP2); and (4) Legacy system codes: clinical event codes and inpatient drug codes. An example of code-usage statistics for CDR codes is shown in Table 3.3.
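A code usage report of the kind described can be approximated with a simple counter. This is a sketch under assumptions: events are plain dictionaries with illustrative keys, and the ordering follows the “event type, then descending usage” view.

```python
from collections import Counter


def code_usage(events):
    """Count events per (type, subtype, exam, coded value) combination.

    Rows are ordered by event type, then by descending count within the type.
    """
    counts = Counter(
        (e["type"], e.get("subtype", ""), e.get("exam", ""), e.get("coded_value", ""))
        for e in events
    )
    return sorted(counts.items(), key=lambda item: (item[0][0], -item[1]))


sample = [
    {"type": "Problem Event", "subtype": "Diagnosis", "exam": "Diagnosis",
     "coded_value": "Depression"},
    {"type": "Problem Event", "subtype": "Diagnosis", "exam": "Diagnosis",
     "coded_value": "Pregnancy"},
    {"type": "Problem Event", "subtype": "Diagnosis", "exam": "Diagnosis",
     "coded_value": "Pregnancy"},
    {"type": "Pat Obs Event", "subtype": "Care Manager Encounter",
     "exam": "Call Time"},
]
report = code_usage(sample)
```

An empty coded value is kept as its own row, which matches how free-text entries appear with a blank VAL_CDTXT in the usage report.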

Alternative data extraction design. A final consideration is the possibility of completely eliminating the extraction phase in future versions of the RG framework. It would be feasible to change the data-get RGEAs to access the whole EDW directly, and offer alternative ways to support the single patient EHR view in the review phase. Elimination of this step would lower the dependency on the analyst, and speed up and simplify the overall RG analytical process. However, such a design would require further speed and access optimization of the EDW, which was not realistically achievable at this point, so it was not pursued. IHC’s EDW is currently optimized for population analyses (e.g., indexed columns or table segmentations), and the requested changes would involve better support for faster retrieval of data on a single patient, or a particular event or encounter.

Table 3.3 Sample code usage statistics report

| TYPETXT | SBTYPETXT | EXAMTXT | VAL_CDTXT | # of RECORDS |
|---|---|---|---|---|
| Pat Obs Event | Care Manager Encounter | Contact Personnel | | 4 |
| Pat Obs Event | Care Manager Encounter | Care Manager Encounter Reason | | 4 |
| Pat Obs Event | Care Manager Encounter | Care Manager Encounter Outcome | | 4 |
| Pat Obs Event | Care Manager Encounter | Call Attempts Quantitative | | 3 |
| Pat Obs Event | Care Manager Encounter | Call Time | | 3 |
| Pat Obs Event | Care Manager Encounter | Care Manager Coordination Time | | 4 |
| Pat Obs Event | Care Manager Encounter | Next Appointment Date/Time | | 2 |
| Pat Obs Event | Care Manager Encounter | Care Manager Next Appointment | | 2 |
| Problem Event | Medical Procedure | Medical Procedure | Cesarean Section | 7 |
| Problem Event | Medical Procedure | Medical Procedure | Tonsillectomy With Adenoidectomy | 2 |
| Problem Event | Medical Procedure | Medical Procedure | Colonoscopy | 2 |
| Problem Event | Medical Procedure | Medical Procedure | | 1 |
| Problem Event | Medical Procedure | Diagnosis | Endometriosis | 1 |
| Problem Event | Medical Procedure | ProcedureOnsetTime | | 7 |
| Problem Event | Diagnosis | Chronicity | Current | 12 |
| Problem Event | Diagnosis | Chronicity | Chronic | 3 |
| Problem Event | Diagnosis | Chronicity | First occurrence | 3 |
| Problem Event | Diagnosis | Chronicity | Intermittent | 1 |
| Problem Event | Diagnosis | Diagnosis | | 244 |
| Problem Event | Diagnosis | Diagnosis | Pregnancy | 61 |
| Problem Event | Diagnosis | Diagnosis | Depression | 37 |
| Problem Event | Diagnosis | Diagnosis | Cesarean Section | 25 |

The CDR codes usage report is shown above (ordered by event type and descending usage within this type). The key column is “# of RECORDS”. Codes (normally displayed next to the description) have been removed due to copyright issues. An empty field in the coded value column (VAL_CDTXT) under the diagnosis or procedure subtype indicates a free-text entry.
