Archive for October, 2010

HealthFlow data export tool

October 21, 2010

To provide more detail on the ability to work with RetroGuide or FlowGuide related data in tools like Weka or R, we provide some screenshots here.

 

HF extract tool (formerly named RetroGuide Admin, a less suitable name)

image

 

MXML is the format used by the ProM process mining tool (a workflow log schema).
The Data button creates a CSV file.

There is also an option to extract into ARFF.

image
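As a rough illustration of how the two export targets differ, the sketch below writes the same rows once as CSV and once as ARFF. The column names (patient_id, age, compliant), the relation name, and the values are invented for the example and are not actual output of the HF extract tool; only the ARFF keywords (@relation, @attribute, @data) follow the Weka file format.

```python
import csv
import io

# Hypothetical per-patient rows; not actual HealthFlow output.
ROWS = [
    {"patient_id": "R001", "age": 67, "compliant": "yes"},
    {"patient_id": "R002", "age": 72, "compliant": "no"},
]

def to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_arff(rows, relation="healthflow_export"):
    """Serialize the same rows as ARFF: a typed header, then the data lines."""
    lines = [f"@relation {relation}", ""]
    lines.append("@attribute patient_id string")
    lines.append("@attribute age numeric")
    lines.append("@attribute compliant {yes,no}")
    lines += ["", "@data"]
    for r in rows:
        lines.append(f"{r['patient_id']},{r['age']},{r['compliant']}")
    return "\n".join(lines) + "\n"
```

The main practical difference is that ARFF declares a type for every attribute up front, which is what lets Weka treat a column such as compliant as a nominal class without further configuration.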

Analysis in Weka:

image

External applications (part 2)

October 6, 2010

This description originally applied to RetroGuide (the retrospective component of the HealthFlow system), but it is still valid for the overall HealthFlow system, which includes a retrospective mode (RetroGuide) as well as a prospective mode (FlowGuide).

For full info, see this book: http://www.amazon.com/dp/3639100999

 

1 External applications revisited

1.1.1 Phase 2: Scenario development

In the scenario development phase, a workflow editor is used to create or modify an RG scenario. Key concepts, such as the scenario’s flowchart and code layers, RGEAs, variables, and conditions, were introduced in section 3.1. This section describes their use in greater detail. The end product of this phase is a scenario with a fully populated code layer so that the scenario is executable.

The scenario development phase is the key methodological step where collaboration of the requestor and the analyst occurs. The flowchart layer is the key facilitator of this collaboration. Although a complex scenario can be created from scratch in one editing session, RG was designed to support a gradual and iterative scenario development process where an initial, simple scenario version is progressively extended into more complex versions. The requestor usually suggests the flowchart of the initial version. Any fundamental changes originating either from the requestor or the analyst are reflected in the flowchart layer through addition of new analytical steps (nodes in the flowchart), modification of existing steps, or addition of new transition conditions (arrows in the flowchart).

In the most common collaboration setting, the requestor deals with the flowchart layer, whereas the analyst is responsible for the code layer. The flowchart layer enables annotation of the nodes or arrows with textual comments. Without requiring knowledge of RG’s code layer, the requestor can use these annotation fields to specify the required EHR data elements, details of any single analytical step, and/or transition conditions, using free-text comments. Use of these annotation fields is important in the initial version of the scenario. This version usually is not executable since only the flowchart layer is specified by the requestor. The analyst uses the annotations to insert the proper RGEAs into the nodes, define any necessary variables, and translate verbally specified transition conditions into computable syntax. The requestor can easily review these added code-layer attributes; moreover, after some basic scenario authoring training, an advanced requestor can author these attributes himself. The workflow editor also has a built-in validation feature which can check the consistency of the scenario definition (e.g., unconnected flowchart nodes or other missing, required, definition elements). The annotation fields also can be used to document any changes made to the code layer of a particular flowchart node or arrow.
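The kind of consistency check mentioned above can be pictured with a small sketch. It is not the workflow editor's actual validator; the flowchart representation (a list of node names plus a list of arrows) and the function name are assumptions made for illustration.

```python
def find_unconnected_nodes(nodes, arrows):
    """Return the nodes that no arrow enters or leaves, in input order."""
    touched = set()
    for source, target in arrows:
        touched.add(source)
        touched.add(target)
    return [n for n in nodes if n not in touched]

# Hypothetical three-step scenario with one forgotten node.
nodes = ["find_enrollment", "find_glucose", "write_report", "orphan_step"]
arrows = [("find_enrollment", "find_glucose"), ("find_glucose", "write_report")]
```

Here find_unconnected_nodes(nodes, arrows) reports orphan_step, the one node that was drawn but never wired into the flow.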

The above account describes tasks primarily meant for the requestor. The following sections describe the use of RGEAs, variables, and conditions, meant primarily for the analyst or the advanced requestor during scenario development.

1.1.1.1 Use of external applications

Section 3.1 gave a basic account of RG external applications, and specified a few categories of simple RGEAs (data-get applications, position manipulating applications, and comparison applications). Table 3.4 contains a more complete list of important RGEAs. The table lists two additional categories: (1) RGEAs which support assigning values to scenario variables; and (2) RGEAs related to report generation, which is described later in Section 3.2.4.

The data-get category of RGEAs has yet another important aspect which is related to a separate data-get conceptual scenario layer (as defined in section 2.3.1 and listed in Table 2.11). The RG framework can accommodate structural changes to the underlying EDW structure (or other EHR data source) by adjusting the data-get RGEAs appropriately (rewriting them to reflect the EDW structural change). Thus, by changing the appropriate RGEAs, an older scenario can still be executed against a structurally changed EDW [18].

Table 3.4 List of RGEAs

Data-get RGEAs

· Find_coded_event*
· Find_coded_value_under_specific_observation*
· Find_coded_event_with_flag*
· Find_drug_prescription*
· List_next_X_events
· Patient_is_male
· Get_patient_age_at_current_EHR_position

Position manipulating RGEAs

· Jump_to_timestamp
· Jump_to_first_EHR_event
· Jump_to_last_EHR_event
· Jump_to_next_timepoint
· Jump_forward/backward_X_hours_from_curr_position

RGEAs related to variables

· Remember_time_stamp
· Remember_numeric_value
· Assign_value_to_variable
· Increase_counter_variable

Data comparison RGEAs

· Evaluate_two_timestamps_difference_criterion

RGEAs related to report generation

· PtList_Harvester
· Write_DR_Top_note
· Collect_statistical_data_item

Applications marked with * also have corresponding “ReverseFind_” and “_group” variants. The “ReverseFind_” variant performs a search backwards in time, and the “_group” variant enables specification of a set of codes (using simple enumeration) to be searched, as opposed to a single code.

RGEAs can have input and output parameters. Parameters offer a way of passing information, and are closely connected to variables, which were briefly described in section 3.1. Parameters are directly supported by workflow technology. The utilized XPDL standard defines three types of parameters: IN, OUT, and INOUT. IN parameters are read-only, input parameters, and they offer a way of passing data to an external application. OUT parameters are write-only, output parameters, and they can pass information from an external application back to the execution engine. INOUT parameters have properties of both and are the most flexible.
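The three parameter modes can be illustrated with a minimal sketch of how an engine might pass variables to an external application and copy results back. This is not Enhydra Shark's implementation; the function names, the dictionary-based variable store, and the toy application are all assumptions for the example.

```python
from enum import Enum

class Mode(Enum):
    IN = "IN"        # read-only input
    OUT = "OUT"      # write-only output
    INOUT = "INOUT"  # both

def call_application(app, formal_params, variables):
    """Invoke `app` and copy results back according to the parameter modes.

    formal_params: list of (variable_name, Mode) pairs.
    variables: the scenario's variable store (a dict), updated in place.
    """
    # The application only sees the values of IN and INOUT parameters.
    inputs = {name: variables[name]
              for name, mode in formal_params
              if mode in (Mode.IN, Mode.INOUT)}
    outputs = app(inputs)
    # Only OUT and INOUT parameters may be written back.
    for name, mode in formal_params:
        if mode in (Mode.OUT, Mode.INOUT) and name in outputs:
            variables[name] = outputs[name]
    return variables

def jump_forward(inputs):
    """Toy application: advance the position by a number of steps."""
    return {"position": inputs["position"] + inputs["steps"],
            "steps": 0,   # ignored by the caller: 'steps' is an IN parameter
            "result": 1}

variables = {"position": 5, "steps": 3, "result": 0}
params = [("position", Mode.INOUT), ("steps", Mode.IN), ("result", Mode.OUT)]
call_application(jump_forward, params, variables)
```

After the call, position has been updated through the INOUT parameter, result has been set through the OUT parameter, and steps is unchanged even though the application tried to overwrite it.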

There are two additional conventions which apply to the current set of RGEAs. First, most RGEAs have an output parameter called ParsingResult which stores the succeed/fail type of the overall result of the application execution. Second, each RGEA which needs to know or can modify the current EHR position implements an appropriate INOUT parameter for this purpose.

A detailed description of RGEAs, including their parameters, can be found in Appendix A. The current set of RGEAs represents a significant end-product of this research project. The initial set was extended multiple times as additional RG case studies were implemented, and it offers an interesting break-down of possible, atomic, analytical tasks which can be used as an inspiration for other researchers.

1.1.1.2 Use of variables

There are two categories of variables: System variables, which are present in each scenario by default, and user-defined variables, which serve a specific analytical purpose within a given analysis.

System variables have a “S_” prefix. Currently, there are two system variables. The first is the S_ParsingResult variable, which stores the success/failure result of the last executed RGEA for which such output is defined. It is implemented as an integer variable. A zero value designates a false/failure outcome, whereas a nonzero value conveys a true/success outcome. In the data-get RGEA, the sign and the absolute value of the parsing result designate the EHR position where the desired event was found. S_ParsingResult is heavily used in transition conditions. The second system variable, S_CurrPosition, is the technical implementation of the current EHR position pointer described in section 3.1.
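The interplay of the two system variables can be sketched as follows. This is a simplified illustration, not the RGEA code from Appendix A: the chart is reduced to a list of bare codes, positions are numbered from 1 so that 0 can signal failure, and the meaning of the sign of the parsing result is left out because it is not spelled out here. The function names mirror the RGEA names but their signatures are assumed.

```python
def find_coded_event(events, code, curr_position):
    """Search forward from curr_position (inclusive) for `code`.

    Returns (parsing_result, new_position). Positions are 1-based so that
    0 can signal failure; this numbering is an assumption of the sketch.
    """
    for pos in range(curr_position, len(events) + 1):
        if events[pos - 1] == code:
            return pos, pos        # success: nonzero result, pointer moves
    return 0, curr_position        # failure: zero result, pointer unchanged

def reverse_find_coded_event(events, code, curr_position):
    """Same search, but backwards in time from curr_position."""
    for pos in range(curr_position, 0, -1):
        if events[pos - 1] == code:
            return pos, pos
    return 0, curr_position

# Hypothetical time-ordered chart reduced to bare codes.
chart = ["ADMIT", "GLC_PROTOCOL_ENROLL", "GLUCOSE", "GLUCOSE", "DISCHARGE"]
```

A transition condition such as S_ParsingResult>0 would then route a patient down the "found" branch, and the returned position plays the role of S_CurrPosition for the next node.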

User-defined variables can be used for many different purposes. The introductory section mentioned the three most common purposes for variables (remembering a timestamp, numeric, or count value) and RG’s suggested variable naming conventions. User-defined variables are declared in a special workflow process properties screen within a workflow editor, where the user specifies the variable name, variable type (e.g., string, number, or Boolean value), and desired initial value. Three examples of variable use from the case studies (presented in Chapter 4) are described below.

In the first example, the analysis of the results from a computerized glucose management protocol uses a user-defined, temporal variable to store the time of enrollment into the protocol. This timestamp value is important for many later steps within this analysis. The scenario declares an integer type variable, t010_glc_prot_enrollment, and in the same node that searches for the glucose protocol enrollment code, it also calls the Remember_timestamp_as_position RGEA with t010_glc_prot_enrollment as a parameter.

In fact, in order to remember a certain event, two general approaches can be taken. The scenario can work with the position of the event in the ordered list of events (position in EHR), or it can use the actual timestamp of that event. For the event position, an integer variable type would be used, similar to the implementation of the concept of current EHR position. The second strategy, absolute timestamp, would use a string or a date-time variable type. Both approaches are possible and each approach has implications for the RGEAs which operate with the remembered temporal events. For example, the event position approach offers easy implementation of temporal jumps to particular positions (even if several events share the same timestamp, which is often the case in a single, structured EHR form). On the other hand, if concurrency is important and all events from a particular time point must be parsed, the position approach mandates the existence and the use of RGEAs such as Jump_to_Previous/Next_TimeStamp(). Most RGEAs are currently implemented to expect the position-type integer value and have the ability, if necessary, to retrieve the actual timestamp of a given event at a certain EHR position.
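The trade-off between the two approaches can be made concrete with a short sketch. The helper names and the list-of-timestamps chart are assumptions for illustration (1-based positions, 0 meaning "not found"); they are not the actual RGEA implementations.

```python
def jump_to_next_timepoint(timestamps, position):
    """Return the first position whose timestamp is later than the one at
    `position`, or 0 if there is none (1-based positions, as an assumption)."""
    current = timestamps[position - 1]
    for pos in range(position + 1, len(timestamps) + 1):
        if timestamps[pos - 1] > current:
            return pos
    return 0

def events_at_same_timepoint(timestamps, position):
    """Return all positions sharing the timestamp found at `position`."""
    current = timestamps[position - 1]
    return [i + 1 for i, t in enumerate(timestamps) if t == current]

# Hypothetical chart: three events were filed from one structured form.
timestamps = ["2046-04-23 15:01", "2046-04-23 15:21", "2046-04-23 15:21",
              "2046-04-23 15:21", "2046-05-11 13:21"]
```

With positions, each of the three concurrent events stays individually addressable (2, 3, and 4), but a scenario that must treat them as one time point needs a helper like events_at_same_timepoint or a jump that skips past all of them.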

A second example of a user-defined variable comes from the osteoporosis case study. A variable, v099_age_at_first_fracture, is used to remember the age of the patient at the time of her first fracture, which is later used to test for certain desired values and scenario branching. The patient age would be obtained by first using the application Find_ICD_Dg_group(‘fractures.csv’) to find the position in the EHR where the first fracture occurred, and then, if found, calling another application, Get_Pt_Age_at_Current_Position(v099_age_at_first_fracture), to store the calculated patient age at the current position in the variable.

The third example of a user-defined variable is a Boolean variable which also can be found in the osteoporosis analysis. Boolean variables can be used to remember the result of a subanalysis performed earlier in the scenario for later use. The osteoporosis analysis evaluates multiple criteria for the OMW1 quality improvement “yes/no” outcome measure. A Boolean variable, b01_OMW1_compliant, is used to remember patient compliance status (true or false). Later steps in the analysis investigate noncompliant patients in some greater detail. The variable enables reuse of the result without reproducing the elaborate OMW1 criteria, and even combining the result with additional conditions. For example, the transition condition “b01_OMW1_compliant=0 AND t_second_fracture>0” will capture patients who were noncompliant and experienced a second fracture later. In fact, some built-in features in the timestamp remembering RGEA applications (see appendix A for details) enable using timestamp variables in a “Boolean” fashion as well, as demonstrated by this condition example.
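The example condition can be rendered in Python to show how a timestamp variable doubles as a Boolean. The sketch assumes that t_second_fracture holds an EHR position and stays at 0 when the event was never found; the patient identifiers and values are invented.

```python
def noncompliant_with_second_fracture(variables):
    """Python rendering of: b01_OMW1_compliant=0 AND t_second_fracture>0."""
    return (variables["b01_OMW1_compliant"] == 0
            and variables["t_second_fracture"] > 0)

# Hypothetical end-of-scenario variable values for three patients.
patients = {
    "R001": {"b01_OMW1_compliant": 0, "t_second_fracture": 41},  # found at position 41
    "R002": {"b01_OMW1_compliant": 1, "t_second_fracture": 17},
    "R003": {"b01_OMW1_compliant": 0, "t_second_fracture": 0},   # never found
}
selected = [pid for pid, v in patients.items()
            if noncompliant_with_second_fracture(v)]
```

Only R001 satisfies both parts of the condition: the patient was noncompliant and a second fracture was located somewhere in the chart.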

1.1.1.3 Use of conditions

Transition conditions represent another mechanism which extends the flowchart representation paradigm. Section 3.1 mentioned their use to direct the flow of logic and especially their use to implement branching scenario logic. Although they are a more complex mechanism, conditions are usually quickly understood, and an advanced requestor can easily author the conditions himself. A novice requestor can use a flowchart arrow’s annotation fields in the same way as was described previously for nodes and free-text descriptions of RG external applications. In this case, the analyst must translate these annotations into computable condition expressions.

Although the concept of a transition condition is well defined in workflow technology, the XPDL language does not define any specific syntax for its formulation and leaves the syntax and the interpretation to the workflow engine. The Enhydra Shark [19] workflow engine uses JavaScript syntax to formulate and parse conditions. Any variables can be used in condition expressions together with the basic Boolean (AND, OR, XOR) and numeric (>, <, ≥, ≤, =) operators. An example of a transition condition and the annotation within the JaWE workflow editor is shown in Figure 3.1.

EHR Data (HealthFlow background info)

October 6, 2010

 

This description originally applied to RetroGuide (the retrospective component of the HealthFlow system), but it is still valid for the overall HealthFlow system, which includes a retrospective mode (RetroGuide) as well as a prospective mode (FlowGuide).

For full info, see this book: http://www.amazon.com/dp/3639100999

Data used to be integrated by a separate tool (RetroGuide), but the newer system (HealthFlow) assumes the data is already in the native HealthFlow event model.

 

 

 

1 Sources of data

1.1.1.1 Data sources

With RGExtractor, the goal was to assemble a reasonably complete EHR which would support a large number of possible analyses, while at the same time the extracted data would be relatively simple to understand by nonexpert requestors and would somewhat reflect the computer view of the available EHR data. As stated above, certain data sources within IHC’s EDW are not fully integrated (although they are integrateable), and extra effort by the analyst is necessary to achieve this integration.

In order to use IHC’s data warehouse, an approval from Institutional Review Boards at IHC and University of Utah was sought and obtained. As a result of the IRB approval process, RGExtractor’s data extraction rules were designed to comply with legal regulations (Health Insurance Portability and Accountability Act) and policies governing the research use of healthcare data [10]. The extracted datasets were used only internally within IHC and were not shared with any outside entity. RGExtractor specifically removed patient identifying elements enumerated in the safe harbor deidentification method [11].

The following sections describe the different sources of coded data at IHC that were found useful to include in RGExtractor.

Clinical data repository data. The clinical data repository (CDR) is a central database for storing lifetime EHR data, and it is a crucial part of IHC’s HELP2 system [12]. It contains coded, numerical, and textual clinical data from a wide range of sources [13]. It includes data from certain interfaced sources (e.g., laboratory results) and, most importantly, it stores any data entered through HELP2’s clinical user applications, Clinical Workstation and Clinical Desktop (e.g., problem list entries, data entered through structured EHR forms, and drug prescription data). The CDR uses IHC’s internal Healthcare Data Dictionary (HDD) coding scheme [14].

The CDR uses an event-based model [15], and each event has two, mandatory, event-table attributes (apart from time and patient dimension): event type and event subtype codes. Depending on these two attributes, additional dimensions for different types and subtypes can be contained in additional tables, which are highly normalized and hierarchical. For example: (1) a laboratory event would contain additional attributes in several laboratory tables; (2) a coded, clinical observation would contain a different set of additional attributes in clinical observation tables; or (3) a pharmacy order would use pharmacy tables to store several drug-related attributes. To extract even the basic additional event details, a different, and often fairly complex, table join must be used. RGExtractor flattens and simplifies those additional attributes for all events into 4 basic attributes of event type (level 1), event subtype (level 2), exam code (level 3), and coded value code (level 4). Codes stored at levels 3 and 4 (exam and coded value) are defined very loosely and are used flexibly depending on the type of the EHR event. Table 3.2 contains several examples of EHR events outputted by RGExtractor. In addition to the four basic attributes, a numerical value attribute and a result flag attribute are used. For example, a urine sediment result shown in Table 3.2 has an event type code “Standard Lab Data,” subtype code “Urine Macroscopics,” exam code “Epithelial Cells, Urine Sediment” and coded value result of “Occasional.” Other events shown in the table are birth event, care encounter start event (“length of stay” event type), clinical report filing event, and several administrative and legacy events.
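The flattened target structure can be sketched as a single record type. The field names follow the column headings of Table 3.2, but the class, the flatten helper, and the shape of the input rows are assumptions for illustration; the real CDR joins are considerably more involved.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    ev_time: str                       # required for all events
    type: str                          # level 1, required for all events
    subtype: Optional[str] = None      # level 2
    exam: Optional[str] = None         # level 3
    coded_value: Optional[str] = None  # level 4
    flag: Optional[str] = None
    val_num: Optional[float] = None

def flatten(core, details):
    """Merge a core event row with its type-specific detail row."""
    return Event(
        ev_time=core["time"],
        type=core["type"],
        subtype=core["subtype"],
        exam=details.get("exam"),
        coded_value=details.get("coded_value"),
        flag=details.get("flag"),
        val_num=details.get("val_num"),
    )

# Values taken from the cholesterol row of Table 3.2.
core = {"time": "2046-04-23 15:23:00.0", "type": "Standard Lab Data",
        "subtype": "Lipid Profile"}
lab_details = {"exam": "Cholesterol, Plasma Quant.",
               "flag": "Higher Than Normal", "val_num": 327}
event = flatten(core, lab_details)
```

Whatever the source tables look like, every event ends up with the same handful of optional attributes, which is what keeps the extracted chart simple to read and to parse.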

Administrative data. Administrative data refers to coded data created by trained human coders for billing purposes. Every patient encounter record is evaluated by a trained coder after discharge, and several coded entries are created for reimbursement purposes. Three main categories of administrative data are included in the RG extraction process.

ICD9 Diagnoses are extracted from the CM.ICD9_DG_ENC billing system table. The assignment of diagnostic codes happens after the patient is discharged, usually within 1-3 weeks. The exact date of the billing diagnostic code assignment is not available in the EDW table; however, all diagnoses are clearly linked to the pertinent hospitalization. There is a single primary discharge diagnosis and several secondary discharge diagnoses assigned to each visit. RGExtractor uses a different event type to distinguish these two categories. RG does not use the assigned admission diagnosis because it may not be the most accurate main diagnosis for the patient visit (it may change later during the hospitalization). For the RG-generated time-ordered chart, a specific date for these diagnostic events must be chosen. The two possible dates were the admission date and the discharge date. RGExtractor currently uses the admission date. This choice has the advantage that in a chronological chart-parsing strategy, it is simpler to evaluate relevant patient diagnoses for the encounter since there is no need to do a search for another discharge event. Current billing system practices do not enable distinguishing diagnoses which the patient had at admission from diagnoses assigned later during the hospitalization. Clinician-entered EHR problem list entries are more accurate; however, unlike the billing system diagnoses, the EHR’s problem list is currently not used consistently for every patient. Many existing established QI measures rely on billing data for diagnoses.

ICD9 Procedures are extracted from the CM.ICD9_PROC_ENC billing system table. For procedures, the valid date of actual procedures within a hospitalization is available. The timestamp granularity is at date level only; it is not possible to know, for example, whether the percutaneous transluminal coronary angioplasty procedure was done in the morning or afternoon.

CPT procedural data are extracted from the CM.ENCNTR_CPT_PRCDR billing system table. They represent billable items covered by the terminology developed by the American Medical Association [16]. The data covers coded events in outpatient as well as inpatient settings.

Admission data are extracted from the CM.CASEMIX_SMRY table of the administrative monitoring system, showing each hospital admission and length of stay. A special event is added at the beginning of each hospitalization. The event type designated for this event is “length of stay” and the numeric value stores how many days the hospitalization lasted.

Within-hospital transfer data are extracted from the CDRDM.ENCOUNTER_LOS_ROOM administrative table, which shows different wards where the patient stayed during each hospitalization. A special event is added for each transfer indicating that a transfer has occurred, the length of stay at that ward, and the type of the target unit where the patient was transferred.

Legacy system data. At IHC, the HELP1 legacy inpatient system is still used at many facilities. RGExtractor can integrate selected additional data from HELP1 that is not currently interfaced to the CDR. Two EDW source tables are used. The HELP1.DATA table is used to extract selected nurse charting data and legacy decision support system (DSS) audit trail data (e.g., adverse drug monitoring system). The HELP1.DRUG table is used to extract inpatient medication data.

1.1.1.2 Data integration

As was mentioned in section 3.2.1.1, RGExtractor integrates several data sources into a simplified event-based schema. It does not try to capture all available event data. For example, it does not extract the units for lab measurements or route for drug orders. The CDR data integration utilizes the type, subtype, exam, coded value, numeric value and flag columns shown in Table 3.2. The administrative and legacy data integration uses the event type column and two additional attributes called term2_code and term2_textual_explanation. “Term2” stands for “additional terminology” rather than the implied meaning of “second terminology.” Each administrative or legacy system uses a different coding scheme. Each category of administrative and legacy system event has a special RGExtractor event type assigned (e.g., “nurse note” or “inpatient drug”), and the pertinent terminology code and textual explanation of this code are stored in the term2_code and term2_textual_explanation attributes, respectively. For example, Table 3.2 shows a diagnostic billing item which has “ICD-9-CM Diseases” in the event type column, and “72610” and “ROTATOR CUFF SYND NOS” in the two term2 columns. With this strategy, where event type dictates which coding scheme is expected in the term2 attributes, RGExtractor’s target data structure can handle any number of additional terminologies.
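The "event type dictates the coding scheme" strategy can be sketched with a small lookup table. The event type names and example values come from Table 3.2; the dictionary, the scheme descriptions, and the two helper functions are assumptions made for illustration.

```python
# Event type -> coding scheme expected in term2_code (illustrative subset).
TERM2_SCHEME = {
    "ICD-9-CM Diseases": "ICD-9-CM diagnosis codes",
    "ICD-9-CM Procedures": "ICD-9-CM procedure codes",
    "CPT-4": "CPT codes",
    "Nurse Note": "HELP1 legacy codes",
    "Inpatient Drug": "HELP1 legacy drug codes",
}

def make_term2_event(ev_time, event_type, code, text):
    """Build an administrative or legacy event; reject unregistered types."""
    if event_type not in TERM2_SCHEME:
        raise ValueError(f"no term2 coding scheme registered for {event_type!r}")
    return {"ev_time": ev_time, "type": event_type,
            "term2_code": code, "term2_textual_explanation": text}

def describe(event):
    """Render the code together with the scheme implied by the event type."""
    scheme = TERM2_SCHEME[event["type"]]
    return f"{event['term2_code']} ({scheme}): {event['term2_textual_explanation']}"

dx = make_term2_event("2046-04-23 00:00:00.0", "ICD-9-CM Diseases",
                      "72610", "ROTATOR CUFF SYND NOS")
```

Supporting one more terminology then amounts to registering one more event type, with no change to the two term2 columns themselves.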

It is important to emphasize again that the data extraction was seen only as a necessary prerequisite for other parts of this project and the goal was not to create a representation and integration strategy which would be complete. Instead, a highly pragmatic approach was adopted. In fact, in an operational environment it is expected that IHC’s EDW team would provide this data integration view rather than relying on any internal development efforts of this project.

Table 3.2 Sample EHR with selected example events

EV_TIME | TYPE | SUBTYPE | EXAM | CODED VALUE | FLAG | VAL_NUM | TERM2_CD | TERM2_TXT
1990-01-01 00:00:00.0 | Birth event | | | | | | |
2046-04-23 00:00:00.0 | Length of Stay | | | | | 3 | |
2046-04-23 00:00:00.0 | ICD-9-CM Diseases | | | | | | 72610 | ROTATOR CUFF SYND NOS
2046-04-23 00:00:00.0 | ICD-9-CM Procedures | | | | | | 8363 | ROTATOR CUFF REPAIR
2046-04-23 00:00:00.0 | CPT-4 | | | | | 2 | J3010 | Inj, fentanyl citrate
2046-04-23 00:00:00.0 | CPT-4 | | | | | | 29999 | ARTHROSCOPY OF JOINT
2046-04-23 15:01:00.0 | Clinical Text Data | Operative Report | | | | | |
2046-04-23 15:23:00.0 | Standard Lab Data | Lipid Profile | Cholesterol, Plasma Quant. | | Higher Than Normal | 327 | |
2046-04-23 15:21:00.0 | Standard Lab Data | Urine Microscopics | Epithelial Cells, Urine | Occasional | | | |
2046-05-11 13:21:50.0 | Problem Event | Diagnosis | Diagnosis | Hyperlipidemia | | | |
2046-08-12 11:12:13.0 | Patient Order | Pharmacy order | | Meperidine Hcl, 50Mg/ Ml, Ampul | | | |
2047-01-18 10:55:01.0 | Nurse Note | | | | | | 203.1.10.3.1.10.1.0 | PURPOSEFUL MOVEMENT
2047-01-18 15:23:30.0 | Inpatient Drug | | | | | | 3513816 | ELECTROLYTES (NUTRILYTE) 42.9 ML, VIAL
2047-01-19 11:02:02.0 | Discharged | | | | | | 43 | ICU

Only selected sample events are shown. Dates are fictional and only textual meanings of codes are shown. The “ev_time” column shows the event time, the next four columns contain the basic four attributes (type, subtype, exam, coded value), followed by the flag and numeric value columns. Only the time and event type attributes are required for all events. The last two “Term2” columns used for legacy and administrative data are explained in section 3.2.1.2.

1.1.1.3 Additional data extraction processing

RGExtractor produces two sets of output files, which are used in two later phases. The first of these, the binary, compact data files (.dat files), are used during the execution phase for parsing EHR events during the analysis. Binary files contain only timestamps and the terminology codes, are smaller in size, and cannot be viewed by a human user.

The second set of output files, EHR files (.xml files), are used during the review phase to support the individual patient view (explained later in section 3.2.4). In addition to the data items in the binary files, these files contain textual explanations of the terminology codes (e.g., code “6457946” = “Extubation Procedure”). RGExtractor uses the HDD and other terminology lookup tables (ICD9 Diagnoses, ICD9 Procedures, CPT codes, and the HELP1 legacy system coding scheme called PTXT [17]) to pregenerate these code annotations. This strategy was adopted because of speed and file portability requirements. In general, a file-based storage structure was selected because of the EDW’s non-24/7 availability and the available expertise with manipulating file-based data. A file-based structure also was used because of the adoption of both the XSLT-based transformation technology for displaying EHR files and the single patient execution model. Disadvantages of the file-based structure are data access speed and scalability. A database storage target would address some of the limitations, but the necessary technical infrastructure was not available at the time of the extraction process development. Considering the feasibility-testing character of the overall project, the selected file-based solution was not of major concern.
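The split between a compact binary file and an annotated XML file can be illustrated with a short sketch. The record layout (an 8-byte timestamp followed by a 4-byte code), the XML element and attribute names, and the timestamp value are hypothetical; the actual .dat and .xml formats produced by RGExtractor are not specified here.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical layout: little-endian int64 epoch seconds, then int32 code.
RECORD = struct.Struct("<qi")

def pack_events(events):
    """Binary form: timestamps and codes only, no text."""
    return b"".join(RECORD.pack(ts, code) for ts, code, _text in events)

def unpack_events(blob):
    """Read the fixed-size records back as (timestamp, code) tuples."""
    return [RECORD.unpack_from(blob, off)
            for off in range(0, len(blob), RECORD.size)]

def to_xml(events):
    """Review form: the same items plus the pregenerated code annotation."""
    root = ET.Element("ehr")
    for ts, code, text in events:
        ET.SubElement(root, "event", time=str(ts), code=str(code)).text = text
    return ET.tostring(root, encoding="unicode")

# One event, using the code/annotation pair quoted in the text.
events = [(2408022000, 6457946, "Extubation Procedure")]
```

The binary form costs a fixed 12 bytes per event in this layout, while the XML form carries the human-readable annotation that the review phase needs.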

An important part of the extraction process is replacement of real patient identifiers with meaningless research identifiers, as well as removal of any event parameters which refer to the real patient identifier. The deidentification was mandated by patient privacy and confidentiality concerns and the existing research IRB policies. Although the extracted files do not contain any patient identifying information, because of the actual timestamp information for each event, they were only stored on either password-protected storage media or IHC’s internal storage network, which is governed by strict security policies. The extracted datasets used strictly research identifiers, and the mapping files to real patient identifiers were kept in a separate location on IHC’s internal data storage network.
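The identifier replacement step can be sketched as below. This is only an illustration of the idea, not RGExtractor's code and not a complete Safe Harbor implementation; the field names, the research identifier format, and the sample records are assumptions.

```python
def deidentify(records, id_field="patient_id", drop_fields=("mrn", "name")):
    """Replace real identifiers with sequential research identifiers.

    Returns (deidentified_records, mapping). The mapping is meant to be
    stored separately from the extracted data.
    """
    mapping = {}
    cleaned = []
    for rec in records:
        real_id = rec[id_field]
        if real_id not in mapping:
            mapping[real_id] = f"R{len(mapping) + 1:06d}"
        # Drop fields that refer to the real patient, then swap the identifier.
        out = {k: v for k, v in rec.items() if k not in drop_fields}
        out[id_field] = mapping[real_id]
        cleaned.append(out)
    return cleaned, mapping

# Hypothetical source rows.
records = [
    {"patient_id": "12345", "name": "X", "code": "72610"},
    {"patient_id": "12345", "name": "X", "code": "8363"},
    {"patient_id": "99", "name": "Y", "code": "J3010"},
]
cleaned, mapping = deidentify(records)
```

The same real identifier always maps to the same research identifier, so a patient's events stay linked after deidentification, while the mapping itself can be kept in a separate, more tightly controlled location.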

The extraction phase is normally performed by the analyst. It can be done once for a large cohort of patients, e.g., all diabetic patients who were treated in 2006, and then reused for several different subsequent scenarios which analyze different aspects of diabetes care. After the extraction is finished, each extracted population is usually given a name (e.g., DMCOHORT06) and a textual annotation file is used to store cohort metadata, such as the cohort selection code, details of the query used in the cohort specification step, included data sources, the size of the extracted population, and the size of the extracted data. This metadata enables reuse of previously extracted cohorts for follow-up or secondary analyses.

Code usage statistics. Knowledge of which codes are available and how often they are used is crucial for the next phase, scenario creation. RG uses only terminology-coded information. The very existence of a pertinent EHR coded field within the clinical information system (CIS) and its real-life use is crucial for being able to answer any analytical questions. For example, a subanalysis of different types of pneumonias (any pneumonia, community-acquired pneumonia, and hospital-acquired pneumonia) is not possible when appropriate direct codes or other indirect coded-indicators are not present in the current EHR.

RGExtractor has, for this purpose, built-in functionality to create multiple code usage statistics reports. The following coding schemes or views are supported: (1) Administrative codes: ICD Diagnosis codes, ICD Procedural codes, CPT codes; (2) CDR codes: view ordered by event type, view ordered by event type, subtype, and exam code, view ordered by absolute usage; (3) Codes of prescribed drugs (entered within HELP2); and (4) Legacy system codes: clinical event codes and inpatient drug codes. An example of code-usage statistics for CDR codes is shown in Table 3.3.
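A code usage report of the CDR kind can be approximated in a few lines. The sketch assumes events are available as dictionaries keyed by the flattened attribute names; it illustrates the counting and ordering only and is not RGExtractor's reporting code.

```python
from collections import Counter

def code_usage(events):
    """Count events per (type, subtype, exam, coded value) combination."""
    counts = Counter(
        (e["type"], e.get("subtype", ""), e.get("exam", ""),
         e.get("coded_value", ""))
        for e in events
    )
    # Order by event type, then by descending usage within the type.
    return sorted(counts.items(), key=lambda kv: (kv[0][0], -kv[1]))

# Hypothetical mini-cohort.
events = [
    {"type": "Problem Event", "subtype": "Diagnosis", "exam": "Diagnosis",
     "coded_value": "Pregnancy"},
    {"type": "Problem Event", "subtype": "Diagnosis", "exam": "Diagnosis",
     "coded_value": "Depression"},
    {"type": "Problem Event", "subtype": "Diagnosis", "exam": "Diagnosis",
     "coded_value": "Pregnancy"},
    {"type": "Pat Obs Event", "subtype": "Care Manager Encounter",
     "exam": "Call Time"},
]
report = code_usage(events)
```

A report like this tells the scenario author up front whether a code needed by a planned analysis is actually present in the extracted cohort, and how often.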

Alternative data extraction design. A final consideration to be mentioned is a possibility of complete elimination of the extraction phase in future versions of RG framework. It would be feasible to change the data-get RGEAs to access the whole EDW directly, and offer alternative ways to support the single patient EHR view in the review phase. Elimination of this step would lower the dependency on the analyst, and speed up and simplify the overall RG analytical process. However, such a design would require further speed and access optimization of the EDW, which was not realistically achievable at this point, so it was not pursued. IHC’s EDW is currently optimized for population analyses (e.g., indexed columns or table segmentations), and the requested changes would involve better support for faster retrieval of data on a single patient, or a particular event or encounter.

Table 3.3 Sample code usage statistics report

TYPETXT | SBTYPETXT | EXAMTXT | VAL_CDTXT | # of RECORDS
Pat Obs Event | Care Manager Encounter | Contact Personnel | | 4
Pat Obs Event | Care Manager Encounter | Care Manager Encounter Reason | | 4
Pat Obs Event | Care Manager Encounter | Care Manager Encounter Outcome | | 4
Pat Obs Event | Care Manager Encounter | Call Attempts Quantitative | | 3
Pat Obs Event | Care Manager Encounter | Call Time | | 3
Pat Obs Event | Care Manager Encounter | Care Manager Coordination Time | | 4
Pat Obs Event | Care Manager Encounter | Next Appointment Date/Time | | 2
Pat Obs Event | Care Manager Encounter | Care Manager Next Appointment | | 2
Problem Event | Medical Procedure | Medical Procedure | Cesarean Section | 7
Problem Event | Medical Procedure | Medical Procedure | Tonsillectomy With Adenoidectomy | 2
Problem Event | Medical Procedure | Medical Procedure | Colonoscopy | 2
Problem Event | Medical Procedure | Medical Procedure | | 1
Problem Event | Medical Procedure | Diagnosis | Endometriosis | 1
Problem Event | Medical Procedure | ProcedureOnsetTime | | 7
Problem Event | Diagnosis | Chronicity | Current | 12
Problem Event | Diagnosis | Chronicity | Chronic | 3
Problem Event | Diagnosis | Chronicity | First occurrence | 3
Problem Event | Diagnosis | Chronicity | Intermittent | 1
Problem Event | Diagnosis | Diagnosis | | 244
Problem Event | Diagnosis | Diagnosis | Pregnancy | 61
Problem Event | Diagnosis | Diagnosis | Depression | 37
Problem Event | Diagnosis | Diagnosis | Cesarean Section | 25

CDR codes usage report is shown above (ordered by event type and descending usage within this type). The key column is “# of RECORDS”. Codes (normally displayed next to description) have been removed due to copyright issues. An empty field in the coded value column (VAL_CDTXT) under diagnosis or procedure subtype indicates a free-text entry.

Background info on HealthFlow system (basic concept, external applications part 1)

October 6, 2010

This description applied to RetroGuide (the retrospective component of the HealthFlow system), but it is still valid for the overall HealthFlow system, which includes a retrospective mode (RetroGuide) as well as a prospective mode (FlowGuide).

 

For full info, see this book: http://www.amazon.com/dp/3639100999

1 Basic introduction

In this section, the RetroGuide (RG) analytical suite (proposed in Aim 2) is described in detail. An introduction to RG is followed by a description of four phases of RG’s usage. Several systems which are similar to RG are listed and described. Finally, a comparison of RG to Structured Query Language (SQL) is provided.

1.1 Introduction

RG is a suite of applications which supports several analytical steps. The RG architecture and analytical approach were substantially inspired by workflow technology and are meant to be applied in a medical context using electronic health record (EHR) data. Some of the adopted workflow constructs were already mentioned in Chapters 1 and 2, but the following list presents them again in overview: (1) graphical executable process models; (2) ability of the process flowcharts to contain references to execution of external applications; (3) an execution scheme where each process instance (or analyzed patient) is treated separately; and (4) built-in documentation of the process execution flow for later review or analysis.

1.1.1 Requirements

Reviewing the analytical limitations presented in Chapter 1, a set of requirements for a novel analytical approach was defined. These requirements are enumerated in [1] and also presented in the list below:

1. Provision of a set of user-friendly graphical tools, targeted for clinicians’ use, in which a clinical process or an analysis can be modeled in a stepwise fashion. The model would resemble the flowchart format often used by clinical guidelines.

2. Extensibility of the flowchart format with elements, entered by analysts or programmers, which would enable linkage of the graphical model to real healthcare data in an enterprise data warehouse (EDW). Additionally, these extending elements would enable modeling complexity which cannot be expressed by the flowchart notation alone.

3. Direct executability of the flowchart-based format (combining the input from clinicians and programmers/analysts).

4. Support for gradual development from simple models to more complex ones, with a shared workbench used by both clinicians and EDW data analysts.

5. Expressive process modeling language, sufficient to represent healthcare specific problems and challenges (e.g., simple clinical guidelines, adverse drug events, basic temporal conditions).

6. Small development burden (i.e., reuse existing standards and available tools, and develop only institution specific or healthcare domain specific extensions).

7. Support for the generation of reports and ability to assess process variability.

8. Data export into other analytical or statistical packages.

9. Ability to reuse models created on retrospective data in prospective mode. Retrospective models must be extendable to support execution in real time, using real events as controlling triggers. In other words, models working with retrospective data must be extendable to become point of care decision support modules.

1.1.2 Basic concepts

Several key basic concepts are necessary to understand the RG analytical approach. They are structured into several subsections below.

An RG analytical question is asked in the form of a scenario, which has two layers: a graphical flowchart layer and an additional hidden code layer. The RG term “scenario” is equivalent to the concept of a process or process definition in workflow technology. However, RG adds several conventions (which are explained later in section 3.2.2) to the process concept, hence a different term (“scenario”) is used. A scenario is created or viewed in a workflow editor application. The workflow editor can output and save the final scenario using a process definition language as defined in workflow technology. RG is currently implemented using XML Process Definition Language (XPDL) [2] and can use any XPDL-compliant editor for creating new scenarios or modifying existing ones. The main editor utilized in this project is JaWE version 1.4 [3]; however, the CapeVision plugin [4] for Microsoft Visio, Tibco Business Studio [5] and JPEd [6] editors were also used experimentally.

1.1.2.1 The flowchart and code layers

The flowchart layer (“flowchart”) consists of nodes and arrows which connect the nodes. Whereas nodes represent individual steps in the analysis, the arrows represent transitions in the flow of the analysis’ logic.

Two special nodes designate the start and end of the scenario’s logic. An RG flowchart reflects a set of instructions for a stepwise, sequential analytical process, and, if it is necessary for the analysis, it can contain loops. The procedural, algorithmic nature of the RG flowchart is very different from node-and-arrows models used in some graphical dependency models.

Both nodes and transitions may contain additional attributes which comprise the code layer and which ultimately make the flowchart executable. The additional attributes of nodes are references to execution of one or more external applications. The additional attributes of transitions are transition conditions. If there is no condition inside a transition, the next step (node) is executed in every case. If a condition is present, the transition to the next node happens only if the condition is satisfied. There may be more than one arrow originating from any given node, which offers the ability to have branching logic. An example of a reference to an external application is “ReverseFind_CodedValueCD(C-section_procedure_CD)”, which will result in an EHR event search (backwards in time) for the C-section procedure. An example of a transition condition is “(value_D_dimer < 300) AND (value_antithrombin3 < 0.45)”, which means that the next node or scenario-branch will be executed only if the given threshold criteria are met.
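The relation between the two layers can be illustrated with a minimal Python sketch. It is not RG's XPDL representation; the classes Node and Transition, the function next_nodes, and the node names check_labs, flag_patient, and end are hypothetical and exist only for illustration. Only the transition condition itself is taken from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Variables = Dict[str, float]


@dataclass
class Transition:
    target: str
    # None means "no condition": the target node is executed in every case.
    condition: Optional[Callable[[Variables], bool]] = None


@dataclass
class Node:
    name: str
    transitions: List[Transition] = field(default_factory=list)


def next_nodes(node: Node, variables: Variables) -> List[str]:
    """Return the targets of all outgoing transitions that may be followed."""
    return [
        t.target
        for t in node.transitions
        if t.condition is None or t.condition(variables)
    ]


def labs_below_thresholds(v: Variables) -> bool:
    # (value_D_dimer < 300) AND (value_antithrombin3 < 0.45)
    return v["value_D_dimer"] < 300 and v["value_antithrombin3"] < 0.45


# Hypothetical node with one conditional and one unconditional arrow.
check_labs = Node(
    "check_labs",
    [Transition("flag_patient", labs_below_thresholds), Transition("end")],
)

met = next_nodes(check_labs, {"value_D_dimer": 250, "value_antithrombin3": 0.40})
not_met = next_nodes(check_labs, {"value_D_dimer": 350, "value_antithrombin3": 0.40})
print(met)      # ['flag_patient', 'end']
print(not_met)  # ['end']
```

The unconditional arrow is followed in both cases, while the conditional arrow is followed only when both thresholds are met.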

1.1.2.2 Execution scheme

The scenario is executed on data of a single patient. An analysis of a population is achieved by sequentially running the scenario on all members of the population – one at a time. Population results are abstracted at the end of the whole process when the engine finishes the sequential execution of the scenario on all patients.
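A minimal Python sketch of this execution scheme follows. It assumes a scenario can be modeled as a function that takes one time-ordered chart and returns an outcome label; the names run_on_population and had_c_section and the sample data are hypothetical, not part of RG.

```python
from collections import Counter
from typing import Callable, Dict, List

Event = Dict[str, object]
Chart = List[Event]


def run_on_population(
    scenario: Callable[[Chart], str], charts: Dict[str, Chart]
) -> Counter:
    """Run the scenario on one patient at a time, then abstract the results."""
    outcomes = {}
    for patient_id, chart in charts.items():
        # Each patient is a separate process instance with a time-ordered chart.
        ordered = sorted(chart, key=lambda e: e["time"])
        outcomes[patient_id] = scenario(ordered)
    # Population-level summary is produced only after all patients are done.
    return Counter(outcomes.values())


def had_c_section(chart: Chart) -> str:
    # Hypothetical single-patient scenario with a yes/no outcome.
    return "yes" if any(e["code"] == "C-section" for e in chart) else "no"


charts = {
    "p1": [{"time": 2, "code": "C-section"}, {"time": 1, "code": "Pregnancy"}],
    "p2": [{"time": 1, "code": "Depression"}],
}
summary = run_on_population(had_c_section, charts)
print(summary["yes"], summary["no"])  # 1 1
```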

1.1.2.3 Navigation through the single patient EHR

RG uses a unique method of browsing through the EHR during the scenario execution. The approach resembles a human chart review process. RG operates strictly on a time-ordered patient chart, and this chronological assumption is crucial for its analytical functions. During the execution, RG manipulates the electronic chart, according to the stepwise sequence defined in the scenario, in a manner similar to a human browsing through a paper chart.

Most paper charts are also organized chronologically. For any chart review there is usually a clear set of step-by-step instructions of what the reviewer needs to look for in the paper record. The reviewer follows these steps one by one. The steps ask him or her to either look for certain events in the record or answer any analytical questions needed for the next steps in the review process. If the task is to find a certain event, the reviewer might often be asked to write down the result of this event search – a “yes/no” outcome. In the case of a “yes” outcome (i.e., the desired event in a particular step is found), the reviewer might be instructed to remember a pertinent numerical or temporal value about the found event. This remembered value can be used for comparisons in the next steps of the instructions.

During this manual review process, the reviewer browses the paper chart forwards or backwards, fulfilling each step in the instructions. At any point in time, the chart would be open at a particular position where the reviewer finished the last step. For understanding the RG execution model, the notion of this current position in the EHR (either for a human reviewer or a computer algorithm) is crucial. RG implements the current position pointer as an integer number which means the order rank of one particular event, where the execution stopped when it finished the last step. In the manual chart review process it would be similar to a notion of a “page number” in a chronologically ordered paper chart.

The review instructions also may contain steps where the reviewer would simply be asked to browse to a particular absolute or relative position from the current position. For example: “Go back to the start of the record and look for evidence for particular comorbidity (e.g., hypertension) at any point in time.” Another example would be: “Skip the rest of the current hospitalization where you found the PTCA operation and look for complications X occurring in a window of 4-12 months from the operation.”

There is one additional key element about the behavior of the current position marker during RG execution. If the next desired event is not found, the marker stays where it was before the unsuccessful search. To explain this in more detail, imagine an EHR with 1457 events. If the current position marker after finishing the last step is at a particular position (e.g., 453) and the next desired event is not found, the marker stays where it was (position 453), although during the unsuccessful search RG in fact browsed events 454, 455, 456, and so on, up to the last (1457th) event. In short, an unsuccessful search never moves the marker.
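The marker behavior can be sketched in a few lines of Python. The class ChartCursor and its method find_forward are hypothetical names, the sketch uses 0-based list indices rather than event ranks, and it assumes that a forward search starts at the event after the current position.

```python
from typing import List


class ChartCursor:
    """Current-position marker over a time-ordered list of event codes."""

    def __init__(self, codes: List[str]) -> None:
        self.codes = codes
        self.position = 0  # 0-based index of the current event

    def find_forward(self, code: str) -> bool:
        """Search after the current position; move the marker only on a hit."""
        for i in range(self.position + 1, len(self.codes)):
            if self.codes[i] == code:
                self.position = i
                return True
        # The whole remainder of the chart was browsed, but the marker stays.
        return False


cursor = ChartCursor(["Pregnancy", "C-section", "Depression", "Colonoscopy"])

found_first = cursor.find_forward("Depression")
position_after_hit = cursor.position

found_second = cursor.find_forward("C-section")  # lies before the marker
position_after_miss = cursor.position

print(found_first, position_after_hit)    # True 2
print(found_second, position_after_miss)  # False 2
```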

1.1.2.4 Variables

RG has the ability to remember certain facts via the use of variables. The concept of variables is part of workflow technology, where it is called workflow relevant data. The creation and use of variables technically belong to the code-layer; however, most variable names are often directly exposed and used in the flowchart node titles.

The manual chart review example presented previously hinted at the need for variables when it mentioned an instruction for the human reviewer to write down certain facts about found events (to be used later in the logic or as data output). The RG scenario, being based on a workflow process, can use variables for this purpose of remembering important facts. The number of variables used is unlimited, and various types are offered in RG. Certain naming conventions of variables are strongly recommended (e.g., “time” suffix or “t” prefix for temporal facts, “value” or “v” for numerical values, and “count” or “c” for count variables).

For example, a scenario analyzing hypertension may search for a prescription of a certain antihypertensive drug; remember the drug’s prescription date as t_drug; search for a systolic blood pressure value (sBP) prior to this prescription and remember this value as v_sBP_prior_therapy; jump to time 6 months after t_drug; search for the next available systolic blood pressure value and remember it as v_sBP_after_therapy. Subsequent analytical nodes or conditions can use all these introduced variables to answer various clinical questions, or to restrict or split further analysis into subgroups of patients which satisfy one or multiple temporal or numerical conditions.
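The hypertension example can be written out as a short Python sketch. The chart data and the event fields (date, kind, value) are made up for illustration, and "6 months" is approximated as 182 days, which is an assumption of this sketch rather than an RG rule. The variable names follow the naming conventions above.

```python
from datetime import date, timedelta

# Made-up, time-ordered chart of a single patient.
chart = [
    {"date": date(2009, 1, 10), "kind": "sBP", "value": 162},
    {"date": date(2009, 2, 1), "kind": "drug", "value": None},
    {"date": date(2009, 5, 1), "kind": "sBP", "value": 150},
    {"date": date(2009, 9, 15), "kind": "sBP", "value": 134},
]

variables = {}

# Step 1: find the prescription and remember its date.
drug = next(e for e in chart if e["kind"] == "drug")
variables["t_drug"] = drug["date"]

# Step 2: remember the closest systolic BP before the prescription.
prior = [e for e in chart if e["kind"] == "sBP" and e["date"] < variables["t_drug"]]
variables["v_sBP_prior_therapy"] = prior[-1]["value"]

# Step 3: jump about 6 months ahead and take the next available systolic BP.
jump_to = variables["t_drug"] + timedelta(days=182)
after = next(e for e in chart if e["kind"] == "sBP" and e["date"] >= jump_to)
variables["v_sBP_after_therapy"] = after["value"]

# A later condition can combine the remembered values.
drop = variables["v_sBP_prior_therapy"] - variables["v_sBP_after_therapy"]
print(drop)  # 28
```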

1.1.2.5 Logical constructs

The RG analysis is controlled through two main formalisms which correspond to the two layers mentioned above. The first formalism is offered by the flowcharting logic and exposed within the flowchart layer. This includes the use of various steps, use of conditions on transitions, and use of multiple flowchart branches to model the analytical problem at hand. This graphical formalism is meant to be understood by the analysis requestor. Any fundamental scenario modifications done by the analyst to the model are reflected by changes in the flowchart (e.g., adding an extra analytical step or adding scenario branches).

The second problem-modeling formalism is the use of external applications, which is represented at the scenario’s code layer. RG external applications (RGEAs) can contain computer code which covers reasoning outside the capabilities of the flowchart formalism.

External applications can serve several different purposes, from simple to complex. Simple external applications may, for example, retrieve data from an EHR (data-get applications). Another category of simple RGEAs is applications which manipulate the current position in the EHR to achieve certain effects useful for the analysis at hand (e.g., Jump_forward_X_Days (Desired_Days), Jump_After_Timestamp (Desired_absolute_time-stamp), or Jump_to_First_EHR_Event). Another category of analytical applications performs comparisons which cannot be expressed as transition conditions, for example temporal comparisons such as Temporal_Difference_Exceeding (Timestamp1, Timestamp2, Desired_Difference_Days).
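Two of the simple applications named above could look roughly as follows in Python. This is a sketch under assumptions, not the RG implementation: the function signatures, the rule that a jump lands on the first event at or after the target date, and the rule that the position is unchanged when no such event exists are all assumptions made for illustration.

```python
from bisect import bisect_left
from datetime import date, timedelta
from typing import List


def jump_forward_x_days(
    event_dates: List[date], position: int, desired_days: int
) -> int:
    """Return the index of the first event at least desired_days ahead.

    Assumption: if no such event exists, the position stays unchanged.
    """
    target = event_dates[position] + timedelta(days=desired_days)
    new_position = bisect_left(event_dates, target)
    return new_position if new_position < len(event_dates) else position


def temporal_difference_exceeding(
    timestamp1: date, timestamp2: date, desired_difference_days: int
) -> bool:
    """True if the two timestamps are more than the given number of days apart."""
    return abs((timestamp2 - timestamp1).days) > desired_difference_days


# Made-up, time-ordered event dates of a single patient.
dates = [date(2010, 1, 1), date(2010, 1, 5), date(2010, 2, 1), date(2010, 6, 1)]

moved = jump_forward_x_days(dates, 0, 30)   # lands on 2010-02-01
stayed = jump_forward_x_days(dates, 3, 30)  # nothing that late in the chart
far_apart = temporal_difference_exceeding(dates[0], dates[3], 120)

print(moved, stayed, far_apart)  # 2 3 True
```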

Complex external applications may involve the use of external, complex reasoning engines, for example for the calculation of advanced statistical indicators, or the use of Bayesian belief networks, artificial neural networks, or other machine learning techniques for classification or prediction. An RG scenario would provide the necessary input for the calculations and would be able to use the output parameters to decide what to do next (branching).