How to identify a pattern across multiple variables in 1 record?

Accepted Solution Solved
Reply
Contributor SM1
Contributor
Posts: 30
Accepted Solution

How to identify a pattern across multiple variables in 1 record?

Hi,

 

I am using SAS 9.4. I am working with health care data of patients enrolled in case management (CM) services for varying lengths of time during a 5-year period. My full data set includes 28,673 rows. Each row is a record of 1 patient encounter and includes 1 CM-relevant code ('V4989 2' = start CM, 'V4989 3' = continue CM, and 'V4989 4' = end CM). Some patients have only 2 encounters, while 1 patient has 110 encounters.

 

I need to sort patients according to the patterns of CM-relevant codes in their records. For every patient, their 1st encounter should include the start code, all subsequent encounters should include the continue code, and their last encounter should include the end code. There are a lot of patient records that do not follow this pattern.

 

I transposed my data so that each row = 1 patient and contains the CM codes from every documented encounter. Here's an example of some data:

 

PATID     VARNAME     CODE1     CODE2     CODE3     CODE4     CODE5  . . . .CODE110

00001      CMCODE      V4989 2     V4989 3   V4989 4

00002      CMCODE      V4989 3     V4989 4

00003      CMCODE      V4989 3     V4989 3    V4989 3    V4989 3    V4989 4

00004      CMCODE      V4989 2     V4989 4    V4989 2    V4989 3    V4989 4

00005      CMCODE      V4989 4     V4989 3    V4989 3    V4989 4

 

Patient 00001's record is coded correctly per regulation. Patients 0002 and 0003 follow the same (incorrect) pattern. Patient 0004's record looks like it was coded correctly - but for 2 separate enrollments in CM services. Patient 0005's record follows its own (incorrect) pattern.

 

I've been trying to figure out how to separate patient records according to the pattern of the CM coding, but I'm at a loss. Any suggestions about what might work will be really appreciated.

 

Thanks!


Accepted Solutions
Solution
‎03-03-2016 04:59 PM
Trusted Advisor
Posts: 1,114

Re: How to identify a pattern across multiple variables in 1 record?

[ Edited ]

Hi SM1,

 

The code below shows how I would create code patterns (in a character variable COMB) by combining the last digits of the CM codes (the 'V4989' part seems to be redundant), removing duplicates. It works with the untransposed ("long" layout) data, sorted by PATID and chronologically within PATID, and assumes that the values of CMCODE are of the form 'V4989 x' with an informative character at position 7, e.g., a digit from {2, 3, 4}. (I would first check if all values follow this basic pattern.)

 

/* Create test data just for demonstration */
data long;
input patid $ @7 cmcode $7.;
cards;
00001 V4989 2
00001 V4989 3
00001 V4989 4
00002 V4989 3
00002 V4989 4
00003 V4989 3
00003 V4989 3
00003 V4989 3
00003 V4989 3
00003 V4989 4
00004 V4989 2
00004 V4989 4
00004 V4989 2
00004 V4989 3
00004 V4989 4
00005 V4989 4
00005 V4989 3
00005 V4989 3
00005 V4989 4
;

/* Create code combinations */
data codecomb;
retain patid; /* just to keep PATID at first position */
length comb $110;
do until(last.patid);
  set long;
  by patid cmcode notsorted;
  if first.cmcode then comb=cats(comb, char(cmcode,7));
end;
drop cmcode;
run;

proc freq data=codecomb;
tables comb;
run;

For example, patients 00002 and 00003 would get the same pattern COMB='34' (interpreted as "one or more occurrences of 'V4989 3', followed by one or more occurrences of 'V4989 4'"). If you don't want to compress consecutive occurrences of the same code to a single digit, i.e. obtain COMB='33334' for patient 00003, simply remove the IF condition.

 

You could then perform further pattern matching with the code patterns in variable COMB.

View solution in original post


All Replies
Grand Advisor
Posts: 10,251

Re: How to identify a pattern across multiple variables in 1 record?

When I have "a lot" of records that do not meet an expected the pattern the first thing is to go back to the source and see how the data was actually generated and extracted from a system. You may have received a subset that was originally intended for another purpose and hence does not have every actual "encounter" within the time frame.

 

In a medical setting it could be that what you have such things as administrative removals from the program with possible reinstatement. Or selection of records according to someother service code. Such as the first visit did not actually involve seeing a doctor or test but was enrolled for some reason but the data you have is only associated with the doctor visit/lab test/procedure type visits.

 

Re-enrollment might be associated with change in health insurance status or other provider, or planned or expected expiration for some other reason.

 

Or simple data entry issues. I have to say that 2=start is minorly counterintuitive. I would tend to think 1=start.

Contributor SM1
Contributor
Posts: 30

Re: How to identify a pattern across multiple variables in 1 record?

Unfortunately, these records were generated and extracted properly from the system.

 

I just need to figure out some approach to systematically subdividing the records by observed coding pattern.

 

 

Solution
‎03-03-2016 04:59 PM
Trusted Advisor
Posts: 1,114

Re: How to identify a pattern across multiple variables in 1 record?

[ Edited ]

Hi SM1,

 

The code below shows how I would create code patterns (in a character variable COMB) by combining the last digits of the CM codes (the 'V4989' part seems to be redundant), removing duplicates. It works with the untransposed ("long" layout) data, sorted by PATID and chronologically within PATID, and assumes that the values of CMCODE are of the form 'V4989 x' with an informative character at position 7, e.g., a digit from {2, 3, 4}. (I would first check if all values follow this basic pattern.)

 

/* Create test data just for demonstration */
data long;
input patid $ @7 cmcode $7.;
cards;
00001 V4989 2
00001 V4989 3
00001 V4989 4
00002 V4989 3
00002 V4989 4
00003 V4989 3
00003 V4989 3
00003 V4989 3
00003 V4989 3
00003 V4989 4
00004 V4989 2
00004 V4989 4
00004 V4989 2
00004 V4989 3
00004 V4989 4
00005 V4989 4
00005 V4989 3
00005 V4989 3
00005 V4989 4
;

/* Create code combinations */
data codecomb;
retain patid; /* just to keep PATID at first position */
length comb $110;
do until(last.patid);
  set long;
  by patid cmcode notsorted;
  if first.cmcode then comb=cats(comb, char(cmcode,7));
end;
drop cmcode;
run;

proc freq data=codecomb;
tables comb;
run;

For example, patients 00002 and 00003 would get the same pattern COMB='34' (interpreted as "one or more occurrences of 'V4989 3', followed by one or more occurrences of 'V4989 4'"). If you don't want to compress consecutive occurrences of the same code to a single digit, i.e. obtain COMB='33334' for patient 00003, simply remove the IF condition.

 

You could then perform further pattern matching with the code patterns in variable COMB.

Contributor SM1
Contributor
Posts: 30

Re: How to identify a pattern across multiple variables in 1 record?

Hi FreelanceReinhard,

 

The code worked perfectly . . . and now I'll go spend some quality time further matching the groups.

 

Thank you so much!

Contributor SM1
Contributor
Posts: 30

Re: How to identify a pattern across multiple variables in 1 record?

Hi Freelance Reinhard,

 

I replied to quickly - I do have one question.

 

The syntax did work, except it didn't compress consecutive occurrences of the same code to a single digit.

 

From the Freq output, this is what the first few lines for the variable "COMB" look like:

 

2

22

222

2222233333333333333333334

2222333333333333333333333333333333333333333333333333333333334

22223333334

22224

2223

222333

22233323333

 

Is there code I could use to compress consecutive occurrences of the same code to a single digit? If that's possible, then my problem really will be solved.

 

Thanks so much!

 

Trusted Advisor
Posts: 1,114

Re: How to identify a pattern across multiple variables in 1 record?

Hi SM1,

 

Did you use the code with or without the IF condition? As described in my earlier post, with the IF condition the duplicates should be eliminated, without the IF condition they would be kept in the combination.

Contributor SM1
Contributor
Posts: 30

Re: How to identify a pattern across multiple variables in 1 record?

The syntax included the IF condition.

 

I included 1 other variable in the BY statement (encdate = encounter date)

 

Here's the syntax I ran:

 

data codecomb;

retain patid;

length comb $110;

do until(last.patid);

set CM_dx;

by patid encdate cmcode notsorted;

if first.cmcode then comb=cats(comb, char(cmcode,7));

end;

run;

Trusted Advisor
Posts: 1,114

Re: How to identify a pattern across multiple variables in 1 record?

I see. By including the additional variable you split the PATID BY groups into PATID-ENCDATE BY groups, thus changing the meaning of the IF condition. "FIRST.CMCODE" now means "first new value within a fixed PATID-ENCDATE combination." That is, for example, if ENCDATE changes from each observation (of one PATID) to the next, the IF condition is always true, so that the result is the same as if the IF condition wasn't there (hence all the duplicates).

 

I understand why you included ENCDATE (to ensure the chronological order, which is indeed important for the code to make sense), but you should run the code without ENCDATE in the BY statement after sorting the dataset BY PATID ENCDATE.

Contributor SM1
Contributor
Posts: 30

Re: How to identify a pattern across multiple variables in 1 record?

Makes perfect sense.

 

I'll do that.

 

Thank you so much!!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 562 views
  • 0 likes
  • 3 in conversation