Re: how to identify a condition by two claims and capture the earlest ...

Crystal_F · Posted 06-05-2017 11:36 AM

Hi everyone,

I have a programing question regarding how to identify a condition by two claims and capture the earlest date of diagnosis here. I'm hopeing someone could help me come up with an efficient way of programming to achive the goal, because i have 21 conditions need to be captured using the same claim data.

In my dataset (please see the attachement for example), each patient might have different number of claims and each row represents a single claim with a claim id. I need to use the diagnosis code, dx1-dx12 to identify whether a patient has, for example, hypertension (if any of the dx = : ('401', '402', '403') ). If the patient has hypertension, then i need to pick up the earlest date of diagnosis, using the corresponding claim date. In the final dataset, each patient will be flaged as hypertension (0/1), depression(0/1), and so forth, with the earlest date of diagnosis for each, if they have the target condition.

I will be greatly appreciated if someone could share your thoughts on how to program this efficiently.

Thank you!

kiranv_ · Posted 06-05-2017 11:47 AM

Someone in this wonderful SAS community can easily help you, If you show a sample have and want datasets.

ballardw · Posted 06-05-2017 12:36 PM

Many users here don't want to download Word files because of virus potential, others have such things blocked by security software. Also if you give us Word we have to create a SAS data set and due to the non-existent constraints on data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against. Desired output datasets should be provided in a similar fashion.

Crystal_F · Posted 06-05-2017 02:58 PM

data TEMP.SAMPLE1;
  infile datalines dsd truncover;
  input ICD_DGNS_CD1:$7. ICD_DGNS_CD2:$7. ICD_DGNS_CD3:$7. ICD_DGNS_CD4:$7. ICD_DGNS_CD5:$7. ICD_DGNS_CD6:$7. ICD_DGNS_CD7:$7. ICD_DGNS_CD8:$7. ICD_DGNS_CD9:$7. ICD_DGNS_CD10:$7. ICD_DGNS_CD11:$7. ICD_DGNS_CD12:$7. claim_dt:YYMMDDN8. id:32. clm_id:32.;
datalines4;
7812,53081,7243,4011,,,,,,,,,20111226,1,50
7812,53081,7243,4011,,,,,,,,,20111226,1,51
7812,53081,7243,4011,,,,,,,,,20111226,1,52
7812,53081,7243,4011,,,,,,,,,20111226,1,54
;;;;

Crystal_F · Posted 06-05-2017 03:15 PM

This code should generate the sample data. Hopefully I got it right. The want data will also contain a flag on hypertension and the earliest date of hypertension diagnosis. Thanks.

s_lassen · Posted 06-06-2017 03:12 AM

@Crystal_F:

You can try something like this:

1. convert your array to a long form (one obs. for each diagnosis and claim):

data long;
  set have;
  array diagnoses dx1-dx12;
  do _N_=1 to dim(diagnoses);
    if missing(diagnoses(_N_)) then 
      continue;
    dx=diagnoses(_N_);
    output;
    end;
  drop dx1-dx12;
run;

If the diagnoses are always "left-aligned" (meaning that once you find a missing diagnosis, the rest of the array will be missing too), you can safely replace the CONTINUE statement with a LEAVE statement, getting some improvement in performance.

2. Sort the data and find the earliest date for each diagnosis by patient:

proc sort data=long;
  by patient_id dx claim_date;
run;

data want;
  set long;
  by patient_id dx;
  if first.dx;
run;

Crystal_F · Posted 06-06-2017 08:18 PM

Thank you for sharing your valuable thoughts. I like your idea of improving efficiency by using long data instead. I'd like to appliy your logic to my programming when a target condition is defined by capturing any of the specified dignosis code (icd9) in one claim. But in this particular project, I would need to have at least two claims with any of the target icd9 code, let's say, any icd9 code starts with '401','402','403', '404', '405'. The same rule applies to the definition of the rest of conditions of interest. The dataset that i have is at the claim level, meaning each patient could have multiple claims. The dataset that i want will be at the patient level, with an indicator of each condition of interst, along with the earliest indication of dignosis data (which is the claim data in the original dataset). Thank you for your help, which is greatly appreciated!

s_lassen · Posted 06-07-2017 02:52 AM

@Crystal_F:

You could use a format to group your codes, like

proc format;
  value $diaggrp
    '401','402','403', '404', '405'='Severe dyslexia'
    '102','103'='Constipation'
    '803','335'='Uncommon cold'
  ;
run;

data long;
  set have;
  array diagnoses dx1-dx12;
  do _N_=1 to dim(diagnoses);
    if missing(diagnoses(_N_)) then 
      continue;
    dx=diagnoses(_N_);
    group=put(dx,$diaggrp.);
    output;
    end;
  drop dx1-dx12;
run;

Then, use the groups instead of the original diagnosis numbers. To make sure that you have two claims in a group, just check that the observation is not the last of its kind:

proc sort data=long nodupkey;
 by patient_id group claim_id;
run;

proc sort data=long;
  by patient_id group claim_date;
run;

data want;
  set long;
  by patient_id group;
  if first.group and not last.group;
run;

I put the first proc sort with the NODUPKEY in to make sure that if you had two DX values in the same group on the same claim, that would only count as one claim of this type. The second sort may be dropped if you are sure that CLAIM_ID values are always in chronological order.

Crystal_F · Posted 06-12-2017 09:49 AM

Thank you again for sharing your valuable thoughts. This is a great solution for pulling out cpt code via using format. I also like the idea of putting them into group for date extraction. But I'm afraid I need to look for any icd9 code start with 402, 403, 404 and 405...In the final want data set, I wish I could have one patient per row, with all the condition flagged as 1/0 and the corresponding diagnosis date. I couldn't figure out any better solution other than using proc sql or proc summary. But I believe there must be some smart way to get around this.

how to identify a condition by two claims and capture the earlest date of diagnosis

Re: how to identify a condition by two claims and capture the earlest date of diagnosis

Re: how to identify a condition by two claims and capture the earlest date of diagnosis

Re: how to identify a condition by two claims and capture the earlest date of diagnosis

Re: how to identify a condition by two claims and capture the earlest date of diagnosis

Re: how to identify a condition by two claims and capture the earlest date of diagnosis

Re: how to identify a condition by two claims and capture the earlest date of diagnosis

Re: how to identify a condition by two claims and capture the earlest date of diagnosis

Re: how to identify a condition by two claims and capture the earlest date of diagnosis

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!