BookmarkSubscribeRSS Feed
c8826024
Calcite | Level 5

Hi, everybody, I have a dataset like the follwoing:

patient_iddrug1drug2drug3drug1_dtdrug1_dtdrug1_dtTest_dt1Test_dt2Test_dt3Test_dt4Test_dt5Test_dt6Test_dt7
1aaabbb1/2/20121/4/2012
2cccaaabbb3/4/20125/2/20126/1/20121/3/20122/2/30123/6/20124/1/20125/2/20125/6/20125/8/2012
3aaa1/5/20121/1/20121/5/20122/1/20123/1/20124/1/20125/1/20126/1/2012
4bbb2/1/20121/1/20121/5/20121/6/2012
5bbbccc3/1/20123/16/20121/1/20123/1/20123/16/2012
6aaabbbccc1/30/20123/1/20125/10/20121/30/20123/1/20125/10/20126/1/2012
7

I want to create a new series of variables, reason code, by carrying over drug name based on the comparison of test date with drug taking date. If test date greater than or equal to drug taking date, then the reason code will be the drug used by the drug taking date. Somthing like this:

patient_iddrug1drug2drug3drug1_dtdrug1_dtdrug1_dtTest_dt1Test_dt2Test_dt3Test_dt4Test_dt5Test_dt6Test_dt7reason1reason2reason3reason4reason5reason6reason7
1aaabbb1/2/20121/4/2012bbb
2cccaaabbb3/4/20125/2/20126/1/20121/3/20122/2/30123/6/20124/1/20125/2/20125/6/20125/8/2012unknownunknownccccccaaaaaaaaa
3aaa1/5/20121/1/20121/5/20122/1/20123/1/20124/1/20125/1/20126/1/2012unknownaaaaaaaaaaaaaaaaaa
4bbb2/1/20121/1/20121/5/20121/6/2012unknownunknownunknown
5bbbccc3/1/20123/16/20121/1/20123/1/20123/16/2012unknownbbbccc
6aaabbbccc1/30/20123/1/20125/10/20121/30/20123/1/20125/10/20126/1/2012aaabbbcccccc

7

Does anybody know how to make it happen in SAS?

Thanks.

10 REPLIES 10
Reeza
Super User

Are you sure the dates will always be in order for drug_dt?

Why for patient 3/4/5 are there dates in the drug rather than drug names?

I'm assuming you want to work with the data in this structure rather than transposing it?

Will you always know how many drug dates and tests you have?  What happens when drug dates or drugs are missing?

Is there a boundary on the date, ie if the drug is 1 year (6 months, 3 months) after the test is that still valid?

Can you provide the data in a data step, I'm guessing you have missing above and it isn't showing up?

Yes it can be done and not relatively hard...nest a do loop for each test to go through the drug dates and find the first test after or on the date.

c8826024
Calcite | Level 5

Thank you for your response.

Yes, the drug date and test date will be always in order. If there is no drug name, then the drug date will be missing. When I cut/paste the data to this board, somehome the blanks have been shifted/overwrited by near non-blank values which is mess up my data. Here is the correct data:

patient_iddrug1drug2drug3drug1_dtdrug1_dtdrug1_dtTest_dt1Test_dt2Test_dt3Test_dt4Test_dt5Test_dt6Test_dt7reason1reason2reason3reason4reason5reason6reason7
1aaabbbmissing1/2/2012missingmissing1/4/2012missingmissingmissingmissingmissingmissingbbbmissingmissingmissingmissingmissingmissing
2cccaaabbb3/4/20125/2/20126/1/20121/3/20122/2/30123/6/20124/1/20125/2/20125/6/20125/8/2012unknownunknownccccccaaaaaaaaa
3aaamissingmissing1/5/2012missingmissing1/1/20121/5/20122/1/20123/1/20124/1/20125/1/20126/1/2012unknownaaaaaaaaaaaaaaaaaa
4bbbmissingmissing2/1/2012missingmissing1/1/20121/5/20121/6/2012missingmissingmissingmissingunknownunknownunknownmissingmissingmissingmissing
5bbbcccmissing3/1/20123/16/2012missing1/1/20123/1/20123/16/2012missingmissingmissingmissingunknownbbbcccmissingmissingmissingmissing
6aaabbbccc1/30/20123/1/20125/10/20121/30/20123/1/20125/10/20126/1/2012missingmissingmissingaaabbbccccccmissingmissingmissing
7missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing

You are right. I might not know exact how many drugs or tests will be taken. As a start, I made it 3 drugs and 7 tests.

c8826024
Calcite | Level 5

There is no boundary on the date.

art297
Opal | Level 21

Can we presume that the actual variable list was supposed to be:

patient_i drug1 drug2 drug3 drug1_dt drug2_dt drug3_dt Test_dt1 Test_dt2 Test_dt3

and that there wasn't supposed to be a Test_dt4?  And, if so, what would you like the resulting table to look like based on that data.

Reeza
Super User

Is the above how your data actually looks, since you don't know the # of drugs or tests?

Reeza
Super User

This gets you relatively close. I didn't do too much testing.  You have a 3012 on there I'm assuming should be 2012.

I also renamed drug1_dt to 3 unique names, drug_dt1-drug_dt3 because it was easier to use that way.

You do need to know your array dimensions to work but since you have the variables that should be easy enough.

data want;

    set have;

    array tst(*) test_dt1-test_dt7;

    array drgs(*) drug1-drug3;

    array drgs_dt(*) drug_dt1-drug_dt3;

    array rsn(*) $ reason1-reason7;

   

    do i=1 to dim(tst) while (not missing(tst(i)));

        do j=1 to dim(drgs_dt) while (not missing(drgs_dt(j)));

            if drgs_dt(j)<= tst(i) then do;

                rsn(i)=drgs(j);

            end;

        end;

        if rsn(i)='' then rsn(i)='unknown';

    end;

run;

Ksharp
Super User

Easy.

data have;
infile cards truncover;
input patient_id     drug1 $     drug2 $     drug3 $     (drug1_dt     drug2_dt     drug3_dt     Test_dt1     Test_dt2     Test_dt3     Test_dt4     Test_dt5     Test_dt6     Test_dt7) (: mmddyy10.);
format drug1_dt     drug2_dt     drug3_dt     Test: mmddyy10.;
cards;
1     aaa     bbb     .     1/2/2012     .     .     1/4/2012     .     .     .     .     .     .     bbb     .     .     .     .     .     .
2     ccc     aaa     bbb     3/4/2012     5/2/2012     6/1/2012     1/3/2012     2/2/3012     3/6/2012     4/1/2012     5/2/2012     5/6/2012     5/8/2012     
3     aaa     .     .     1/5/2012     .     .     1/1/2012     1/5/2012     2/1/2012     3/1/2012     4/1/2012     5/1/2012     6/1/2012
4     bbb     .     .     2/1/2012     .     .     1/1/2012     1/5/2012     1/6/2012     
5     bbb     ccc     .     3/1/2012     3/16/2012     .     1/1/2012     3/1/2012     3/16/2012
6     aaa     bbb     ccc     1/30/2012     3/1/2012     5/10/2012     1/30/2012     3/1/2012     5/10/2012     6/1/2012
;
run;

data temp;
 set have(keep=patient_id drug1 drug2      drug3 drug1_dt     drug2_dt     drug3_dt) ;
 do i=coalesce(drug1_dt,'01dec2012'd) to coalesce(drug2_dt,'01dec2012'd)-1 ;
 key=i;
 reason=drug1;
 output;
 end;
 do i=coalesce(drug2_dt,'01dec2012'd) to coalesce(drug3_dt,'01dec2012'd)-1 ;
 key=i;
 reason=drug2;
 output;
 end;
 do i=coalesce(drug3_dt,'01dec2012'd) to '01dec2012'd ;
 key=i;
 reason=drug3;
 output;
 end;
 keep patient_id key reason;
 run;
data want;
if _n_ =1 then do;
 if 0 then set temp;
 declare hash ha(dataset:'temp');
  ha.definekey('patient_id','key');
  ha.definedata('reason');
  ha.definedone();
end;
 set have;
 array t{7} Test_: ;
 array r{7} $ reason1-reason7 ;
 do i=1 to dim(t);
  key=t{i};
  call missing(reason);
   rc=ha.find();
  r{i}=reason;
 end;
 drop i key rc reason;
run;
 

Ksharp

Haikuo
Onyx | Level 15

Work something out based on Fareeza's code:

data have;

infile cards truncover;

input patient_id     drug1 $     drug2 $     drug3 $     (drug1_dt     drug2_dt     drug3_dt     Test_dt1     Test_dt2     Test_dt3     Test_dt4     Test_dt5     Test_dt6     Test_dt7) (: mmddyy10.);

format drug1_dt     drug2_dt     drug3_dt     Test: mmddyy10.;

cards;

1     aaa     bbb     .     1/2/2012     .     .     1/4/2012     .     .     .     .     .     .     bbb     .     .     .     .     .     .

2     ccc     aaa     bbb     3/4/2012     5/2/2012     6/1/2012     1/3/2012     2/2/2012     3/6/2012     4/1/2012     5/2/2012     5/6/2012     5/8/2012    

3     aaa     .     .     1/5/2012     .     .     1/1/2012     1/5/2012     2/1/2012     3/1/2012     4/1/2012     5/1/2012     6/1/2012

4     bbb     .     .     2/1/2012     .     .     1/1/2012     1/5/2012     1/6/2012    

5     bbb     ccc     .     3/1/2012     3/16/2012     .     1/1/2012     3/1/2012     3/16/2012

6     aaa     bbb     ccc     1/30/2012     3/1/2012     5/10/2012     1/30/2012     3/1/2012     5/10/2012     6/1/2012

7 . . . . . . . . . . . . . .

;

run;

data want;

    set have;

    array tst(*) test_dt1-test_dt7;

    array drgs(*) $ drug1-drug3;

    array drgs_dt(*) drug1_dt drug2_dt drug3_dt;

    array rsn(*) $8. reason1-reason7;

do _i=1 to dim(tst);

  if missing(tst(_i)) then rsn(_i)='';

  else do;

       if tst(_i)<min(of drgs_dt(*)) then rsn(_i)='unknown';

       else do _d=1 to dim(drgs_dt) while (not missing(drgs_dt(_d)));

           if tst(_i)<drgs_dt(_d) then do;

                do _k=_d-1 by -1 to 0 while (missing(drgs(_k)));

                end;

                rsn(_i)=drgs(_k);

                leave;

            end;

            else rsn(_i)=drgs(_d);

       end;

      

  end;

end;

  drop _:;

run;

Haikuo
c8826024
Calcite | Level 5


Thank you very much. I was sick for a couple of days and would test each code.

c8826024
Calcite | Level 5

My dataset has 7 million records. So Ksharp's solution is not working since the memory is full. Trying Reeza & Hai's solution.....

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1158 views
  • 1 like
  • 5 in conversation