DATA Step, Macro, Functions and more

Data Step Help

Reply
Contributor
Posts: 38

Data Step Help

Hi, everybody, I have a dataset like the follwoing:

patient_iddrug1drug2drug3drug1_dtdrug1_dtdrug1_dtTest_dt1Test_dt2Test_dt3Test_dt4Test_dt5Test_dt6Test_dt7
1aaabbb1/2/20121/4/2012
2cccaaabbb3/4/20125/2/20126/1/20121/3/20122/2/30123/6/20124/1/20125/2/20125/6/20125/8/2012
3aaa1/5/20121/1/20121/5/20122/1/20123/1/20124/1/20125/1/20126/1/2012
4bbb2/1/20121/1/20121/5/20121/6/2012
5bbbccc3/1/20123/16/20121/1/20123/1/20123/16/2012
6aaabbbccc1/30/20123/1/20125/10/20121/30/20123/1/20125/10/20126/1/2012
7

I want to create a new series of variables, reason code, by carrying over drug name based on the comparison of test date with drug taking date. If test date greater than or equal to drug taking date, then the reason code will be the drug used by the drug taking date. Somthing like this:

patient_iddrug1drug2drug3drug1_dtdrug1_dtdrug1_dtTest_dt1Test_dt2Test_dt3Test_dt4Test_dt5Test_dt6Test_dt7reason1reason2reason3reason4reason5reason6reason7
1aaabbb1/2/20121/4/2012bbb
2cccaaabbb3/4/20125/2/20126/1/20121/3/20122/2/30123/6/20124/1/20125/2/20125/6/20125/8/2012unknownunknownccccccaaaaaaaaa
3aaa1/5/20121/1/20121/5/20122/1/20123/1/20124/1/20125/1/20126/1/2012unknownaaaaaaaaaaaaaaaaaa
4bbb2/1/20121/1/20121/5/20121/6/2012unknownunknownunknown
5bbbccc3/1/20123/16/20121/1/20123/1/20123/16/2012unknownbbbccc
6aaabbbccc1/30/20123/1/20125/10/20121/30/20123/1/20125/10/20126/1/2012aaabbbcccccc

7

Does anybody know how to make it happen in SAS?

Thanks.

Super User
Posts: 19,772

Re: Data Step Help

Are you sure the dates will always be in order for drug_dt?

Why for patient 3/4/5 are there dates in the drug rather than drug names?

I'm assuming you want to work with the data in this structure rather than transposing it?

Will you always know how many drug dates and tests you have?  What happens when drug dates or drugs are missing?

Is there a boundary on the date, ie if the drug is 1 year (6 months, 3 months) after the test is that still valid?

Can you provide the data in a data step, I'm guessing you have missing above and it isn't showing up?

Yes it can be done and not relatively hard...nest a do loop for each test to go through the drug dates and find the first test after or on the date.

Contributor
Posts: 38

Re: Data Step Help

Thank you for your response.

Yes, the drug date and test date will be always in order. If there is no drug name, then the drug date will be missing. When I cut/paste the data to this board, somehome the blanks have been shifted/overwrited by near non-blank values which is mess up my data. Here is the correct data:

patient_iddrug1drug2drug3drug1_dtdrug1_dtdrug1_dtTest_dt1Test_dt2Test_dt3Test_dt4Test_dt5Test_dt6Test_dt7reason1reason2reason3reason4reason5reason6reason7
1aaabbbmissing1/2/2012missingmissing1/4/2012missingmissingmissingmissingmissingmissingbbbmissingmissingmissingmissingmissingmissing
2cccaaabbb3/4/20125/2/20126/1/20121/3/20122/2/30123/6/20124/1/20125/2/20125/6/20125/8/2012unknownunknownccccccaaaaaaaaa
3aaamissingmissing1/5/2012missingmissing1/1/20121/5/20122/1/20123/1/20124/1/20125/1/20126/1/2012unknownaaaaaaaaaaaaaaaaaa
4bbbmissingmissing2/1/2012missingmissing1/1/20121/5/20121/6/2012missingmissingmissingmissingunknownunknownunknownmissingmissingmissingmissing
5bbbcccmissing3/1/20123/16/2012missing1/1/20123/1/20123/16/2012missingmissingmissingmissingunknownbbbcccmissingmissingmissingmissing
6aaabbbccc1/30/20123/1/20125/10/20121/30/20123/1/20125/10/20126/1/2012missingmissingmissingaaabbbccccccmissingmissingmissing
7missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing

You are right. I might not know exact how many drugs or tests will be taken. As a start, I made it 3 drugs and 7 tests.

Contributor
Posts: 38

Re: Data Step Help

There is no boundary on the date.

PROC Star
Posts: 7,468

Re: Data Step Help

Can we presume that the actual variable list was supposed to be:

patient_i drug1 drug2 drug3 drug1_dt drug2_dt drug3_dt Test_dt1 Test_dt2 Test_dt3

and that there wasn't supposed to be a Test_dt4?  And, if so, what would you like the resulting table to look like based on that data.

Super User
Posts: 19,772

Re: Data Step Help

Is the above how your data actually looks, since you don't know the # of drugs or tests?

Super User
Posts: 19,772

Re: Data Step Help

This gets you relatively close. I didn't do too much testing.  You have a 3012 on there I'm assuming should be 2012.

I also renamed drug1_dt to 3 unique names, drug_dt1-drug_dt3 because it was easier to use that way.

You do need to know your array dimensions to work but since you have the variables that should be easy enough.

data want;

    set have;

    array tst(*) test_dt1-test_dt7;

    array drgs(*) drug1-drug3;

    array drgs_dt(*) drug_dt1-drug_dt3;

    array rsn(*) $ reason1-reason7;

   

    do i=1 to dim(tst) while (not missing(tst(i)));

        do j=1 to dim(drgs_dt) while (not missing(drgs_dt(j)));

            if drgs_dt(j)<= tst(i) then do;

                rsn(i)=drgs(j);

            end;

        end;

        if rsn(i)='' then rsn(i)='unknown';

    end;

run;

Super User
Posts: 10,020

Re: Data Step Help

Easy.

data have;
infile cards truncover;
input patient_id     drug1 $     drug2 $     drug3 $     (drug1_dt     drug2_dt     drug3_dt     Test_dt1     Test_dt2     Test_dt3     Test_dt4     Test_dt5     Test_dt6     Test_dt7) (: mmddyy10.);
format drug1_dt     drug2_dt     drug3_dt     Test: mmddyy10.;
cards;
1     aaa     bbb     .     1/2/2012     .     .     1/4/2012     .     .     .     .     .     .     bbb     .     .     .     .     .     .
2     ccc     aaa     bbb     3/4/2012     5/2/2012     6/1/2012     1/3/2012     2/2/3012     3/6/2012     4/1/2012     5/2/2012     5/6/2012     5/8/2012     
3     aaa     .     .     1/5/2012     .     .     1/1/2012     1/5/2012     2/1/2012     3/1/2012     4/1/2012     5/1/2012     6/1/2012
4     bbb     .     .     2/1/2012     .     .     1/1/2012     1/5/2012     1/6/2012     
5     bbb     ccc     .     3/1/2012     3/16/2012     .     1/1/2012     3/1/2012     3/16/2012
6     aaa     bbb     ccc     1/30/2012     3/1/2012     5/10/2012     1/30/2012     3/1/2012     5/10/2012     6/1/2012
;
run;

data temp;
 set have(keep=patient_id drug1 drug2      drug3 drug1_dt     drug2_dt     drug3_dt) ;
 do i=coalesce(drug1_dt,'01dec2012'd) to coalesce(drug2_dt,'01dec2012'd)-1 ;
 key=i;
 reason=drug1;
 output;
 end;
 do i=coalesce(drug2_dt,'01dec2012'd) to coalesce(drug3_dt,'01dec2012'd)-1 ;
 key=i;
 reason=drug2;
 output;
 end;
 do i=coalesce(drug3_dt,'01dec2012'd) to '01dec2012'd ;
 key=i;
 reason=drug3;
 output;
 end;
 keep patient_id key reason;
 run;
data want;
if _n_ =1 then do;
 if 0 then set temp;
 declare hash ha(dataset:'temp');
  ha.definekey('patient_id','key');
  ha.definedata('reason');
  ha.definedone();
end;
 set have;
 array t{7} Test_: ;
 array r{7} $ reason1-reason7 ;
 do i=1 to dim(t);
  key=t{i};
  call missing(reason);
   rc=ha.find();
  r{i}=reason;
 end;
 drop i key rc reason;
run;
 

Ksharp

Respected Advisor
Posts: 3,156

Re: Data Step Help

Work something out based on Fareeza's code:

data have;

infile cards truncover;

input patient_id     drug1 $     drug2 $     drug3 $     (drug1_dt     drug2_dt     drug3_dt     Test_dt1     Test_dt2     Test_dt3     Test_dt4     Test_dt5     Test_dt6     Test_dt7) (: mmddyy10.);

format drug1_dt     drug2_dt     drug3_dt     Test: mmddyy10.;

cards;

1     aaa     bbb     .     1/2/2012     .     .     1/4/2012     .     .     .     .     .     .     bbb     .     .     .     .     .     .

2     ccc     aaa     bbb     3/4/2012     5/2/2012     6/1/2012     1/3/2012     2/2/2012     3/6/2012     4/1/2012     5/2/2012     5/6/2012     5/8/2012    

3     aaa     .     .     1/5/2012     .     .     1/1/2012     1/5/2012     2/1/2012     3/1/2012     4/1/2012     5/1/2012     6/1/2012

4     bbb     .     .     2/1/2012     .     .     1/1/2012     1/5/2012     1/6/2012    

5     bbb     ccc     .     3/1/2012     3/16/2012     .     1/1/2012     3/1/2012     3/16/2012

6     aaa     bbb     ccc     1/30/2012     3/1/2012     5/10/2012     1/30/2012     3/1/2012     5/10/2012     6/1/2012

7 . . . . . . . . . . . . . .

;

run;

data want;

    set have;

    array tst(*) test_dt1-test_dt7;

    array drgs(*) $ drug1-drug3;

    array drgs_dt(*) drug1_dt drug2_dt drug3_dt;

    array rsn(*) $8. reason1-reason7;

do _i=1 to dim(tst);

  if missing(tst(_i)) then rsn(_i)='';

  else do;

       if tst(_i)<min(of drgs_dt(*)) then rsn(_i)='unknown';

       else do _d=1 to dim(drgs_dt) while (not missing(drgs_dt(_d)));

           if tst(_i)<drgs_dt(_d) then do;

                do _k=_d-1 by -1 to 0 while (missing(drgs(_k)));

                end;

                rsn(_i)=drgs(_k);

                leave;

            end;

            else rsn(_i)=drgs(_d);

       end;

      

  end;

end;

  drop _:;

run;

Haikuo
Contributor
Posts: 38

Re: Data Step Help


Thank you very much. I was sick for a couple of days and would test each code.

Contributor
Posts: 38

Re: Data Step Help

My dataset has 7 million records. So Ksharp's solution is not working since the memory is full. Trying Reeza & Hai's solution.....

Ask a Question
Discussion stats
  • 10 replies
  • 328 views
  • 1 like
  • 5 in conversation