Hi, everybody, I have a dataset like the follwoing:
patient_id | drug1 | drug2 | drug3 | drug1_dt | drug1_dt | drug1_dt | Test_dt1 | Test_dt2 | Test_dt3 | Test_dt4 | Test_dt5 | Test_dt6 | Test_dt7 |
1 | aaa | bbb | 1/2/2012 | 1/4/2012 | |||||||||
2 | ccc | aaa | bbb | 3/4/2012 | 5/2/2012 | 6/1/2012 | 1/3/2012 | 2/2/3012 | 3/6/2012 | 4/1/2012 | 5/2/2012 | 5/6/2012 | 5/8/2012 |
3 | aaa | 1/5/2012 | 1/1/2012 | 1/5/2012 | 2/1/2012 | 3/1/2012 | 4/1/2012 | 5/1/2012 | 6/1/2012 | ||||
4 | bbb | 2/1/2012 | 1/1/2012 | 1/5/2012 | 1/6/2012 | ||||||||
5 | bbb | ccc | 3/1/2012 | 3/16/2012 | 1/1/2012 | 3/1/2012 | 3/16/2012 | ||||||
6 | aaa | bbb | ccc | 1/30/2012 | 3/1/2012 | 5/10/2012 | 1/30/2012 | 3/1/2012 | 5/10/2012 | 6/1/2012 | |||
7 |
I want to create a new series of variables, reason code, by carrying over drug name based on the comparison of test date with drug taking date. If test date greater than or equal to drug taking date, then the reason code will be the drug used by the drug taking date. Somthing like this:
patient_id | drug1 | drug2 | drug3 | drug1_dt | drug1_dt | drug1_dt | Test_dt1 | Test_dt2 | Test_dt3 | Test_dt4 | Test_dt5 | Test_dt6 | Test_dt7 | reason1 | reason2 | reason3 | reason4 | reason5 | reason6 | reason7 |
1 | aaa | bbb | 1/2/2012 | 1/4/2012 | bbb | |||||||||||||||
2 | ccc | aaa | bbb | 3/4/2012 | 5/2/2012 | 6/1/2012 | 1/3/2012 | 2/2/3012 | 3/6/2012 | 4/1/2012 | 5/2/2012 | 5/6/2012 | 5/8/2012 | unknown | unknown | ccc | ccc | aaa | aaa | aaa |
3 | aaa | 1/5/2012 | 1/1/2012 | 1/5/2012 | 2/1/2012 | 3/1/2012 | 4/1/2012 | 5/1/2012 | 6/1/2012 | unknown | aaa | aaa | aaa | aaa | aaa | aaa | ||||
4 | bbb | 2/1/2012 | 1/1/2012 | 1/5/2012 | 1/6/2012 | unknown | unknown | unknown | ||||||||||||
5 | bbb | ccc | 3/1/2012 | 3/16/2012 | 1/1/2012 | 3/1/2012 | 3/16/2012 | unknown | bbb | ccc | ||||||||||
6 | aaa | bbb | ccc | 1/30/2012 | 3/1/2012 | 5/10/2012 | 1/30/2012 | 3/1/2012 | 5/10/2012 | 6/1/2012 | aaa | bbb | ccc | ccc | ||||||
7 |
Does anybody know how to make it happen in SAS?
Thanks.
Are you sure the dates will always be in order for drug_dt?
Why for patient 3/4/5 are there dates in the drug rather than drug names?
I'm assuming you want to work with the data in this structure rather than transposing it?
Will you always know how many drug dates and tests you have? What happens when drug dates or drugs are missing?
Is there a boundary on the date, ie if the drug is 1 year (6 months, 3 months) after the test is that still valid?
Can you provide the data in a data step, I'm guessing you have missing above and it isn't showing up?
Yes it can be done and not relatively hard...nest a do loop for each test to go through the drug dates and find the first test after or on the date.
Thank you for your response.
Yes, the drug date and test date will be always in order. If there is no drug name, then the drug date will be missing. When I cut/paste the data to this board, somehome the blanks have been shifted/overwrited by near non-blank values which is mess up my data. Here is the correct data:
patient_id | drug1 | drug2 | drug3 | drug1_dt | drug1_dt | drug1_dt | Test_dt1 | Test_dt2 | Test_dt3 | Test_dt4 | Test_dt5 | Test_dt6 | Test_dt7 | reason1 | reason2 | reason3 | reason4 | reason5 | reason6 | reason7 |
1 | aaa | bbb | missing | 1/2/2012 | missing | missing | 1/4/2012 | missing | missing | missing | missing | missing | missing | bbb | missing | missing | missing | missing | missing | missing |
2 | ccc | aaa | bbb | 3/4/2012 | 5/2/2012 | 6/1/2012 | 1/3/2012 | 2/2/3012 | 3/6/2012 | 4/1/2012 | 5/2/2012 | 5/6/2012 | 5/8/2012 | unknown | unknown | ccc | ccc | aaa | aaa | aaa |
3 | aaa | missing | missing | 1/5/2012 | missing | missing | 1/1/2012 | 1/5/2012 | 2/1/2012 | 3/1/2012 | 4/1/2012 | 5/1/2012 | 6/1/2012 | unknown | aaa | aaa | aaa | aaa | aaa | aaa |
4 | bbb | missing | missing | 2/1/2012 | missing | missing | 1/1/2012 | 1/5/2012 | 1/6/2012 | missing | missing | missing | missing | unknown | unknown | unknown | missing | missing | missing | missing |
5 | bbb | ccc | missing | 3/1/2012 | 3/16/2012 | missing | 1/1/2012 | 3/1/2012 | 3/16/2012 | missing | missing | missing | missing | unknown | bbb | ccc | missing | missing | missing | missing |
6 | aaa | bbb | ccc | 1/30/2012 | 3/1/2012 | 5/10/2012 | 1/30/2012 | 3/1/2012 | 5/10/2012 | 6/1/2012 | missing | missing | missing | aaa | bbb | ccc | ccc | missing | missing | missing |
7 | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing |
You are right. I might not know exact how many drugs or tests will be taken. As a start, I made it 3 drugs and 7 tests.
There is no boundary on the date.
Can we presume that the actual variable list was supposed to be:
patient_i drug1 drug2 drug3 drug1_dt drug2_dt drug3_dt Test_dt1 Test_dt2 Test_dt3
and that there wasn't supposed to be a Test_dt4? And, if so, what would you like the resulting table to look like based on that data.
Is the above how your data actually looks, since you don't know the # of drugs or tests?
This gets you relatively close. I didn't do too much testing. You have a 3012 on there I'm assuming should be 2012.
I also renamed drug1_dt to 3 unique names, drug_dt1-drug_dt3 because it was easier to use that way.
You do need to know your array dimensions to work but since you have the variables that should be easy enough.
data want;
set have;
array tst(*) test_dt1-test_dt7;
array drgs(*) drug1-drug3;
array drgs_dt(*) drug_dt1-drug_dt3;
array rsn(*) $ reason1-reason7;
do i=1 to dim(tst) while (not missing(tst(i)));
do j=1 to dim(drgs_dt) while (not missing(drgs_dt(j)));
if drgs_dt(j)<= tst(i) then do;
rsn(i)=drgs(j);
end;
end;
if rsn(i)='' then rsn(i)='unknown';
end;
run;
Easy.
data have; infile cards truncover; input patient_id drug1 $ drug2 $ drug3 $ (drug1_dt drug2_dt drug3_dt Test_dt1 Test_dt2 Test_dt3 Test_dt4 Test_dt5 Test_dt6 Test_dt7) (: mmddyy10.); format drug1_dt drug2_dt drug3_dt Test: mmddyy10.; cards; 1 aaa bbb . 1/2/2012 . . 1/4/2012 . . . . . . bbb . . . . . . 2 ccc aaa bbb 3/4/2012 5/2/2012 6/1/2012 1/3/2012 2/2/3012 3/6/2012 4/1/2012 5/2/2012 5/6/2012 5/8/2012 3 aaa . . 1/5/2012 . . 1/1/2012 1/5/2012 2/1/2012 3/1/2012 4/1/2012 5/1/2012 6/1/2012 4 bbb . . 2/1/2012 . . 1/1/2012 1/5/2012 1/6/2012 5 bbb ccc . 3/1/2012 3/16/2012 . 1/1/2012 3/1/2012 3/16/2012 6 aaa bbb ccc 1/30/2012 3/1/2012 5/10/2012 1/30/2012 3/1/2012 5/10/2012 6/1/2012 ; run; data temp; set have(keep=patient_id drug1 drug2 drug3 drug1_dt drug2_dt drug3_dt) ; do i=coalesce(drug1_dt,'01dec2012'd) to coalesce(drug2_dt,'01dec2012'd)-1 ; key=i; reason=drug1; output; end; do i=coalesce(drug2_dt,'01dec2012'd) to coalesce(drug3_dt,'01dec2012'd)-1 ; key=i; reason=drug2; output; end; do i=coalesce(drug3_dt,'01dec2012'd) to '01dec2012'd ; key=i; reason=drug3; output; end; keep patient_id key reason; run; data want; if _n_ =1 then do; if 0 then set temp; declare hash ha(dataset:'temp'); ha.definekey('patient_id','key'); ha.definedata('reason'); ha.definedone(); end; set have; array t{7} Test_: ; array r{7} $ reason1-reason7 ; do i=1 to dim(t); key=t{i}; call missing(reason); rc=ha.find(); r{i}=reason; end; drop i key rc reason; run;
Ksharp
Work something out based on Fareeza's code:
data have;
infile cards truncover;
input patient_id drug1 $ drug2 $ drug3 $ (drug1_dt drug2_dt drug3_dt Test_dt1 Test_dt2 Test_dt3 Test_dt4 Test_dt5 Test_dt6 Test_dt7) (: mmddyy10.);
format drug1_dt drug2_dt drug3_dt Test: mmddyy10.;
cards;
1 aaa bbb . 1/2/2012 . . 1/4/2012 . . . . . . bbb . . . . . .
2 ccc aaa bbb 3/4/2012 5/2/2012 6/1/2012 1/3/2012 2/2/2012 3/6/2012 4/1/2012 5/2/2012 5/6/2012 5/8/2012
3 aaa . . 1/5/2012 . . 1/1/2012 1/5/2012 2/1/2012 3/1/2012 4/1/2012 5/1/2012 6/1/2012
4 bbb . . 2/1/2012 . . 1/1/2012 1/5/2012 1/6/2012
5 bbb ccc . 3/1/2012 3/16/2012 . 1/1/2012 3/1/2012 3/16/2012
6 aaa bbb ccc 1/30/2012 3/1/2012 5/10/2012 1/30/2012 3/1/2012 5/10/2012 6/1/2012
7 . . . . . . . . . . . . . .
;
run;
data want;
set have;
array tst(*) test_dt1-test_dt7;
array drgs(*) $ drug1-drug3;
array drgs_dt(*) drug1_dt drug2_dt drug3_dt;
array rsn(*) $8. reason1-reason7;
do _i=1 to dim(tst);
if missing(tst(_i)) then rsn(_i)='';
else do;
if tst(_i)<min(of drgs_dt(*)) then rsn(_i)='unknown';
else do _d=1 to dim(drgs_dt) while (not missing(drgs_dt(_d)));
if tst(_i)<drgs_dt(_d) then do;
do _k=_d-1 by -1 to 0 while (missing(drgs(_k)));
end;
rsn(_i)=drgs(_k);
leave;
end;
else rsn(_i)=drgs(_d);
end;
end;
end;
drop _:;
run;
Thank you very much. I was sick for a couple of days and would test each code.
My dataset has 7 million records. So Ksharp's solution is not working since the memory is full. Trying Reeza & Hai's solution.....
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.