Re: Data Step Help

c8826024 · Posted 11-30-2012 03:47 PM

Hi, everybody, I have a dataset like the follwoing:

patient_id

drug1

drug2

drug3

drug1_dt

Test_dt1

Test_dt2

Test_dt3

Test_dt4

Test_dt5

Test_dt6

Test_dt7

1

aaa

bbb

1/2/2012

1/4/2012

2

ccc

aaa

bbb

3/4/2012

5/2/2012

6/1/2012

1/3/2012

2/2/3012

3/6/2012

4/1/2012

5/2/2012

5/6/2012

5/8/2012

3

aaa

1/5/2012

1/1/2012

1/5/2012

2/1/2012

3/1/2012

4/1/2012

5/1/2012

6/1/2012

4

bbb

2/1/2012

1/1/2012

1/5/2012

1/6/2012

5

bbb

ccc

3/1/2012

3/16/2012

1/1/2012

3/1/2012

3/16/2012

6

aaa

bbb

ccc

1/30/2012

3/1/2012

5/10/2012

1/30/2012

3/1/2012

5/10/2012

6/1/2012

7

I want to create a new series of variables, reason code, by carrying over drug name based on the comparison of test date with drug taking date. If test date greater than or equal to drug taking date, then the reason code will be the drug used by the drug taking date. Somthing like this:

patient_id

drug1

drug2

drug3

drug1_dt

Test_dt1

Test_dt2

Test_dt3

Test_dt4

Test_dt5

Test_dt6

Test_dt7

reason1

reason2

reason3

reason4

reason5

reason6

reason7

1

aaa

bbb

1/2/2012

1/4/2012

bbb

2

ccc

aaa

bbb

3/4/2012

5/2/2012

6/1/2012

1/3/2012

2/2/3012

3/6/2012

4/1/2012

5/2/2012

5/6/2012

5/8/2012

unknown

ccc

aaa

3

aaa

1/5/2012

1/1/2012

1/5/2012

2/1/2012

3/1/2012

4/1/2012

5/1/2012

6/1/2012

unknown

aaa

4

bbb

2/1/2012

1/1/2012

1/5/2012

1/6/2012

unknown

5

bbb

ccc

3/1/2012

3/16/2012

1/1/2012

3/1/2012

3/16/2012

unknown

bbb

ccc

6

aaa

bbb

ccc

1/30/2012

3/1/2012

5/10/2012

1/30/2012

3/1/2012

5/10/2012

6/1/2012

aaa

bbb

ccc

7

Does anybody know how to make it happen in SAS?

Thanks.

Reeza · Posted 11-30-2012 04:05 PM

Are you sure the dates will always be in order for drug_dt?

Why for patient 3/4/5 are there dates in the drug rather than drug names?

I'm assuming you want to work with the data in this structure rather than transposing it?

Will you always know how many drug dates and tests you have? What happens when drug dates or drugs are missing?

Is there a boundary on the date, ie if the drug is 1 year (6 months, 3 months) after the test is that still valid?

Can you provide the data in a data step, I'm guessing you have missing above and it isn't showing up?

Yes it can be done and not relatively hard...nest a do loop for each test to go through the drug dates and find the first test after or on the date.

c8826024 · Posted 11-30-2012 04:48 PM

Thank you for your response.

Yes, the drug date and test date will be always in order. If there is no drug name, then the drug date will be missing. When I cut/paste the data to this board, somehome the blanks have been shifted/overwrited by near non-blank values which is mess up my data. Here is the correct data:

patient_id

drug1

drug2

drug3

drug1_dt

Test_dt1

Test_dt2

Test_dt3

Test_dt4

Test_dt5

Test_dt6

Test_dt7

reason1

reason2

reason3

reason4

reason5

reason6

reason7

1

aaa

bbb

missing

1/2/2012

missing

1/4/2012

missing

bbb

missing

2

ccc

aaa

bbb

3/4/2012

5/2/2012

6/1/2012

1/3/2012

2/2/3012

3/6/2012

4/1/2012

5/2/2012

5/6/2012

5/8/2012

unknown

ccc

aaa

3

aaa

missing

1/5/2012

missing

1/1/2012

1/5/2012

2/1/2012

3/1/2012

4/1/2012

5/1/2012

6/1/2012

unknown

aaa

4

bbb

missing

2/1/2012

missing

1/1/2012

1/5/2012

1/6/2012

missing

unknown

missing

5

bbb

ccc

missing

3/1/2012

3/16/2012

missing

1/1/2012

3/1/2012

3/16/2012

missing

unknown

bbb

ccc

missing

6

aaa

bbb

ccc

1/30/2012

3/1/2012

5/10/2012

1/30/2012

3/1/2012

5/10/2012

6/1/2012

missing

aaa

bbb

ccc

missing

7

missing

You are right. I might not know exact how many drugs or tests will be taken. As a start, I made it 3 drugs and 7 tests.

c8826024 · Posted 11-30-2012 04:49 PM

There is no boundary on the date.

art297 · Posted 11-30-2012 05:05 PM

Can we presume that the actual variable list was supposed to be:

patient_i drug1 drug2 drug3 drug1_dt drug2_dt drug3_dt Test_dt1 Test_dt2 Test_dt3

and that there wasn't supposed to be a Test_dt4? And, if so, what would you like the resulting table to look like based on that data.

Reeza · Posted 11-30-2012 05:07 PM

Is the above how your data actually looks, since you don't know the # of drugs or tests?

Reeza · Posted 11-30-2012 05:29 PM

This gets you relatively close. I didn't do too much testing. You have a 3012 on there I'm assuming should be 2012.

I also renamed drug1_dt to 3 unique names, drug_dt1-drug_dt3 because it was easier to use that way.

You do need to know your array dimensions to work but since you have the variables that should be easy enough.

data want;

set have;

array tst(*) test_dt1-test_dt7;

array drgs(*) drug1-drug3;

array drgs_dt(*) drug_dt1-drug_dt3;

array rsn(*) $ reason1-reason7;

do i=1 to dim(tst) while (not missing(tst(i)));

do j=1 to dim(drgs_dt) while (not missing(drgs_dt(j)));

if drgs_dt(j)<= tst(i) then do;

rsn(i)=drgs(j);

end;

if rsn(i)='' then rsn(i)='unknown';

end;

run;

Ksharp · Posted 12-03-2012 02:05 AM

Easy.

data have;
infile cards truncover;
input patient_id     drug1 $     drug2 $     drug3 $     (drug1_dt     drug2_dt     drug3_dt     Test_dt1     Test_dt2     Test_dt3     Test_dt4     Test_dt5     Test_dt6     Test_dt7) (: mmddyy10.);
format drug1_dt     drug2_dt     drug3_dt     Test: mmddyy10.;
cards;
1     aaa     bbb     .     1/2/2012     .     .     1/4/2012     .     .     .     .     .     .     bbb     .     .     .     .     .     .
2     ccc     aaa     bbb     3/4/2012     5/2/2012     6/1/2012     1/3/2012     2/2/3012     3/6/2012     4/1/2012     5/2/2012     5/6/2012     5/8/2012     
3     aaa     .     .     1/5/2012     .     .     1/1/2012     1/5/2012     2/1/2012     3/1/2012     4/1/2012     5/1/2012     6/1/2012
4     bbb     .     .     2/1/2012     .     .     1/1/2012     1/5/2012     1/6/2012     
5     bbb     ccc     .     3/1/2012     3/16/2012     .     1/1/2012     3/1/2012     3/16/2012
6     aaa     bbb     ccc     1/30/2012     3/1/2012     5/10/2012     1/30/2012     3/1/2012     5/10/2012     6/1/2012
;
run;

data temp;
 set have(keep=patient_id drug1 drug2      drug3 drug1_dt     drug2_dt     drug3_dt) ;
 do i=coalesce(drug1_dt,'01dec2012'd) to coalesce(drug2_dt,'01dec2012'd)-1 ;
 key=i;
 reason=drug1;
 output;
 end;
 do i=coalesce(drug2_dt,'01dec2012'd) to coalesce(drug3_dt,'01dec2012'd)-1 ;
 key=i;
 reason=drug2;
 output;
 end;
 do i=coalesce(drug3_dt,'01dec2012'd) to '01dec2012'd ;
 key=i;
 reason=drug3;
 output;
 end;
 keep patient_id key reason;
 run;
data want;
if _n_ =1 then do;
 if 0 then set temp;
 declare hash ha(dataset:'temp');
  ha.definekey('patient_id','key');
  ha.definedata('reason');
  ha.definedone();
end;
 set have;
 array t{7} Test_: ;
 array r{7} $ reason1-reason7 ;
 do i=1 to dim(t);
  key=t{i};
  call missing(reason);
   rc=ha.find();
  r{i}=reason;
 end;
 drop i key rc reason;
run;

Ksharp

Haikuo · Posted 12-04-2012 01:09 AM

Work something out based on Fareeza's code:

data have;

infile cards truncover;

input patient_id drug1 $ drug2 $ drug3 $ (drug1_dt drug2_dt drug3_dt Test_dt1 Test_dt2 Test_dt3 Test_dt4 Test_dt5 Test_dt6 Test_dt7) (: mmddyy10.);

format drug1_dt drug2_dt drug3_dt Test: mmddyy10.;

cards;

1 aaa bbb . 1/2/2012 . . 1/4/2012 . . . . . . bbb . . . . . .

2 ccc aaa bbb 3/4/2012 5/2/2012 6/1/2012 1/3/2012 2/2/2012 3/6/2012 4/1/2012 5/2/2012 5/6/2012 5/8/2012

3 aaa . . 1/5/2012 . . 1/1/2012 1/5/2012 2/1/2012 3/1/2012 4/1/2012 5/1/2012 6/1/2012

4 bbb . . 2/1/2012 . . 1/1/2012 1/5/2012 1/6/2012

5 bbb ccc . 3/1/2012 3/16/2012 . 1/1/2012 3/1/2012 3/16/2012

6 aaa bbb ccc 1/30/2012 3/1/2012 5/10/2012 1/30/2012 3/1/2012 5/10/2012 6/1/2012

7 . . . . . . . . . . . . . .

;

run;

data want;

set have;

array tst(*) test_dt1-test_dt7;

array drgs(*) $ drug1-drug3;

array drgs_dt(*) drug1_dt drug2_dt drug3_dt;

array rsn(*) $8. reason1-reason7;

do _i=1 to dim(tst);

if missing(tst(_i)) then rsn(_i)='';

else do;

if tst(_i)<min(of drgs_dt(*)) then rsn(_i)='unknown';

else do _d=1 to dim(drgs_dt) while (not missing(drgs_dt(_d)));

if tst(_i)<drgs_dt(_d) then do;

do _k=_d-1 by -1 to 0 while (missing(drgs(_k)));

end;

rsn(_i)=drgs(_k);

leave;

end;

else rsn(_i)=drgs(_d);

end;

drop _:;

run;

Haikuo

c8826024 · Posted 12-04-2012 03:08 PM

Thank you very much. I was sick for a couple of days and would test each code.

c8826024 · Posted 12-05-2012 02:10 PM

My dataset has 7 million records. So Ksharp's solution is not working since the memory is full. Trying Reeza & Hai's solution.....

Haikuo

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away