Hi~ currently I have a data set as followed:
[DATA 1 ]
ID Drug Start_DAY End_DAY
A Tx1 2013/01/01 2013/01/03
Tx1 2013/01/04 2013/01/08
Tx2 2013/01/09 2013/01/12
Tx1 2013/01/14 2013/01/16
Tx1 2013/01/22 2013/01/26
B
C
D......
and I'm trying to reconstruct the data sets as followed :
[DATA2]
ID Drug Start End
A Tx1 2013/01/01 2013/01/08
Tx2 2013/01/09 2013/01/12
Missing 2013/01/13 2013/01/13
Tx1 2013/01/14 2013/01/16
Missing 2013/01/17 2013/01/21
Tx1 2013/01/22 2013/01/26
I've tried to use arrays for these reconstruction but it didn't work at all .....
Can any one give me a hand or suggestion??
Thanks ~ !!!
There is probably an easier way to do this, but the following will likely accomplish what you want:
data have;
informat Start_DAY End_DAY yymmdd10.;
format Start_DAY End_DAY yymmdd10.;
input (ID Drug) ($) Start_DAY End_DAY;
cards;
A Tx1 2013/01/01 2013/01/03
A Tx1 2013/01/04 2013/01/08
A Tx2 2013/01/09 2013/01/12
A Tx1 2013/01/14 2013/01/16
A Tx1 2013/01/22 2013/01/26
B Tx2 2013/01/09 2013/01/15
B Tx1 2013/01/22 2013/01/26
;
data want;
set have;
by id;
array jan('01jan2013'd:'31jan2013'd) $ _temporary_;
retain jan;
if first.id then do _n_='01jan2013'd to '31jan2013'd;
jan(_n_)="Missing";
end;
do _n_=Start_DAY to End_day;
jan(_n_)=Drug;
end;
if last.id then do;
do _n_='01jan2013'd to '31jan2013'd;
if _n_ eq '01jan2013'd then do;
Drug=jan(_n_);
Start_Day=_n_;
end;
else do;
if _n_ eq'31jan2013'd then do;
End_day=_n_;
if jan(_n_) ne jan(_n_-1) then do;
Start_Day=_n_;
Drug=jan(_n_);
end;
output;
end;
else if jan(_n_) ne jan(_n_-1) then do;
End_day=_n_-1;
output;
Start_Day=_n_;
Drug=jan(_n_);
end;
end;
end;
end;
run;
Here is a non-array approach, just looking for gaps and outputting the a record whenever a gap is found.
data want; set have; by id start_day; lag_end=lag(end_day); output; if first.id=0 then do; if start_day ne lag_end+1 then do; Drug="Missing"; end_day=start_day-1; start_day=lag_end+1; output; end; end; drop lag_end; run; proc sort data=want; by id start_day; run;
Thanks for the reply ;
but In addition to recognize the missing part,
the reconstructed date sets are more complicated cause it need to combine the date if it's the same drug with no missing day in different row.
such as
Tx1 2013/01/01 2013/01/03
Tx1 2013/01/04 2013/01/08
would turned to be Tx1 2013/01/01 2013/01/08 ;
thanks a lot ~
Maybe I can try to do the missing part first and then try to figure out how to combine dates in different row ....
Hi ~~ thanks for the reply ;
But while I tried to put the code in SAS 9.3
it didn't work though ~
the log file would show
array jan('01jan2013'd.:'31jan2013'd.) $ _temporary_;
------------
22
76
ERROR 22-322: Syntax error, expecting one of the following: a name, an integer constant, *.
ERROR 76-322: Syntax error, statement will be ignored.
did I type it wrong or did I miss something ?
thanks a lot ~
My mistake. The code wasn't tested and, apparently, SAS won't accept the array bounds as written. The following, though, did run successfully on 9.3:
data want;
set have;
by id;
array jan(19359:19389) $ _temporary_;
retain jan;
if first.id then do _n_='01jan2013'd to '31jan2013'd;
jan(_n_)="Missing";
end;
do _n_=Start_DAY to End_day;
jan(_n_)=Drug;
end;
if last.id then do;
do _n_='01jan2013'd to '31jan2013'd;
if _n_ eq '01jan2013'd then do;
Drug=jan(_n_);
Start_Day=_n_;
end;
else do;
if _n_ eq'31jan2013'd then do;
End_day=_n_;
if jan(_n_) ne jan(_n_-1) then do;
Start_Day=_n_;
Drug=jan(_n_);
end;
output;
end;
else if jan(_n_) ne jan(_n_-1) then do;
End_day=_n_-1;
output;
Start_Day=_n_;
Drug=jan(_n_);
end;
end;
end;
end;
run;
How about SYSEVALF and the BOUND functions.
Much better than my hard coding the date values!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.