Hello all,
Here is a sample data for which I need help with.
data tr1;
id$ datea dateb datec resa resb resc
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
I want output data as
id$ max_res first_dt last_dt patt
001 1 21278 21278 S
002 2 21238 21239 C
003 1 21200 21202 I
I am new to sas programming and I am stuck with this case for a long time after trying it different ways which I have failed terribly) now I am posting it here.Please help me to resolve the case. Any help is much appreciated.
Thanks
data tr1;
input id$ datea dateb datec resa resb resc;
cards;
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
;
run;
data want;
set tr1;
array dt date:;
array re res:;
do i=1 to dim(re);
if re[i] > 0 then do;
firstdt=dt[i];
ind1=i;
leave;
end;
end;
do j=dim(re) to 1 by -1;
if re[j] > 0 then do;
lastdt=dt[j];
ind2=j;
leave;
end;
end;
max_res=max(of re[*]);
if ind1=ind2 then
patt='S';
else if ind2=ind1+1 then
patt='C';
else patt='I';
keep id max_res firstdt lastdt patt;
run;
Please explain the logic that let's you go from the input data to the output table.
Sure. datea, dateb and datec are collection dates for results resa,resb and resc(a,b,c are suffixes which reperesnt different timepoints they are collected.). When res_ ne 0 then first_dt is the date at which first result is collected and if there is single result for that subject ,then patt is 's'. If there are results collected continuoulsy at every timepoint then last_dt is the date of last time point and patt='c'.
if results collected at differnt timepoints and are not continuous(suffixes a,b, c etc) then patt=I and last-dt=last time point results are collected.
@skom1 wrote:
Sure. datea, dateb and datec are collection dates for results resa,resb and resc(a,b,c are suffixes which reperesnt different timepoints they are collected.). When res_ ne 0 then first_dt is the date at which first result is collected and if there is single result for that subject ,then patt is 's'. If there are results collected continuoulsy at every timepoint then last_dt is the date of last time point and patt='c'.
if results collected at differnt timepoints and are not continuous(suffixes a,b, c etc) then patt=I and last-dt=last time point results are collected.
Is there a fixed number of these suffixes? Your statement above shows etc after c which sort of implies that there may be MANY of these. Your example data only shows a, b, and c but if there is a variable number then some approaches may not work.
You don't show dat with "continuously collected at every timepoint" if resc=0 is supposed to mean not collected. So you need to provide a more concise definition of the "C" conditions. And if you actually have more than 3 res variable what happens if you have the first and second with a value other than 0, the third has a 0 and the fourth and fifth have values other than 0?
Yes. there are more suffixes(a to h). "C' is only populated when results are collected at all successive timepoints. In the case you specified, if first, second results ne '0' and thirds=0 and 4th, 5th are ne '0' , patt = 'I'.
data tr1;
input id$ datea dateb datec resa resb resc;
cards;
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
;
data want;
set tr1;
array t resa--resc;
array d datea--datec;
array temp(3) _temporary_;
max_resc=max(of t(*));
call missing(of temp(*));
do over t;
if t then temp(_i_)=d;
end;
firstdate=min(of temp(*));
lastdate=max(of temp(*));
keep id firstdate lastdate max_resc;
run;
I think there's yet more work necessary. I don't think the dates would come out properly yet.
Here's a similar approach for you to consider:
data want;
set have;
array res {3} resa resb resc;
array dts {3} datea dateb datec;
n_res=0;
do k=1 to 3;
if res{k} then do;
n_res + 1;
first_dt = min(first_dt, dts{k});
last_dt = max(last_dt, dts{k});
end;
end;
maxres = max(resa, resb, resc);
if n_res=1 then patt='S';
else if n_res = 3 then patt='C';
else if n_res = 2 then do;
if resb=0 then patt='I';
else patt = 'C';
end;
run;
data tr1;
input id$ datea dateb datec resa resb resc;
cards;
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
;
data want;
set tr1;
array t resa--resc;
array d datea--datec;
array temp(3) _temporary_;
max_resc=max(of t(*));
call missing(of temp(*));
do over t;
if t then do;
temp(_i_)=d;
if _t>. and _i_-_t>1 then Patt = 'I';
else patt='C';
_t=_i_;
end;
end;
if n(of temp(*))=1 then patt='S';
firstdate=min(of temp(*));
lastdate=max(of temp(*));
keep id firstdate lastdate max_resc patt;
run;
data tr1;
input id$ datea dateb datec resa resb resc;
cards;
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
;
run;
data want;
set tr1;
array dt date:;
array re res:;
do i=1 to dim(re);
if re[i] > 0 then do;
firstdt=dt[i];
ind1=i;
leave;
end;
end;
do j=dim(re) to 1 by -1;
if re[j] > 0 then do;
lastdt=dt[j];
ind2=j;
leave;
end;
end;
max_res=max(of re[*]);
if ind1=ind2 then
patt='S';
else if ind2=ind1+1 then
patt='C';
else patt='I';
keep id max_res firstdt lastdt patt;
run;
You can make an array parallel to RESA,RESB,RESC in which each array element (call it _RES{i}) is missing whenever the analogous RES variable is zero, and is set to equal i otherwise. So for the third obs the _RES array would be {1,.,3}.
Then apply the MAX, MIN, and N functions to _RES to identify the element of the corresponding DATE array to use for first_dt, last_dt, and patt:
data have;
input id$ datea dateb datec resa resb resc;
cards;
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
;
data want (drop=i _r:);
set have;
array dat {*} date: ;
array res {*} res: ;
array _res {3}; /*Gets a missing value when RES=0, or else the element index when RES^=0*/
do i=1 to 3;
if res{i}^=0 then _res{i}=i;
end;
first_dt=dat{min(of _res{*})};
last_dt=dat{max(of _res{*})};
if first_dt=last_dt then patt='S';
else if max(of _res{*}) = min(of _res{*}) + n(of _res{*})-1 then patt='C';
else patt='I';
run;
Two comments:
Hi,
here is the program and its error message.
26 data ZRS1 /*(drop=i _r:)*/;
27 set ZRS;
28 array dat {*} date: ;
29 array res {*} res: ;
30
31 array _res {8} ;
32 do i=1 to 8;
33 mindt = min(of dat[*]);
34 maxdt = max(of dat[*]);
35 if res{i}^='0' then _res{i}=i;
36 end;
37
38 if _res{i}=. then do;
39
40 first_dt =mindt;
41 last_dt = maxdt;
42 end;
43
44 else if _res{i} ne . then do;
45 first_dt=dat{min(of _res{*})};
46 last_dt=dat{max(of _res{*})};end;
47
48
49 if first_dt=last_dt then patt='S';
50 else if max(of _res{*}) = min(of _res{*}) + n(of _res{*})-1 then patt='C';
51 else patt='I';
52 run;
ERROR: Array subscript out of range at line 38 column 5.
Thank You.
I'm not sure how you got from my program to yours, but it is not close enough for me to give compact answers.
Strip your program back to mine. Make only the changes necessary to accommodate 8 RES values rather than the 3 in my suggestion. Then if you have a problem report back to us.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.