- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello all,
Here is a sample data for which I need help with.
data tr1;
id$ datea dateb datec resa resb resc
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
I want output data as
id$ max_res first_dt last_dt patt
001 1 21278 21278 S
002 2 21238 21239 C
003 1 21200 21202 I
I am new to sas programming and I am stuck with this case for a long time after trying it different ways which I have failed terribly) now I am posting it here.Please help me to resolve the case. Any help is much appreciated.
Thanks
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data tr1;
input id$ datea dateb datec resa resb resc;
cards;
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
;
run;
data want;
set tr1;
array dt date:;
array re res:;
do i=1 to dim(re);
if re[i] > 0 then do;
firstdt=dt[i];
ind1=i;
leave;
end;
end;
do j=dim(re) to 1 by -1;
if re[j] > 0 then do;
lastdt=dt[j];
ind2=j;
leave;
end;
end;
max_res=max(of re[*]);
if ind1=ind2 then
patt='S';
else if ind2=ind1+1 then
patt='C';
else patt='I';
keep id max_res firstdt lastdt patt;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please explain the logic that let's you go from the input data to the output table.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sure. datea, dateb and datec are collection dates for results resa,resb and resc(a,b,c are suffixes which reperesnt different timepoints they are collected.). When res_ ne 0 then first_dt is the date at which first result is collected and if there is single result for that subject ,then patt is 's'. If there are results collected continuoulsy at every timepoint then last_dt is the date of last time point and patt='c'.
if results collected at differnt timepoints and are not continuous(suffixes a,b, c etc) then patt=I and last-dt=last time point results are collected.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@skom1 wrote:
Sure. datea, dateb and datec are collection dates for results resa,resb and resc(a,b,c are suffixes which reperesnt different timepoints they are collected.). When res_ ne 0 then first_dt is the date at which first result is collected and if there is single result for that subject ,then patt is 's'. If there are results collected continuoulsy at every timepoint then last_dt is the date of last time point and patt='c'.
if results collected at differnt timepoints and are not continuous(suffixes a,b, c etc) then patt=I and last-dt=last time point results are collected.
Is there a fixed number of these suffixes? Your statement above shows etc after c which sort of implies that there may be MANY of these. Your example data only shows a, b, and c but if there is a variable number then some approaches may not work.
You don't show dat with "continuously collected at every timepoint" if resc=0 is supposed to mean not collected. So you need to provide a more concise definition of the "C" conditions. And if you actually have more than 3 res variable what happens if you have the first and second with a value other than 0, the third has a 0 and the fourth and fifth have values other than 0?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes. there are more suffixes(a to h). "C' is only populated when results are collected at all successive timepoints. In the case you specified, if first, second results ne '0' and thirds=0 and 4th, 5th are ne '0' , patt = 'I'.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data tr1;
input id$ datea dateb datec resa resb resc;
cards;
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
;
data want;
set tr1;
array t resa--resc;
array d datea--datec;
array temp(3) _temporary_;
max_resc=max(of t(*));
call missing(of temp(*));
do over t;
if t then temp(_i_)=d;
end;
firstdate=min(of temp(*));
lastdate=max(of temp(*));
keep id firstdate lastdate max_resc;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
if single result, patt=S
if results are collected continuously then Patt=C
if results collected intermittently then Patt = 'I'.
Thank you
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think there's yet more work necessary. I don't think the dates would come out properly yet.
Here's a similar approach for you to consider:
data want;
set have;
array res {3} resa resb resc;
array dts {3} datea dateb datec;
n_res=0;
do k=1 to 3;
if res{k} then do;
n_res + 1;
first_dt = min(first_dt, dts{k});
last_dt = max(last_dt, dts{k});
end;end;
maxres = max(resa, resb, resc);
if n_res=1 then patt='S';
else if n_res = 3 then patt='C';
else if n_res = 2 then do;
if resb=0 then patt='I';
else patt = 'C';
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data tr1;
input id$ datea dateb datec resa resb resc;
cards;
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
;
data want;
set tr1;
array t resa--resc;
array d datea--datec;
array temp(3) _temporary_;
max_resc=max(of t(*));
call missing(of temp(*));
do over t;
if t then do;
temp(_i_)=d;
if _t>. and _i_-_t>1 then Patt = 'I';
else patt='C';
_t=_i_;
end;
end;
if n(of temp(*))=1 then patt='S';
firstdate=min(of temp(*));
lastdate=max(of temp(*));
keep id firstdate lastdate max_resc patt;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data tr1;
input id$ datea dateb datec resa resb resc;
cards;
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
;
run;
data want;
set tr1;
array dt date:;
array re res:;
do i=1 to dim(re);
if re[i] > 0 then do;
firstdt=dt[i];
ind1=i;
leave;
end;
end;
do j=dim(re) to 1 by -1;
if re[j] > 0 then do;
lastdt=dt[j];
ind2=j;
leave;
end;
end;
max_res=max(of re[*]);
if ind1=ind2 then
patt='S';
else if ind2=ind1+1 then
patt='C';
else patt='I';
keep id max_res firstdt lastdt patt;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can make an array parallel to RESA,RESB,RESC in which each array element (call it _RES{i}) is missing whenever the analogous RES variable is zero, and is set to equal i otherwise. So for the third obs the _RES array would be {1,.,3}.
Then apply the MAX, MIN, and N functions to _RES to identify the element of the corresponding DATE array to use for first_dt, last_dt, and patt:
data have;
input id$ datea dateb datec resa resb resc;
cards;
001 21277 21278 21279 0 1 0
002 21238 21239 21240 1 2 0
003 21200 21201 21202 1 0 1
;
data want (drop=i _r:);
set have;
array dat {*} date: ;
array res {*} res: ;
array _res {3}; /*Gets a missing value when RES=0, or else the element index when RES^=0*/
do i=1 to 3;
if res{i}^=0 then _res{i}=i;
end;
first_dt=dat{min(of _res{*})};
last_dt=dat{max(of _res{*})};
if first_dt=last_dt then patt='S';
else if max(of _res{*}) = min(of _res{*}) + n(of _res{*})-1 then patt='C';
else patt='I';
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Two comments:
- You report an error message, but it is unaccompanied by a copy of the problem program, or the log containing the message. Help us help you by providing information needed to diagnose and prescribe.
- HOWEVER ... you refer to having 8 RES (and DATE) variables, so that means your arrays (including _RES) should all have 8 elements. (My example defined _RES as having only 3). And any do loops from 1 to 3 should be changed to loop from 1 to 8.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
here is the program and its error message.
26 data ZRS1 /*(drop=i _r:)*/;
27 set ZRS;
28 array dat {*} date: ;
29 array res {*} res: ;
30
31 array _res {8} ;
32 do i=1 to 8;
33 mindt = min(of dat[*]);
34 maxdt = max(of dat[*]);
35 if res{i}^='0' then _res{i}=i;
36 end;
37
38 if _res{i}=. then do;
39
40 first_dt =mindt;
41 last_dt = maxdt;
42 end;
43
44 else if _res{i} ne . then do;
45 first_dt=dat{min(of _res{*})};
46 last_dt=dat{max(of _res{*})};end;
47
48
49 if first_dt=last_dt then patt='S';
50 else if max(of _res{*}) = min(of _res{*}) + n(of _res{*})-1 then patt='C';
51 else patt='I';
52 run;
ERROR: Array subscript out of range at line 38 column 5.
Thank You.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm not sure how you got from my program to yours, but it is not close enough for me to give compact answers.
Strip your program back to mine. Make only the changes necessary to accommodate 8 RES values rather than the 3 in my suggestion. Then if you have a problem report back to us.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------