SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
skom1
Calcite | Level 5

Hello all,

 

Here is a sample data for which I need help with.

 

data tr1;

 id$       datea         dateb      datec      resa      resb      resc

001       21277       21278      21279      0            1          0

002       21238       21239      21240      1            2          0

003       21200       21201      21202      1            0          1

 

I want output data as

 

id$         max_res      first_dt         last_dt           patt

001             1             21278           21278           S

002             2             21238           21239           C

003             1             21200           21202            I

 

 

I am new to sas programming and I am stuck with this case for a long time after trying it different ways which I have failed terribly) now  I am posting it here.Please help me to resolve the case. Any help is much appreciated.

 

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
r_behata
Barite | Level 11
data tr1;
input id$       datea         dateb      datec      resa      resb      resc;
cards;
001       21277       21278      21279      0            1          0
002       21238       21239      21240      1            2          0
003       21200       21201      21202      1            0          1
;
run;


data want;
set tr1;

array dt date:;
array re  res:;


do i=1 to dim(re);
if re[i] > 0 then do;
	firstdt=dt[i];
	ind1=i;
	leave;
end;
end;

do j=dim(re) to 1 by -1;
if re[j] > 0 then do;
	lastdt=dt[j];
	ind2=j;
	leave;
end;
end;

max_res=max(of re[*]);
 
if ind1=ind2 then	
	patt='S';
else if ind2=ind1+1 then
	patt='C';
else 	patt='I';

keep id max_res firstdt lastdt patt;
run;

View solution in original post

16 REPLIES 16
PaigeMiller
Diamond | Level 26

Please explain the logic that let's you go from the input data to the output table.

--
Paige Miller
skom1
Calcite | Level 5

Sure. datea, dateb and datec are collection dates for results resa,resb and resc(a,b,c are suffixes which reperesnt different timepoints they are collected.). When res_  ne 0 then first_dt is the date at which first result is collected and if there is single result for that subject ,then patt is 's'. If there are results collected continuoulsy  at every timepoint then last_dt is the date of  last time point and patt='c'.

if results collected at differnt timepoints and are not continuous(suffixes a,b, c etc) then patt=I and last-dt=last time point results are collected.

 

ballardw
Super User

@skom1 wrote:

Sure. datea, dateb and datec are collection dates for results resa,resb and resc(a,b,c are suffixes which reperesnt different timepoints they are collected.). When res_  ne 0 then first_dt is the date at which first result is collected and if there is single result for that subject ,then patt is 's'. If there are results collected continuoulsy  at every timepoint then last_dt is the date of  last time point and patt='c'.

if results collected at differnt timepoints and are not continuous(suffixes a,b, c etc) then patt=I and last-dt=last time point results are collected.

 


Is there a fixed number of these suffixes? Your statement above shows etc after c which sort of implies that there may be MANY of these. Your example data only shows a, b, and c but if there is a variable number then some approaches may not work.

 

You don't show dat with "continuously collected at every timepoint" if resc=0 is supposed to mean not collected. So you need to provide a more concise definition of the "C" conditions. And if you actually have more than 3 res variable what happens if you have the first and second with a value other than 0, the third has a 0 and the fourth and fifth have values other than 0?

 

 

skom1
Calcite | Level 5

Yes. there are more suffixes(a to h). "C' is only populated when results are collected at all successive timepoints. In the case you specified, if first, second results ne '0' and thirds=0 and 4th, 5th are ne '0' , patt = 'I'.

novinosrin
Tourmaline | Level 20
data tr1;
input id$       datea         dateb      datec      resa      resb      resc;
cards;
001       21277       21278      21279      0            1          0
002       21238       21239      21240      1            2          0
003       21200       21201      21202      1            0          1
;

data want;
set tr1;
array t resa--resc;
array d datea--datec;
array temp(3)  _temporary_;
max_resc=max(of t(*));
call missing(of temp(*));
do over t;
 if t then temp(_i_)=d;
end;
firstdate=min(of temp(*));
lastdate=max(of temp(*));
keep id firstdate lastdate max_resc;
run;
skom1
Calcite | Level 5
Thank You. But I see 'Patt' variable is not derived. could you please help me with that as well.
if single result, patt=S
if results are collected continuously then Patt=C
if results collected intermittently then Patt = 'I'.
Thank you
Astounding
PROC Star

I think there's yet more work necessary.  I don't think the dates would come out properly yet.

 

Here's a similar approach for you to consider:

 

data want;
set have;
array res {3} resa resb resc;
array dts {3} datea dateb datec;
n_res=0;
do k=1 to 3;
   if res{k} then do;
      n_res + 1;
      first_dt = min(first_dt, dts{k});
      last_dt = max(last_dt, dts{k});
  end;end;
maxres = max(resa, resb, resc);
if n_res=1 then patt='S';
else if n_res = 3 then patt='C';
else if n_res = 2 then do;
   if resb=0 then patt='I';
   else patt = 'C';
end;
run;



novinosrin
Tourmaline | Level 20

data tr1;
input id$       datea         dateb      datec      resa      resb      resc;
cards;
001       21277       21278      21279      0            1          0
002       21238       21239      21240      1            2          0
003       21200       21201      21202      1            0          1
;

data want;
set tr1;
array t resa--resc;
array d datea--datec;
array temp(3)  _temporary_;
max_resc=max(of t(*));
call missing(of temp(*));
do over t;
 if t then do;
  temp(_i_)=d;
 if _t>. and  _i_-_t>1 then Patt = 'I';
 else patt='C';
 _t=_i_;
 end;
end;
if n(of temp(*))=1 then patt='S';
firstdate=min(of temp(*));
lastdate=max(of temp(*));
keep id firstdate lastdate max_resc patt;
run;
r_behata
Barite | Level 11
data tr1;
input id$       datea         dateb      datec      resa      resb      resc;
cards;
001       21277       21278      21279      0            1          0
002       21238       21239      21240      1            2          0
003       21200       21201      21202      1            0          1
;
run;


data want;
set tr1;

array dt date:;
array re  res:;


do i=1 to dim(re);
if re[i] > 0 then do;
	firstdt=dt[i];
	ind1=i;
	leave;
end;
end;

do j=dim(re) to 1 by -1;
if re[j] > 0 then do;
	lastdt=dt[j];
	ind2=j;
	leave;
end;
end;

max_res=max(of re[*]);
 
if ind1=ind2 then	
	patt='S';
else if ind2=ind1+1 then
	patt='C';
else 	patt='I';

keep id max_res firstdt lastdt patt;
run;
mkeintz
PROC Star

You can make an array parallel to RESA,RESB,RESC in which each array element (call it _RES{i}) is missing whenever the analogous RES variable is zero, and is set to equal i otherwise.   So for the third obs the _RES array would be {1,.,3}.

 

Then apply the MAX, MIN, and N functions to _RES to identify the element of the corresponding DATE array to use for first_dt, last_dt, and patt:

 

data have;
input id$       datea         dateb      datec      resa      resb      resc;
cards;
001       21277       21278      21279      0            1          0
002       21238       21239      21240      1            2          0
003       21200       21201      21202      1            0          1
;


data want (drop=i _r:);
  set have;
  array dat {*} date: ;
  array res {*} res:  ;

  array _res {3};   /*Gets a missing value when RES=0, or else the element index when RES^=0*/
  do i=1 to 3;
    if res{i}^=0 then _res{i}=i;
  end;

  first_dt=dat{min(of _res{*})};
  last_dt=dat{max(of _res{*})};

  if first_dt=last_dt then patt='S';
  else if max(of _res{*}) = min(of _res{*}) + n(of _res{*})-1 then patt='C';
  else patt='I';
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
skom1
Calcite | Level 5
Thank you for the reply. The dataset I am currently working on have 8 dates and result variables. When I am trying to use your code it says array is out of range at _res{i}. Can you help me debug. Thanks
mkeintz
PROC Star

Two comments:

 

  1. You report an error message, but it is unaccompanied by a copy of the problem program, or the log containing the message.  Help us help you by providing information needed to diagnose and prescribe.
  2. HOWEVER ... you refer to having 8 RES (and DATE) variables, so that means your arrays (including _RES) should all have 8 elements.  (My example defined _RES as having only 3).  And any do loops from 1 to 3 should be changed to loop from 1 to 8.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
skom1
Calcite | Level 5

Hi,

here is the program and its error message.

 


26 data ZRS1 /*(drop=i _r:)*/;
27 set ZRS;
28 array dat {*} date: ;
29 array res {*} res: ;
30
31 array _res {8} ;
32 do i=1 to 8;
33 mindt = min(of dat[*]);
34 maxdt = max(of dat[*]);
35 if res{i}^='0' then _res{i}=i;
36 end;
37
38 if _res{i}=. then do;
39
40 first_dt =mindt;
41 last_dt = maxdt;
42 end;
43
44 else if _res{i} ne . then do;
45 first_dt=dat{min(of _res{*})};
46 last_dt=dat{max(of _res{*})};end;
47
48
49 if first_dt=last_dt then patt='S';
50 else if max(of _res{*}) = min(of _res{*}) + n(of _res{*})-1 then patt='C';
51 else patt='I';
52 run;

ERROR: Array subscript out of range at line 38 column 5.

 

 

Thank You.

mkeintz
PROC Star

 

I'm not sure how you got from my program to yours, but it is not close enough for me to give compact answers.

 

Strip your program back to mine.  Make only the changes necessary to accommodate 8 RES values rather than the 3 in my suggestion.  Then if you have a problem report back to us.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 1712 views
  • 1 like
  • 7 in conversation