Hi,
I am struggling to retain a value of "Y" for all observations by ID if the value "Y" exists for any record within the same ID:
For example, my data looks like this:
ID | indicator1 | indicator2 | indicator3 | indicator4 | indicator5 |
001 | Y | ||||
001 | Y | ||||
001 | Y | ||||
001 | Y | ||||
002 | Y | ||||
002 | Y | ||||
002 | Y | ||||
002 | |||||
002 | Y |
but I want it to look like this within the indicatorXnew fields:
ID | indicator1 | indicator2 | indicator3 | indicator4 | indicator5 | indicator1new | indicator2new | indicator3new | indicator4new | indicator5new |
001 | Y | Y | Y | Y | Y | |||||
001 | Y | Y | Y | Y | Y | |||||
001 | Y | Y | Y | Y | Y | |||||
001 | Y | Y | Y | Y | Y | |||||
002 | Y | Y | Y | Y | Y | |||||
002 | Y | Y | Y | Y | Y | |||||
002 | Y | Y | Y | Y | Y | |||||
002 | Y | Y | Y | Y | ||||||
002 | Y | Y | Y | Y | Y |
I have some familiarity with RETAIN but only for summing numeric values. I would share code but nothing I try works correctly, but here is the code to generate data set HAVE:
data HAVE;
infile datalines dlm='|' truncover;
input ID$ indicator1$ indicator2$ indicator3$ indicator4$ indicator5$;
datalines;
001|| ||Y|| || || ||
001||Y|| || || || ||
001|| || ||Y|| || ||
001|| || || || || Y
002||Y|| || || || ||
002|| || ||Y|| || ||
002|| ||Y|| || || ||
002|| || || || || ||
002|| || || ||Y|| ||
;run;
I would appreciate any help, thank you
Here's a variation that is such an odd duck, you might not see it again:
data want;
do until (last.id);
update have (obs=0) have;
by id;
end;
indicator1new = indicator1;
indicator2new = indicator2;
indicator3new = indicator3;
indicator4new = indicator4;
indicator5new = indicator5;
do until (last.id);
set have;
by id;
output;
end;
run;
Here you go.
data have;
infile datalines dlm='|' truncover;
input ID $ (indicator1 indicator2 indicator3 indicator4 indicator5) (:$1.);
datalines;
001|| ||Y|| || || ||
001||Y|| || || || ||
001|| || ||Y|| || ||
001|| || || || || Y
002||Y|| || || || ||
002|| || ||Y|| || ||
002|| ||Y|| || || ||
002|| || || || || ||
002|| || || ||Y|| ||
;
proc sort data=have;
by id;
run;
data inter;
set have;
by id;
array ind{*} indicator1-indicator5;
array new_ind{*} $1 new_indicator1-new_indicator5;
retain new_ind;
keep id new_indicator1-new_indicator5;
do i=1 to dim(new_ind);
if ind[i]='Y' then new_ind[i]='Y';
end;
if last.id then
do;
output;
call missing(of new_ind[*]);
end;
run;
data want;
merge have inter;
by id;
run;
And here an alternative coding option.
data want2;
set have;
by id;
array ind{*} indicator1-indicator5;
array new_ind{*} $1 new_indicator1-new_indicator5;
retain new_ind;
if first.id then
do;
call missing(of new_ind[*]);
do until(last.id);
set have;
by id;
do i=1 to dim(new_ind);
if ind[i]='Y' then new_ind[i]='Y';
end;
end;
end;
drop i;
run;
Collapse to one record per BY group. If your variables are really coded as Y or missing then you could use UPDATE statement to do the collapsing. The UPDATE statement will ignore the missing values when applying the transaction observations.
data summary;
update have(obs=0) have;
by id;
run;
If you then need those variable renamed and remerged onto the detailed records use a MERGE statement. If you have a series of variables with a numeric counter it works much better to have the counter as a suffix so that you can use a variable list.
data want;
merge have summary(rename=(indicator1-indicator5=new_indicator1-new_indicator5));
by id;
run;
data have; infile datalines dlm='|' truncover; input ID $ (indicator1 indicator2 indicator3 indicator4 indicator5) (:$1.); datalines; 001|| ||Y|| || || || 001||Y|| || || || || 001|| || ||Y|| || || 001|| || || || || Y 002||Y|| || || || || 002|| || ||Y|| || || 002|| ||Y|| || || || 002|| || || || || || 002|| || || ||Y|| || ; proc sql; create table want as select *, max(indicator1) as new1, max(indicator2) as new2, max(indicator3) as new3, max(indicator4) as new4, max(indicator5) as new5 from have group by id; quit;
Here's a variation that is such an odd duck, you might not see it again:
data want;
do until (last.id);
update have (obs=0) have;
by id;
end;
indicator1new = indicator1;
indicator2new = indicator2;
indicator3new = indicator3;
indicator4new = indicator4;
indicator5new = indicator5;
do until (last.id);
set have;
by id;
output;
end;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.