Hi there,
I wonder if someone can help me.
I have a data set with several columns and would like to get rid of those columns with duplicated entries like the example below
I have a data set: | |||||
id | a | b | c | d | e |
1 | blue | blue | blue | blue | red |
2 | white | white | yellow | red | |
3 | grey | grey | black | black | red |
4 | orange | orange | orange | ||
5 | brown | orange | brown | white | brown |
We would llike to remove repeated columns by patient so the data set will look like: | |||||
id | a | b | c | d | e |
1 | blue | red | |||
2 | white | yellow | red | ||
3 | grey | black | red | ||
4 | orange | ||||
5 | brown | orange | white |
all your help will be appreciated
Thanks
Eduardo.
Transpose, Remove Duplicates, Transpose again, use a data step to sort in appropriate order.
data have;
infile cards expandtabs truncover;
input id (a b c d e) ($);
cards;
1 blue blue blue blue red
2 white . white yellow red
3 grey grey black black red
4 orange orange orange . .
5 brown orange brown white brown
;
run;
proc transpose data=have out=temp1(rename=col1=var_a_e);
by id;
var a b c d e;
run;
proc sort data=temp1 nodupkey out=temp2 (where=(var_a_e ne ""));
by id var_a_e;
run;
proc transpose data=temp2 out=temp3;
by id;
id _name_;
var var_a_e;
run;
data want;
retain id a b c d e;
set temp3;
array lett(5) $ a b c d e;
array rev(5) e d c b a;
call sortc(of rev(*));
drop _name_;
run;
proc print;run;
You can also resort to the almighty data step:
data want;
array t(5) $10 _temporary_;
set have;
array ae(5) a--e;
call missing (of t(*));
do i=1 to 5;
if ae(i) not in t then
do;
j=sum(j,1);
t(j)=ae(i);
end;
end;
do i=1 to 5;
ae(i)=t(i);
end;
drop i j;
run;
Or simple IML code :
data have;
infile cards expandtabs truncover;
input id (a b c d e) ($);
cards;
1 blue blue blue blue red
2 white . white yellow red
3 grey grey black black red
4 orange orange orange . .
5 brown orange brown white brown
;
run;
proc iml;
use have;
read all var {id};
read all var _char_ into x[c=vnames];
y=j(nrow(x),ncol(x),blankstr(nleng(x)));
do i=1 to nrow(x);
temp=unique(x[i,]);
idx=loc(temp^=' ');
y[i,1:ncol(idx)]=temp[,idx];
end;
print y;
run;
data have;
infile cards expandtabs truncover;
input id (a b c d e) ($);
cards;
1 blue blue blue blue red
2 white . white yellow red
3 grey grey black black red
4 orange orange orange . .
5 brown orange brown white brown
;
run;
%let columns=5; /* <---- */
data want;
array t(&columns) $ 32 _temporary_;
set have;
array ae(*) $ _character_;
call missing (of t(*));
do i=1 to dim(ae);
if ae(i) not in t then
do;
j=sum(j,1);
t(j)=ae(i);
end;
end;
do i=1 to dim(ae);
ae(i)=t(i);
end;
drop i j;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.