Hi there,
I wonder if someone can help me.
I have a data set with several columns and would like to get rid of those columns with duplicated entries like the example below
I have a data set: | |||||
id | a | b | c | d | e |
1 | blue | blue | blue | blue | red |
2 | white | white | yellow | red | |
3 | grey | grey | black | black | red |
4 | orange | orange | orange | ||
5 | brown | orange | brown | white | brown |
We would llike to remove repeated columns by patient so the data set will look like: | |||||
id | a | b | c | d | e |
1 | blue | red | |||
2 | white | yellow | red | ||
3 | grey | black | red | ||
4 | orange | ||||
5 | brown | orange | white |
all your help will be appreciated
Thanks
Eduardo.
Transpose, Remove Duplicates, Transpose again, use a data step to sort in appropriate order.
data have;
infile cards expandtabs truncover;
input id (a b c d e) ($);
cards;
1 blue blue blue blue red
2 white . white yellow red
3 grey grey black black red
4 orange orange orange . .
5 brown orange brown white brown
;
run;
proc transpose data=have out=temp1(rename=col1=var_a_e);
by id;
var a b c d e;
run;
proc sort data=temp1 nodupkey out=temp2 (where=(var_a_e ne ""));
by id var_a_e;
run;
proc transpose data=temp2 out=temp3;
by id;
id _name_;
var var_a_e;
run;
data want;
retain id a b c d e;
set temp3;
array lett(5) $ a b c d e;
array rev(5) e d c b a;
call sortc(of rev(*));
drop _name_;
run;
proc print;run;
You can also resort to the almighty data step:
data want;
array t(5) $10 _temporary_;
set have;
array ae(5) a--e;
call missing (of t(*));
do i=1 to 5;
if ae(i) not in t then
do;
j=sum(j,1);
t(j)=ae(i);
end;
end;
do i=1 to 5;
ae(i)=t(i);
end;
drop i j;
run;
Or simple IML code :
data have;
infile cards expandtabs truncover;
input id (a b c d e) ($);
cards;
1 blue blue blue blue red
2 white . white yellow red
3 grey grey black black red
4 orange orange orange . .
5 brown orange brown white brown
;
run;
proc iml;
use have;
read all var {id};
read all var _char_ into x[c=vnames];
y=j(nrow(x),ncol(x),blankstr(nleng(x)));
do i=1 to nrow(x);
temp=unique(x[i,]);
idx=loc(temp^=' ');
y[i,1:ncol(idx)]=temp[,idx];
end;
print y;
run;
data have;
infile cards expandtabs truncover;
input id (a b c d e) ($);
cards;
1 blue blue blue blue red
2 white . white yellow red
3 grey grey black black red
4 orange orange orange . .
5 brown orange brown white brown
;
run;
%let columns=5; /* <---- */
data want;
array t(&columns) $ 32 _temporary_;
set have;
array ae(*) $ _character_;
call missing (of t(*));
do i=1 to dim(ae);
if ae(i) not in t then
do;
j=sum(j,1);
t(j)=ae(i);
end;
end;
do i=1 to dim(ae);
ae(i)=t(i);
end;
drop i j;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.