Re: How to remove columns with same entries

lalohg · Posted 12-14-2015 03:40 PM

Hi there,

I wonder if someone can help me.

I have a data set with several columns and would like to get rid of those columns with duplicated entries like the example below

I have a data set:
id	a	b	c	d	e
1	blue	blue	blue	blue	red
2	white		white	yellow	red
3	grey	grey	black	black	red
4	orange	orange	orange
5	brown	orange	brown	white	brown

We would llike to remove repeated columns by patient so the data set will look like:
id	a	b	c	d	e
1	blue	red
2	white	yellow	red
3	grey	black	red
4	orange
5	brown	orange	white

all your help will be appreciated

Thanks

Eduardo.

Reeza · Posted 12-14-2015 03:47 PM

Transpose, remove duplicates via proc sort and re-transpose.

lalohg · Posted 12-15-2015 10:27 AM

Problem is that I have lot more variables,

Reeza · Posted 12-15-2015 04:10 PM

Transpose, Remove Duplicates, Transpose again, use a data step to sort in appropriate order.

data have;
infile cards expandtabs truncover;
input id	(a	b	c	d	e) ($);
cards;
1	blue	blue	blue	blue	red
2	white	. 	white	yellow	red
3	grey	grey	black	black	red
4	orange	orange	orange	 	. .
5	brown	orange	brown	white 	brown
;
run;

proc transpose data=have out=temp1(rename=col1=var_a_e);
by id;
var a b c d e;
run;

proc sort data=temp1 nodupkey out=temp2 (where=(var_a_e ne ""));
by id var_a_e;
run;


proc transpose data=temp2 out=temp3;
by id;
id _name_;
var var_a_e;
run;

data want;
retain id a b c d e;
set temp3;
array lett(5) $ a b c d e;
array rev(5) e d c b a;
call sortc(of rev(*));

drop _name_;
run;

proc print;run;

Haikuo · Posted 12-14-2015 08:43 PM

You can also resort to the almighty data step:

data want;
	array t(5) $10 _temporary_;
	set have;
	array ae(5) a--e;
	call missing (of t(*));

	do i=1 to 5;
		if ae(i) not in t then
			do;
				j=sum(j,1);
				t(j)=ae(i);
			end;
	end;

	do i=1 to 5;
		ae(i)=t(i);
	end;

	drop i j;
run;

lalohg · Posted 12-15-2015 10:28 AM

Thanks Haikuo,
I have 36 columns not only 5, I ran what you suggested but didn't work

Ksharp · Posted 12-15-2015 04:55 AM

Or simple IML code :

data have;
infile cards expandtabs truncover;
input id	(a	b	c	d	e) ($);
cards;
1	blue	blue	blue	blue	red
2	white	. 	white	yellow	red
3	grey	grey	black	black	red
4	orange	orange	orange	 	. .
5	brown	orange	brown	white 	brown
;
run;
proc iml;
use have;
read all var {id};
read all var _char_ into x[c=vnames];
y=j(nrow(x),ncol(x),blankstr(nleng(x)));
do i=1 to nrow(x);
 temp=unique(x[i,]); 
 idx=loc(temp^=' ');
 y[i,1:ncol(idx)]=temp[,idx];
end;
print y;
run;

lalohg · Posted 12-15-2015 10:29 AM

Thanks Ksharp, my SAS licence does not include the iml procedure

Ksharp · Posted 12-15-2015 08:12 PM

data have;
infile cards expandtabs truncover;
input id	(a	b	c	d	e) ($);
cards;
1	blue	blue	blue	blue	red
2	white	. 	white	yellow	red
3	grey	grey	black	black	red
4	orange	orange	orange	 	. .
5	brown	orange	brown	white 	brown
;
run;


%let columns=5;  /* <---- */


data want;
	array t(&columns) $ 32 _temporary_;
	set have;
	array ae(*) $ _character_;
	call missing (of t(*));

	do i=1 to dim(ae);
		if ae(i) not in t then
			do;
				j=sum(j,1);
				t(j)=ae(i);
			end;
	end;

	do i=1 to dim(ae);
		ae(i)=t(i);
	end;

	drop i j;
run;

Registration is open

SAS Training: Just a Click Away