Contributor
Posts: 62

sas programming

we have 1000 rows and thousand columns like

col1 col2 col3 col4.......col1000

1      2     3     1

2     3     4      2

3     2     1      3

4     1      2     4

.      .         .       .

.      .        .       .

1000 122 133 1000

and so on .We need to find those columns in which values are same as in col1 and col4.

How will do that?

Super User
Posts: 13,583

Re: sas programming

So are you looking to compare col1 and col4 to each other or to some of the other thousand variables?

Provide some example data where col1 and col4 are not the same, values for other variables, and what you would explect the output to look like for the example. You don't need to show 1000 columns but 3 or 4 extra are helpful.

Your question could be interpretted as:

Find col1 = col4 (very easy);

Col1 equal to any of col5 to col1000 or Col4 equal to any of col5 to col1000

which gets a bit more complicated as what would the output look like if Col1 matches Col26, Col57 and Col345 while Col4 matched Col88, Col123 and col 835.

Contributor
Posts: 62

Re: sas programming

yes we need to find all those columns which are same.For Ex

Col1 col2 col3 col4 col5

1        2       1       3       1

2        3        2      4       2

3        1        3      4       3

4        5        4      2       4

here we should get col1 col3 and col5 as they have same values across the columns

Super User
Posts: 10,787

Re: sas programming

OK. Take Col1 for example. And make a macro to contain all these one thousand variables.

Or Would like to use IML code ? It looks like it is very easy for IML .

``````data have;
input Col1 col2 col3 col4 col5;
cards;
1        2       1       3       1
2        3        2      4       2
3        1        3      4       3
4        5        4      2       4
;
run;
proc sql noprint;
select count(*) into : n from have;
create table temp as
select &n as col1,
sum(col1=col2) as col2,
sum(col1=col3) as col3,
sum(col1=col4) as col4,
sum(col1=col5) as col5
from have;
quit;
proc transpose data=temp out=x;run;
proc sql noprint;
select _name_ into : list separated by ','
from x
where col1=&n;
create view want as
select &list from have;
quit;``````

Super User
Posts: 10,787

Re: sas programming

Here is IML code. I believe it should be very fast. Still take Col1 as reference column.

``````data have;
input Col1 col2 col3 col4 col5;
cards;
1        2       1       3       1
2        3        2      4       2
3        1        3      4       3
4        5        4      2       4
;
run;
proc iml;
use have;
read all var _all_ into x[c=vnames];
close;
idx=j(1,ncol(x),0);
idx[1]=1;
do i=2 to ncol(x);
if all(x[,1]=x[,i]) then idx[i]=i;
end;
idx=idx[loc(idx^=0)];
y=x[,idx];
col=vnames[,idx];
create want from y[c=col];
append from y;
close;
quit;``````
Super User
Posts: 5,884

Posts: 5,541

Re: sas programming

Groups of identical columns can be found rather simply with a transposition of the data and a sorting operation using the nouniquekey option :

``````data have;
input col1 col2 col3 col4 col5 col6 col7;
datalines;
1     2     3     1 1  1 2
2     3     4     2 20 2 3
3     2     1     3 30 3 2
4     1     2     4 40 4 1
;

proc sql;
select cats("line_", count(*)) into :lastLine from have;
quit;

proc transpose data=have out=havet prefix=line_; run;

proc sort data=havet out=wantt nouniquekey; by line_:; run;

data want;
length group 8;
set wantt; by line_:;
if first.&lastLine. then group + 1;
run;``````
PG
Super User
Posts: 23,776

Re: sas programming

You could also run a PROC CORR on the data. Any variables with a correlation of 1 would be identical.
Posts: 5,541

Re: sas programming

[ Edited ]

Great idea! Correlation would identify identical columns almost certainly. But if Col1 = 2*Col2 +1, the correlation will be 1.0000

PG
Super User
Posts: 23,776

Re: sas programming

I agree, the transpose/sort is the way to go.
Depending on what reason the OP is looking for to find the exact same values, perhaps a measure of similarity is good too.
Super User
Posts: 13,583

Re: sas programming

Variables with missing values for some observations could have a correlation of 1.

Discussion stats
• 10 replies
• 545 views
• 1 like
• 6 in conversation