sas programming

Reply
Contributor
Posts: 62

sas programming

we have 1000 rows and thousand columns like

col1 col2 col3 col4.......col1000

1      2     3     1

2     3     4      2

3     2     1      3

4     1      2     4

.      .         .       .

.      .        .       .

1000 122 133 1000

and so on .We need to find those columns in which values are same as in col1 and col4.

How will do that?

Super User
Posts: 11,343

Re: sas programming

So are you looking to compare col1 and col4 to each other or to some of the other thousand variables?

 

Provide some example data where col1 and col4 are not the same, values for other variables, and what you would explect the output to look like for the example. You don't need to show 1000 columns but 3 or 4 extra are helpful.

 

Your question could be interpretted as:

Find col1 = col4 (very easy);

 

Col1 equal to any of col5 to col1000 or Col4 equal to any of col5 to col1000

which gets a bit more complicated as what would the output look like if Col1 matches Col26, Col57 and Col345 while Col4 matched Col88, Col123 and col 835.

Contributor
Posts: 62

Re: sas programming

yes we need to find all those columns which are same.For Ex

 

Col1 col2 col3 col4 col5

1        2       1       3       1

2        3        2      4       2

3        1        3      4       3

4        5        4      2       4

here we should get col1 col3 and col5 as they have same values across the columns 

 

 

 

 

Super User
Posts: 10,044

Re: sas programming

OK. Take Col1 for example. And make a macro to contain all these one thousand variables.

Or Would like to use IML code ? It looks like it is very easy for IML .

 

 

data have;
input Col1 col2 col3 col4 col5;
cards;
1        2       1       3       1
2        3        2      4       2
3        1        3      4       3
4        5        4      2       4
;
run;
proc sql noprint;
 select count(*) into : n from have;
create table temp as
 select &n as col1,
        sum(col1=col2) as col2,
        sum(col1=col3) as col3,
        sum(col1=col4) as col4,
        sum(col1=col5) as col5 
   from have;
quit;
proc transpose data=temp out=x;run;
proc sql noprint;
 select _name_ into : list separated by ','
  from x
   where col1=&n;
create view want as
 select &list from have;
quit;

 

 

Super User
Posts: 10,044

Re: sas programming

Here is IML code. I believe it should be very fast. Still take Col1 as reference column.

 

 

data have;
input Col1 col2 col3 col4 col5;
cards;
1        2       1       3       1
2        3        2      4       2
3        1        3      4       3
4        5        4      2       4
;
run;
proc iml;
use have;
read all var _all_ into x[c=vnames];
close;
idx=j(1,ncol(x),0);
idx[1]=1;
do i=2 to ncol(x);
 if all(x[,1]=x[,i]) then idx[i]=i;
end;
idx=idx[loc(idx^=0)];
y=x[,idx];
col=vnames[,idx];
create want from y[c=col];
append from y;
close;
quit;
Super User
Posts: 5,437

Re: sas programming

Respected Advisor
Posts: 4,932

Re: sas programming

Groups of identical columns can be found rather simply with a transposition of the data and a sorting operation using the nouniquekey option :

 

data have;
input col1 col2 col3 col4 col5 col6 col7;
datalines;
1     2     3     1 1  1 2
2     3     4     2 20 2 3 
3     2     1     3 30 3 2 
4     1     2     4 40 4 1
;

proc sql;
select cats("line_", count(*)) into :lastLine from have; 
quit;
 
proc transpose data=have out=havet prefix=line_; run;

proc sort data=havet out=wantt nouniquekey; by line_:; run;

data want;
length group 8;
set wantt; by line_:;
if first.&lastLine. then group + 1;
run;
PG
Super User
Posts: 19,865

Re: sas programming

You could also run a PROC CORR on the data. Any variables with a correlation of 1 would be identical.
Respected Advisor
Posts: 4,932

Re: sas programming

[ Edited ]

Great idea! Correlation would identify identical columns almost certainly. But if Col1 = 2*Col2 +1, the correlation will be 1.0000 

PG
Super User
Posts: 19,865

Re: sas programming

I agree, the transpose/sort is the way to go.
Depending on what reason the OP is looking for to find the exact same values, perhaps a measure of similarity is good too.
Super User
Posts: 11,343

Re: sas programming

Variables with missing values for some observations could have a correlation of 1.

Ask a Question
Discussion stats
  • 10 replies
  • 520 views
  • 1 like
  • 6 in conversation