Reg:Finding in all the exesting variables

sas_Forum · Posted 11-10-2011 04:14 AM

i am having 4 variables f1 f2 f3 f4

data test;
input f1 f2 f3 f4;
cards;
1 2 3 4
4 5 1 2
8 1 9 0
3 4 4 1
9 12 12 14
2 16 17 18
run;
cards;

actually i want to find out the observations that are exesting in f1 f2 f3 and f4 in the above example 1 is repeating in all the four variable
at any time ,and 2 is there in f1,f2 and in f4 i want to find out the variable then are in atleast in 3 variables

output

f1 f2 f3 f4
1 1 1 1
2 2 . 2

Tom · Posted 11-10-2011 06:33 AM

Your data structure does not match the way you are referring to the information.

You talk about the values inside individual variables as if they were the observations.

If you restructure the data into two variables, one with the value and one with the column label then the query becomes easier.

data test;

do col='f1','f2','f3','f4';

input value @; output;

end;

cards;

1 2 3 4

4 5 1 2

8 1 9 0

3 4 4 1

9 12 12 14

2 16 17 18

run;

proc sql noprint ;

create table want as

select distinct value

from test

group by value

having count(distinct col) > 2

;

quit;

proc print; run;

sas_Forum · Posted 11-10-2011 07:41 AM

i am having 4 variables f1 f2 f3 f4

data test;
input name add f1 f2 f3 f4;
cards;
a b 1 2 3 4
c g 4 5 1 2
y u 8 1 9 0
r p 3 4 4 1
k l 9 12 12 14
m n 2 16 17 18
run;
cards;

actually i want to find out the observations that are exesting in f1 f2 f3 and f4 in the above example 1 is repeating in all the four variable
at any time ,and 2 is there in f1,f2 and in f4 i want to find out the variable then are in atleast in 3 variables.

output

f1 f2 f3 f4
1 1 1 1
2 2 . 2

Tom · Posted 11-10-2011 07:59 AM

I still have no idea what you are actually trying to do, but try this method using PROC TRANSPOSE to convert your 4 variables into 4 observations so that we can query over them.

proc transpose data=test out=test2 ;
  by name add notsorted ;
  var f1-f4 ;
run;
proc sql noprint; 
  create table test2 as 
    select distinct col1 as value,_name_,col1
    from test2 
    group by col1
    having count(distinct _name_) > 2
    order by 1,2
  ;
quit;
proc transpose data=test2 out=want(drop=_name_) ;
  by value;
  id _name_;
  var col1 ;
run;
proc print;
run;

output:

Obs value f1 f2 f3 f4

1 1 1 1 1 1

2 2 2 2 . 2

3 4 4 4 4 4

sas_Forum · Posted 11-10-2011 08:08 AM

But i am having 50 variables beside that f1,f2,f3 and f4 ,i cant use proc transpose and data is in 30 laks

art297 · Posted 11-10-2011 08:19 AM

Some questions:

What is the full range of values the variables can have?

Your example mentioned knowing values being shared by 2, 3 and 4 of the variables. Does that now translate to be shared by 2,3,4 ..etc .. through all 50 variables?

Comment: 3 million isn't that big of a data set.

sas_Forum · Posted 11-10-2011 08:46 AM

yes

Tom · Posted 11-10-2011 09:51 AM

Does that mean that it takes 50 variables to uniquely identify an observation in your source dataset? So that you are worried about the space/time to store 3 million * 4 * (length of 50 variables plus the _name_ and COL1 variables) bytes of data?

From your original request you do not seem to care about those variables so it should be fine not to include them in the transposed data.

You can always use a data step view to construct the transposed data to feed to the PROC SQL code so that you do not need to store that intermediate dataset at all.

data test2 / view=test2 ;

length _name_ $32 col1 8 ;

set test ;

array vars f1-f4 ;

do i=1 to dim(vars);

_name_ = vname(vars(i));

col1 = vars(i);

output;

end; keep _name_ col1; run;

Or does it mean that you want to search over 50 variables instead of 4. The difference in space to store 50 variables per observation versus one variable in 50 times as many observations is not much.

Peter_C · Posted 11-11-2011 07:09 AM

I think the biggest problem seems to be generalising.

this seems to provide some flexibility while making only a single pass through the original data

%let your_data = test ;

%let list_values2test = 1 2 ;

%let list_columns2test = f1 f2 f3 f4 ;

%let required = 3 ;

%let num_values = %sysfunc( countw( &list_values2test )) ;

%let num_tests = %sysfunc( countw( &list_columns2test )) ;

data results( keep= &list_columns2test ) ;

***********************************************************************

* array results holds flags indicating required value found in a col *

**********************************************************************;

array results( &num_values, &num_tests ) $1 _temporary_ ;

array inputs ( &num_tests ) &list_columns2test ;

array outputs( &num_tests ) &list_columns2test ;

***********************************************************************

* array values lists the values to be hunted among the columns

**********************************************************************;

array values ( &num_values ) _temporary_ ( &list_values2test ) ;

do row= 1 by 1 while( not end_of_data ) ;

set &your_data end= end_of_data ;

do col = 1 to &num_tests ;

which = whichn( inputs(col), of values(*) ) ;

if which then results( which, col ) = 'x' ;

end ;

do test = 1 to &num_values;

found =0 ;

value = values( test ) ;

call missing( of outputs(*) );

do col=1 to &num_tests ;

if results( test, col ) ='x' then do ;

found + 1 ;

outputs( col ) = value ;

end ;

if found >= &required then output ;

end ;

stop ;

run ;

tested and working for this set of data

If performance is an issue, you could stop testing once all flags have been set for a value

Ksharp · Posted 11-11-2011 01:24 AM

How about:

data test;
input f1 f2 f3 f4;
cards;
1 2 3 4
4 5 1 2
8 1 9 0
3 4 4 1
9 12 12 14
2 16 17 17
;
run;
proc sql noprint;
 select quote(cats('_',name)) into : name separated by ','
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;

 select cats('_',name) into : _name separated by ' '
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;
quit;
%put _user_;
data want(keep=_f:);
 if _n_ eq 1 then do;
                    length k &_name 8;
                    declare hash ha (hashexp: 20);
                    declare hiter hi ('ha');
                     ha.definekey('k');
                     ha.definedata(&name);
                     ha.definedone();
                   end;
 set test end=last;
 array ff{*} f: ; array _ff{*} _f: ;
 do i=1 to dim(ff);
  k=ff{i]; rc=ha.find(); _ff{i}=ff{i}; ha.replace();
  call missing(of _ff{*});
 end;
 if last then do;
               do while(hi.next()=0);
                 if n(of _:) ge 3 then output;
               end;
              end;
run;

Ksharp

Message was edited by: xia keshan

R_Win · Posted 11-11-2011 07:32 AM

Thqs ksharp but how can i do the same for character varaibles

Ksharp · Posted 11-13-2011 10:40 PM

data test;
input (f1 f2 f3 f4) ($);
cards;
1 2 3 4
4 5 1 2
8 1 9 0
3 4 4 1
9 12 12 14
2 16 17 17
;
run;
proc sql noprint;
 select quote(cats('_',name)) into : name separated by ','
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;

 select cats('_',name) into : _name separated by ' '
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;
quit;
%put _user_;
data want(keep=_f:);
 if _n_ eq 1 then do;
                    length k &_name $ 8;
                    declare hash ha (hashexp: 20);
                    declare hiter hi ('ha');
                     ha.definekey('k');
                     ha.definedata(&name);
                     ha.definedone();
                   end;
 set test end=last;
 array ff{*} $ f: ; array _ff{*} $ _f: ;
 do i=1 to dim(ff);
  k=ff{i]; rc=ha.find(); _ff{i}=ff{i}; ha.replace();
  call missing(of _ff{*});
 end;
 if last then do;
               do while(hi.next()=0);
                 if cmiss(of _f:) lt 2 then output;
               end;
              end;
run;

Ksharp

sas_Forum · Posted 11-14-2011 01:47 AM

Thqs Ksharp for your answer it worked