DATA Step, Macro, Functions and more

Reg:Finding in all the exesting variables

Reply
Frequent Contributor
Posts: 140

Reg:Finding in all the exesting variables

i am having 4 variables f1 f2 f3 f4

data test;
input f1 f2 f3 f4;
cards;
1 2 3 4
4 5 1 2
8 1 9 0
3 4 4 1
9 12 12 14
2 16 17 18
run;
cards;

actually i want to find out the observations that are exesting in f1 f2 f3 and f4 in the above example 1 is repeating in all the four variable
at any time ,and 2 is there in f1,f2 and in f4 i want to find out the variable then are in atleast in 3 variables

output

f1  f2  f3  f4
1  1  1  1
2  2  .  2

Super User
Super User
Posts: 7,076

Reg:Finding in all the exesting variables

Posted in reply to sas_Forum

Your data structure does not match the way you are referring to the information.

You talk about the values inside individual variables as if they were the observations.

If you restructure the data into two variables, one with the value and one with the column label then the query becomes easier.

data test;

do col='f1','f2','f3','f4';

   input value @; output;

end;

cards;

1 2 3 4

4 5 1 2

8 1 9 0

3 4 4 1

9 12 12 14

2 16 17 18

run;

proc sql noprint ;

  create table want as

    select distinct value

    from test

    group by value

    having count(distinct col) > 2

  ;

quit;

proc print; run;

Frequent Contributor
Posts: 140

Reg:Finding in all the exesting variables

Posted in reply to sas_Forum

i am having 4 variables f1 f2 f3 f4

data test;
input  name add f1 f2 f3 f4;
cards;
a b 1 2 3 4
c g 4 5 1 2
y u 8 1 9 0
r p 3 4 4 1
k l 9 12 12 14
m n 2 16 17 18
run;
cards;

actually i want to find out the observations that are exesting in f1 f2 f3 and f4 in the above example 1 is repeating in all the four variable
at any time ,and 2 is there in f1,f2 and in f4 i want to find out the variable then are in atleast in 3 variables.

output

f1  f2  f3  f4
1  1  1  1
2  2  .  2

Super User
Super User
Posts: 7,076

Re: Reg:Finding in all the exesting variables

Posted in reply to sas_Forum

I still have no idea what you are actually trying to do, but try this method using PROC TRANSPOSE to convert your 4 variables into 4 observations so that we can query over them.

proc transpose data=test out=test2 ;

  by name add notsorted ;

  var f1-f4 ;

run;

proc sql noprint;

  create table test2 as

    select distinct col1 as value,_name_,col1

    from test2

    group by col1

    having count(distinct _name_) > 2

    order by 1,2

  ;

quit;

proc transpose data=test2 out=want(drop=_name_) ;

  by value;

  id _name_;

  var col1 ;

run;

proc print;

run;

output:

Obs    value    f1    f2    f3    f4

1       1       1     1     1     1

2       2       2     2     .     2

3       4       4     4     4     4

Frequent Contributor
Posts: 140

Re: Reg:Finding in all the exesting variables

But i am having 50 variables beside that f1,f2,f3 and f4 ,i cant use proc transpose and data is in 30 laks

PROC Star
Posts: 7,491

Re: Reg:Finding in all the exesting variables

Posted in reply to sas_Forum

Some questions:

What is the full range of values the variables can have? 

Your example mentioned knowing values being shared by 2, 3 and 4 of the variables.  Does that now translate to be shared by 2,3,4 ..etc .. through all 50 variables?

Comment: 3 million isn't that big of a data set.

Frequent Contributor
Posts: 140

Reg:Finding in all the exesting variables

yes

Super User
Super User
Posts: 7,076

Re: Reg:Finding in all the exesting variables

Posted in reply to sas_Forum

Does that mean that it takes 50 variables to uniquely identify an observation in your source dataset?  So that you are worried about the space/time to store 3 million * 4 * (length of 50 variables plus the _name_ and COL1 variables) bytes of data?

From your original request you do not seem to care about those variables so it should be fine not to include them in the transposed data.

You can always use a data step view to construct the transposed data to feed to the PROC SQL code so that you do not need to store that intermediate dataset at all.

data test2 / view=test2 ;

   length _name_ $32 col1 8 ;

   set test ;

  array vars f1-f4 ;

   do i=1 to dim(vars);

     _name_ = vname(vars(i));

    col1 = vars(i);

     output;

   end;
  keep _name_ col1;
run;

Or does it mean that you want to search over 50 variables instead of 4. The difference in space to store 50 variables per observation versus one variable in 50 times as many observations is not much.

Valued Guide
Posts: 2,177

Reg:Finding in all the exesting variables

Posted in reply to sas_Forum

I think the biggest problem seems to be generalising.

this seems to provide some flexibility while making only a single pass through the original data

%let your_data = test ;

%let list_values2test  = 1 2 ;

%let list_columns2test = f1 f2 f3 f4 ;

%let required          = 3 ;

%let num_values = %sysfunc( countw( &list_values2test )) ;

%let num_tests  = %sysfunc( countw( &list_columns2test )) ;

data results( keep= &list_columns2test )  ;

***********************************************************************

* array results holds flags indicating required value found in a col  *

**********************************************************************;

array results( &num_values, &num_tests ) $1 _temporary_ ;

array inputs (              &num_tests )    &list_columns2test ;

array outputs(              &num_tests )    &list_columns2test ;

***********************************************************************

* array values lists the values to be hunted among the columns

**********************************************************************;

array values ( &num_values             )    _temporary_ ( &list_values2test ) ;

do row= 1 by 1 while( not end_of_data ) ;

   set &your_data  end=    end_of_data ;

   do col = 1 to &num_tests ;

      which = whichn( inputs(col), of values(*) ) ;

      if which then results( which, col ) = 'x' ;

   end ;

end ;

do test = 1 to &num_values;

    found =0 ;

    value = values( test ) ;

    call missing( of outputs(*) );

    do col=1 to &num_tests ;

       if results( test, col ) ='x' then do ;

          found + 1 ;

          outputs( col ) = value ;

       end ;

    end ;

    if found  >= &required then output ;

end ;

stop ;

run ;

tested and working for this set of data

If performance is an issue, you could stop testing once all flags have been set for a value

Super User
Posts: 10,044

Re: Reg:Finding in all the exesting variables

Posted in reply to sas_Forum

How about:

data test;
input f1 f2 f3 f4;
cards;
1 2 3 4
4 5 1 2
8 1 9 0
3 4 4 1
9 12 12 14
2 16 17 17
;
run;
proc sql noprint;
 select quote(cats('_',name)) into : name separated by ','
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;

 select cats('_',name) into : _name separated by ' '
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;
quit;
%put _user_;
data want(keep=_f:);
 if _n_ eq 1 then do;
                    length k &_name 8;
                    declare hash ha (hashexp: 20);
                    declare hiter hi ('ha');
                     ha.definekey('k');
                     ha.definedata(&name);
                     ha.definedone();
                   end;
 set test end=last;
 array ff{*} f: ; array _ff{*} _f: ;
 do i=1 to dim(ff);
  k=ff{i]; rc=ha.find(); _ff{i}=ff{i}; ha.replace();
  call missing(of _ff{*});
 end;
 if last then do;
               do while(hi.next()=0);
                 if n(of _:) ge 3 then output;
               end;
              end;
run;



Ksharp

Message was edited by: xia keshan

Regular Contributor
Posts: 229

Reg:Finding in all the exesting variables

Thqs ksharp but how can i do the same for character varaibles

Super User
Posts: 10,044

Reg:Finding in all the exesting variables

Posted in reply to sas_Forum
data test;
input (f1 f2 f3 f4) ($);
cards;
1 2 3 4
4 5 1 2
8 1 9 0
3 4 4 1
9 12 12 14
2 16 17 17
;
run;
proc sql noprint;
 select quote(cats('_',name)) into : name separated by ','
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;

 select cats('_',name) into : _name separated by ' '
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;
quit;
%put _user_;
data want(keep=_f:);
 if _n_ eq 1 then do;
                    length k &_name $ 8;
                    declare hash ha (hashexp: 20);
                    declare hiter hi ('ha');
                     ha.definekey('k');
                     ha.definedata(&name);
                     ha.definedone();
                   end;
 set test end=last;
 array ff{*} $ f: ; array _ff{*} $ _f: ;
 do i=1 to dim(ff);
  k=ff{i]; rc=ha.find(); _ff{i}=ff{i}; ha.replace();
  call missing(of _ff{*});
 end;
 if last then do;
               do while(hi.next()=0);
                 if cmiss(of _f:) lt 2 then output;
               end;
              end;
run;


Ksharp

Frequent Contributor
Posts: 140

Reg:Finding in all the exesting variables

Thqs Ksharp for your answer it worked

Ask a Question
Discussion stats
  • 12 replies
  • 295 views
  • 0 likes
  • 6 in conversation