BookmarkSubscribeRSS Feed
sas_Forum
Calcite | Level 5

i am having 4 variables f1 f2 f3 f4

data test;
input f1 f2 f3 f4;
cards;
1 2 3 4
4 5 1 2
8 1 9 0
3 4 4 1
9 12 12 14
2 16 17 18
run;
cards;

actually i want to find out the observations that are exesting in f1 f2 f3 and f4 in the above example 1 is repeating in all the four variable
at any time ,and 2 is there in f1,f2 and in f4 i want to find out the variable then are in atleast in 3 variables

output

f1  f2  f3  f4
1  1  1  1
2  2  .  2

12 REPLIES 12
Tom
Super User Tom
Super User

Your data structure does not match the way you are referring to the information.

You talk about the values inside individual variables as if they were the observations.

If you restructure the data into two variables, one with the value and one with the column label then the query becomes easier.

data test;

do col='f1','f2','f3','f4';

   input value @; output;

end;

cards;

1 2 3 4

4 5 1 2

8 1 9 0

3 4 4 1

9 12 12 14

2 16 17 18

run;

proc sql noprint ;

  create table want as

    select distinct value

    from test

    group by value

    having count(distinct col) > 2

  ;

quit;

proc print; run;

sas_Forum
Calcite | Level 5

i am having 4 variables f1 f2 f3 f4

data test;
input  name add f1 f2 f3 f4;
cards;
a b 1 2 3 4
c g 4 5 1 2
y u 8 1 9 0
r p 3 4 4 1
k l 9 12 12 14
m n 2 16 17 18
run;
cards;

actually i want to find out the observations that are exesting in f1 f2 f3 and f4 in the above example 1 is repeating in all the four variable
at any time ,and 2 is there in f1,f2 and in f4 i want to find out the variable then are in atleast in 3 variables.

output

f1  f2  f3  f4
1  1  1  1
2  2  .  2

Tom
Super User Tom
Super User

I still have no idea what you are actually trying to do, but try this method using PROC TRANSPOSE to convert your 4 variables into 4 observations so that we can query over them.

proc transpose data=test out=test2 ;

  by name add notsorted ;

  var f1-f4 ;

run;

proc sql noprint;

  create table test2 as

    select distinct col1 as value,_name_,col1

    from test2

    group by col1

    having count(distinct _name_) > 2

    order by 1,2

  ;

quit;

proc transpose data=test2 out=want(drop=_name_) ;

  by value;

  id _name_;

  var col1 ;

run;

proc print;

run;

output:

Obs    value    f1    f2    f3    f4

1       1       1     1     1     1

2       2       2     2     .     2

3       4       4     4     4     4

sas_Forum
Calcite | Level 5

But i am having 50 variables beside that f1,f2,f3 and f4 ,i cant use proc transpose and data is in 30 laks

art297
Opal | Level 21

Some questions:

What is the full range of values the variables can have? 

Your example mentioned knowing values being shared by 2, 3 and 4 of the variables.  Does that now translate to be shared by 2,3,4 ..etc .. through all 50 variables?

Comment: 3 million isn't that big of a data set.

Tom
Super User Tom
Super User

Does that mean that it takes 50 variables to uniquely identify an observation in your source dataset?  So that you are worried about the space/time to store 3 million * 4 * (length of 50 variables plus the _name_ and COL1 variables) bytes of data?

From your original request you do not seem to care about those variables so it should be fine not to include them in the transposed data.

You can always use a data step view to construct the transposed data to feed to the PROC SQL code so that you do not need to store that intermediate dataset at all.

data test2 / view=test2 ;

   length _name_ $32 col1 8 ;

   set test ;

  array vars f1-f4 ;

   do i=1 to dim(vars);

     _name_ = vname(vars(i));

    col1 = vars(i);

     output;

   end;
  keep _name_ col1;
run;

Or does it mean that you want to search over 50 variables instead of 4. The difference in space to store 50 variables per observation versus one variable in 50 times as many observations is not much.

Peter_C
Rhodochrosite | Level 12

I think the biggest problem seems to be generalising.

this seems to provide some flexibility while making only a single pass through the original data

%let your_data = test ;

%let list_values2test  = 1 2 ;

%let list_columns2test = f1 f2 f3 f4 ;

%let required          = 3 ;

%let num_values = %sysfunc( countw( &list_values2test )) ;

%let num_tests  = %sysfunc( countw( &list_columns2test )) ;

data results( keep= &list_columns2test )  ;

***********************************************************************

* array results holds flags indicating required value found in a col  *

**********************************************************************;

array results( &num_values, &num_tests ) $1 _temporary_ ;

array inputs (              &num_tests )    &list_columns2test ;

array outputs(              &num_tests )    &list_columns2test ;

***********************************************************************

* array values lists the values to be hunted among the columns

**********************************************************************;

array values ( &num_values             )    _temporary_ ( &list_values2test ) ;

do row= 1 by 1 while( not end_of_data ) ;

   set &your_data  end=    end_of_data ;

   do col = 1 to &num_tests ;

      which = whichn( inputs(col), of values(*) ) ;

      if which then results( which, col ) = 'x' ;

   end ;

end ;

do test = 1 to &num_values;

    found =0 ;

    value = values( test ) ;

    call missing( of outputs(*) );

    do col=1 to &num_tests ;

       if results( test, col ) ='x' then do ;

          found + 1 ;

          outputs( col ) = value ;

       end ;

    end ;

    if found  >= &required then output ;

end ;

stop ;

run ;

tested and working for this set of data

If performance is an issue, you could stop testing once all flags have been set for a value

Ksharp
Super User

How about:

data test;
input f1 f2 f3 f4;
cards;
1 2 3 4
4 5 1 2
8 1 9 0
3 4 4 1
9 12 12 14
2 16 17 17
;
run;
proc sql noprint;
 select quote(cats('_',name)) into : name separated by ','
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;

 select cats('_',name) into : _name separated by ' '
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;
quit;
%put _user_;
data want(keep=_f:);
 if _n_ eq 1 then do;
                    length k &_name 8;
                    declare hash ha (hashexp: 20);
                    declare hiter hi ('ha');
                     ha.definekey('k');
                     ha.definedata(&name);
                     ha.definedone();
                   end;
 set test end=last;
 array ff{*} f: ; array _ff{*} _f: ;
 do i=1 to dim(ff);
  k=ff{i]; rc=ha.find(); _ff{i}=ff{i}; ha.replace();
  call missing(of _ff{*});
 end;
 if last then do;
               do while(hi.next()=0);
                 if n(of _:) ge 3 then output;
               end;
              end;
run;



Ksharp

Message was edited by: xia keshan

R_Win
Calcite | Level 5

Thqs ksharp but how can i do the same for character varaibles

Ksharp
Super User
data test;
input (f1 f2 f3 f4) ($);
cards;
1 2 3 4
4 5 1 2
8 1 9 0
3 4 4 1
9 12 12 14
2 16 17 17
;
run;
proc sql noprint;
 select quote(cats('_',name)) into : name separated by ','
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;

 select cats('_',name) into : _name separated by ' '
  from dictionary.columns
   where libname='WORK' and memname='TEST' and upcase(name) like 'F%'
    order by varnum;
quit;
%put _user_;
data want(keep=_f:);
 if _n_ eq 1 then do;
                    length k &_name $ 8;
                    declare hash ha (hashexp: 20);
                    declare hiter hi ('ha');
                     ha.definekey('k');
                     ha.definedata(&name);
                     ha.definedone();
                   end;
 set test end=last;
 array ff{*} $ f: ; array _ff{*} $ _f: ;
 do i=1 to dim(ff);
  k=ff{i]; rc=ha.find(); _ff{i}=ff{i}; ha.replace();
  call missing(of _ff{*});
 end;
 if last then do;
               do while(hi.next()=0);
                 if cmiss(of _f:) lt 2 then output;
               end;
              end;
run;


Ksharp

sas_Forum
Calcite | Level 5

Thqs Ksharp for your answer it worked

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 994 views
  • 0 likes
  • 6 in conversation