BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
victor1893
Calcite | Level 5

Hi,

 

I have a table from which I want to store in a macro variable all columns that contain a specific value. For example:

 

data have;
input abc def ghi jkl 8.;
datalines;
1 4 2 3
6 99 0 8
3 99 6 5
2 4 0 99
;
run;

 

Here I would like to obtain all columns which contain the value 99: def, jkl.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@victor1893 wrote:

The solution that I'm looking needs to be run on tables with thousands of records and hundreds of columns. Proc transpose is not very computationally efficient in this case. 


Then you will have to use more complicated code.  Something like this:

(NOTE: You could easily generate the code from a list of variable.  You could generate the list of variable using PROC CONTENT if needed.)

proc sql;
create table summary as 
select max( abc=99 ) as abc
     , max( def=99 ) as def
     , max( ghi=99 ) as ghi
     , max( jkl=99 ) as jkl
from have
;
quit;

proc transpose data=summary out=want(where=(col1=1)) ;
  var _all_;
run;

You could possibly use a data step and an array (or a couple of arrays)  that would have the advantage that you could stop as soon as you found that 99 appeared in every column.  If you have more than 5000 numeric variables then increase the size of the temporary array.

data want;
  set have end=eof;
  array vars _numeric_;
  array flags[5000] _temporary_ (5000*0);
  length _name_ $32 ;
  do _n_=1 to dim(vars);
    if vars[_n_]=99 then flags[_n_]=1;
  end;
  if eof or sum(of flags[*])=dim(vars) then do;
    do _n_=1 to dim(vars);
      if flags[_n_] then do;
        _name_ = vname(vars[_n_]);
        output;
      end;
    end;
    stop;
  end;
  keep _name_;
run;

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

Is 99 a value chosen to be larger than any other "real" value that might appear? Why use 99 instead of missing? Or could there be the number 100 in a column?

 

If 99 is the largest value in the columns, you can use PROC SUMMARY

 

proc summary data=have;
    var _numeric_;
    output out=_stats_ max=;
run;

proc transpose data=_stats_ out=_stats2_;
run;

proc sql noprint;
select _name_ into :which_cols separated by ' ' from _stats2_ where col1=99;
quit;

%put &=which_cols;
--
Paige Miller
victor1893
Calcite | Level 5

99 is an arbitrary value given as an example. I am looking for a solution that can work for any given value. The proc summary solution doesn't work unfortunately but I appreciate your effort!

Tom
Super User Tom
Super User

Why not first transpose the data?

First let's fix your data step.  It does not make any sense to read exactly 8 bytes to get the value for the last variable on the line, just let SAS read it the same as you did the other three.

data have;
  input abc def ghi jkl ;
datalines;
1 4 2 3
6 99 0 8
3 99 6 5
2 4 0 99
;

If you transpose the data then you can use something like WHICHN() (or use WHICHC() if you are searching for character values).

proc transpose data=have out=trans ;
  var _all_;
run;

data want;
  set trans;
  if whichn(99,of col:);
  keep _name_;
run;

You could then pull that list of variable name into a macro variable.

proc sql noprint;
select nliteral(_name_) into :varlist separated by ' '
from want;
quit;
victor1893
Calcite | Level 5

The solution that I'm looking needs to be run on tables with thousands of records and hundreds of columns. Proc transpose is not very computationally efficient in this case. 

Tom
Super User Tom
Super User

@victor1893 wrote:

The solution that I'm looking needs to be run on tables with thousands of records and hundreds of columns. Proc transpose is not very computationally efficient in this case. 


Then you will have to use more complicated code.  Something like this:

(NOTE: You could easily generate the code from a list of variable.  You could generate the list of variable using PROC CONTENT if needed.)

proc sql;
create table summary as 
select max( abc=99 ) as abc
     , max( def=99 ) as def
     , max( ghi=99 ) as ghi
     , max( jkl=99 ) as jkl
from have
;
quit;

proc transpose data=summary out=want(where=(col1=1)) ;
  var _all_;
run;

You could possibly use a data step and an array (or a couple of arrays)  that would have the advantage that you could stop as soon as you found that 99 appeared in every column.  If you have more than 5000 numeric variables then increase the size of the temporary array.

data want;
  set have end=eof;
  array vars _numeric_;
  array flags[5000] _temporary_ (5000*0);
  length _name_ $32 ;
  do _n_=1 to dim(vars);
    if vars[_n_]=99 then flags[_n_]=1;
  end;
  if eof or sum(of flags[*])=dim(vars) then do;
    do _n_=1 to dim(vars);
      if flags[_n_] then do;
        _name_ = vname(vars[_n_]);
        output;
      end;
    end;
    stop;
  end;
  keep _name_;
run;
victor1893
Calcite | Level 5

The data step solution (although ugly) worked as expected.  Thank you very much for the effort!

Tom
Super User Tom
Super User

How to generate the PROC SQL method from the variable names.

proc transpose data=have(obs=0) out=names; var _all_; run;
filename code temp;
data _null_;
  file code;
  set names end=eof;
  length nliteral $64 ;
  nliteral=nliteral(_name_);
  if _n_=1 then put
  'proc sql;'
 /'create table summary as'
 /'select ' @;
  else put
  '     , ' @;
  put 'max( ' nliteral '=99 ) as ' nliteral ;
  if eof then put 'from have;'/'quit;';
run;
%include code ;
proc transpose data=summary out=want(where=(col1=1)) ;
  var _all_;
run;

Note that if the actual dataset lives in a remote database (Oracle, Snowflake, Teradata, etc) this has the advantage (perhaps requiring changing the simple boolean test into case when var=99 then 1 else 0 end) of being able to be pushed into a remote data to execute thus massively decreasing the amount of data that needs to move from the remote database into the SAS server.

victor1893
Calcite | Level 5

Appreciate the tip, but I believe this is a bit too complex for the average sas user in my organization, so I don't want to go deeper than this. Thanks!

Ksharp
Super User

The most convenient and succinct way is using PROC IML.

 

data have;
input abc def ghi jkl 8.;
datalines;
1 4 2 3
6 99 0 8
3 99 6 5
2 4 0 99
;
run;

proc iml;
use have;
read all var _all_ into x[c=vname];
close;
idx=loc((x=99)[<>,]);
want=vname[idx];
print want;
quit;

Ksharp_0-1714712900107.png

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 566 views
  • 2 likes
  • 4 in conversation