Hi,
I have a table from which I want to store in a macro variable all columns that contain a specific value. For example:
data have;
input abc def ghi jkl 8.;
datalines;
1 4 2 3
6 99 0 8
3 99 6 5
2 4 0 99
;
run;
Here I would like to obtain all columns which contain the value 99: def, jkl.
@victor1893 wrote:
The solution that I'm looking needs to be run on tables with thousands of records and hundreds of columns. Proc transpose is not very computationally efficient in this case.
Then you will have to use more complicated code. Something like this:
(NOTE: You could easily generate the code from a list of variable. You could generate the list of variable using PROC CONTENT if needed.)
proc sql;
create table summary as
select max( abc=99 ) as abc
, max( def=99 ) as def
, max( ghi=99 ) as ghi
, max( jkl=99 ) as jkl
from have
;
quit;
proc transpose data=summary out=want(where=(col1=1)) ;
var _all_;
run;
You could possibly use a data step and an array (or a couple of arrays) that would have the advantage that you could stop as soon as you found that 99 appeared in every column. If you have more than 5000 numeric variables then increase the size of the temporary array.
data want;
set have end=eof;
array vars _numeric_;
array flags[5000] _temporary_ (5000*0);
length _name_ $32 ;
do _n_=1 to dim(vars);
if vars[_n_]=99 then flags[_n_]=1;
end;
if eof or sum(of flags[*])=dim(vars) then do;
do _n_=1 to dim(vars);
if flags[_n_] then do;
_name_ = vname(vars[_n_]);
output;
end;
end;
stop;
end;
keep _name_;
run;
Is 99 a value chosen to be larger than any other "real" value that might appear? Why use 99 instead of missing? Or could there be the number 100 in a column?
If 99 is the largest value in the columns, you can use PROC SUMMARY
proc summary data=have;
var _numeric_;
output out=_stats_ max=;
run;
proc transpose data=_stats_ out=_stats2_;
run;
proc sql noprint;
select _name_ into :which_cols separated by ' ' from _stats2_ where col1=99;
quit;
%put &=which_cols;
99 is an arbitrary value given as an example. I am looking for a solution that can work for any given value. The proc summary solution doesn't work unfortunately but I appreciate your effort!
Why not first transpose the data?
First let's fix your data step. It does not make any sense to read exactly 8 bytes to get the value for the last variable on the line, just let SAS read it the same as you did the other three.
data have;
input abc def ghi jkl ;
datalines;
1 4 2 3
6 99 0 8
3 99 6 5
2 4 0 99
;
If you transpose the data then you can use something like WHICHN() (or use WHICHC() if you are searching for character values).
proc transpose data=have out=trans ;
var _all_;
run;
data want;
set trans;
if whichn(99,of col:);
keep _name_;
run;
You could then pull that list of variable name into a macro variable.
proc sql noprint;
select nliteral(_name_) into :varlist separated by ' '
from want;
quit;
The solution that I'm looking needs to be run on tables with thousands of records and hundreds of columns. Proc transpose is not very computationally efficient in this case.
@victor1893 wrote:
The solution that I'm looking needs to be run on tables with thousands of records and hundreds of columns. Proc transpose is not very computationally efficient in this case.
Then you will have to use more complicated code. Something like this:
(NOTE: You could easily generate the code from a list of variable. You could generate the list of variable using PROC CONTENT if needed.)
proc sql;
create table summary as
select max( abc=99 ) as abc
, max( def=99 ) as def
, max( ghi=99 ) as ghi
, max( jkl=99 ) as jkl
from have
;
quit;
proc transpose data=summary out=want(where=(col1=1)) ;
var _all_;
run;
You could possibly use a data step and an array (or a couple of arrays) that would have the advantage that you could stop as soon as you found that 99 appeared in every column. If you have more than 5000 numeric variables then increase the size of the temporary array.
data want;
set have end=eof;
array vars _numeric_;
array flags[5000] _temporary_ (5000*0);
length _name_ $32 ;
do _n_=1 to dim(vars);
if vars[_n_]=99 then flags[_n_]=1;
end;
if eof or sum(of flags[*])=dim(vars) then do;
do _n_=1 to dim(vars);
if flags[_n_] then do;
_name_ = vname(vars[_n_]);
output;
end;
end;
stop;
end;
keep _name_;
run;
The data step solution (although ugly) worked as expected. Thank you very much for the effort!
How to generate the PROC SQL method from the variable names.
proc transpose data=have(obs=0) out=names; var _all_; run;
filename code temp;
data _null_;
file code;
set names end=eof;
length nliteral $64 ;
nliteral=nliteral(_name_);
if _n_=1 then put
'proc sql;'
/'create table summary as'
/'select ' @;
else put
' , ' @;
put 'max( ' nliteral '=99 ) as ' nliteral ;
if eof then put 'from have;'/'quit;';
run;
%include code ;
proc transpose data=summary out=want(where=(col1=1)) ;
var _all_;
run;
Note that if the actual dataset lives in a remote database (Oracle, Snowflake, Teradata, etc) this has the advantage (perhaps requiring changing the simple boolean test into case when var=99 then 1 else 0 end) of being able to be pushed into a remote data to execute thus massively decreasing the amount of data that needs to move from the remote database into the SAS server.
Appreciate the tip, but I believe this is a bit too complex for the average sas user in my organization, so I don't want to go deeper than this. Thanks!
The most convenient and succinct way is using PROC IML.
data have;
input abc def ghi jkl 8.;
datalines;
1 4 2 3
6 99 0 8
3 99 6 5
2 4 0 99
;
run;
proc iml;
use have;
read all var _all_ into x[c=vname];
close;
idx=loc((x=99)[<>,]);
want=vname[idx];
print want;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.