Hello, I have a dataset test which has 7000 variables. I have a task to find all the variables that have .B and .C in it. For example
data test;
input K1 K2 K3 K4 K5;
cards;
1 0 .B 2 1
0 2 2 .C 0
2 1 1 1 0
;
run;
The output should give me K3,K4 vars. I have tried creating an array and do loop as below
data tmp1;
set test;
array formats[*] _all_;
do i = 1 to dim(formats);
if formats[i] in (.B,.C) ;
end;
run;
The above code is giving me the observations that have .B and .C . Can anyone help me in directing the right way to get ONLY the list of variables that have .B and .C. Thanks.
Here are two different ways. The first uses a data step and loops through the array, Every time it encounters a missing value it outputs that variable name to a data set and then you can remove duplicates after.
data singleStep;
set test;
array K(*) k1-k5;
do i=1 to dim(k);
if k(i) in (.B, .C) then
do;
variableName=vname(k(i));
output;
end;
end;
keep variableName;
run;
proc sort data=singleStep nodupkey;
by variableName;
run;
This is more of a dynamic approach, that uses a format to map the values of interest to one group and everything else to another group. Then you summarize and see which variables have the values of interest. The method above is probably simpler but this works as well and puts the list of variables into a comma separated macro variable at the end.
proc format;
value find_fmt
.B, .C='Found' other='Not of Interest';
run;
*summarize into missing/not missing/
ods select none;
ods output onewayfreqs=summary;
proc freq data=test;
table _all_ / missing;
format _numeric_ find_fmt.;
run;
ods select all;
*Format output - probably more than is needed by code that I already have;
data want;
length variable $32. variable_value $50.;
set summary;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable' variable_value='Variable Value';
run;
proc sql;
select distinct variable Into :var_list separated by ", " from want where
variable_value="Found";
quit;
Is this what you want?
data want;
set test;
array vars [*] k:;
do i = 1 to dim(vars);
if vars[i] in (".B", ".C") then flag = 1;
end;
drop i;
run;
I want the variables that have the particular value. The above code gives me the observations. Is there a way to find the list of variables ? Thanks.
data want;
set test;
array vars [*] k:;
do i = 1 to dim(vars);
if vars[i] in (".B", ".C") then var_of_interest = vname(vars[i]);
end;
drop i;
run;
Obs K1 K2 K3 K4 K5 var_of_interest 1 1 0 B 2 1 K3 2 0 2 2 C 0 K4 3 2 1 1 1 0
Is that what you mean?
Here are two different ways. The first uses a data step and loops through the array, Every time it encounters a missing value it outputs that variable name to a data set and then you can remove duplicates after.
data singleStep;
set test;
array K(*) k1-k5;
do i=1 to dim(k);
if k(i) in (.B, .C) then
do;
variableName=vname(k(i));
output;
end;
end;
keep variableName;
run;
proc sort data=singleStep nodupkey;
by variableName;
run;
This is more of a dynamic approach, that uses a format to map the values of interest to one group and everything else to another group. Then you summarize and see which variables have the values of interest. The method above is probably simpler but this works as well and puts the list of variables into a comma separated macro variable at the end.
proc format;
value find_fmt
.B, .C='Found' other='Not of Interest';
run;
*summarize into missing/not missing/
ods select none;
ods output onewayfreqs=summary;
proc freq data=test;
table _all_ / missing;
format _numeric_ find_fmt.;
run;
ods select all;
*Format output - probably more than is needed by code that I already have;
data want;
length variable $32. variable_value $50.;
set summary;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable' variable_value='Variable Value';
run;
proc sql;
select distinct variable Into :var_list separated by ", " from want where
variable_value="Found";
quit;
Thank you for the explanation. It worked.
There is nothing in @Reeza code that is limiting what numeric variables it considers. The only reference to K1-K6 was in the data step she used to make some sample data
PS If you provide your own example data as a data step in your questions then people trying to help you won't need to make their own.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.