- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello, I have a dataset test which has 7000 variables. I have a task to find all the variables that have .B and .C in it. For example
data test;
input K1 K2 K3 K4 K5;
cards;
1 0 .B 2 1
0 2 2 .C 0
2 1 1 1 0
;
run;
The output should give me K3,K4 vars. I have tried creating an array and do loop as below
data tmp1;
set test;
array formats[*] _all_;
do i = 1 to dim(formats);
if formats[i] in (.B,.C) ;
end;
run;
The above code is giving me the observations that have .B and .C . Can anyone help me in directing the right way to get ONLY the list of variables that have .B and .C. Thanks.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here are two different ways. The first uses a data step and loops through the array, Every time it encounters a missing value it outputs that variable name to a data set and then you can remove duplicates after.
data singleStep;
set test;
array K(*) k1-k5;
do i=1 to dim(k);
if k(i) in (.B, .C) then
do;
variableName=vname(k(i));
output;
end;
end;
keep variableName;
run;
proc sort data=singleStep nodupkey;
by variableName;
run;
This is more of a dynamic approach, that uses a format to map the values of interest to one group and everything else to another group. Then you summarize and see which variables have the values of interest. The method above is probably simpler but this works as well and puts the list of variables into a comma separated macro variable at the end.
proc format;
value find_fmt
.B, .C='Found' other='Not of Interest';
run;
*summarize into missing/not missing/
ods select none;
ods output onewayfreqs=summary;
proc freq data=test;
table _all_ / missing;
format _numeric_ find_fmt.;
run;
ods select all;
*Format output - probably more than is needed by code that I already have;
data want;
length variable $32. variable_value $50.;
set summary;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable' variable_value='Variable Value';
run;
proc sql;
select distinct variable Into :var_list separated by ", " from want where
variable_value="Found";
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is this what you want?
data want;
set test;
array vars [*] k:;
do i = 1 to dim(vars);
if vars[i] in (".B", ".C") then flag = 1;
end;
drop i;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I want the variables that have the particular value. The above code gives me the observations. Is there a way to find the list of variables ? Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want;
set test;
array vars [*] k:;
do i = 1 to dim(vars);
if vars[i] in (".B", ".C") then var_of_interest = vname(vars[i]);
end;
drop i;
run;
Obs K1 K2 K3 K4 K5 var_of_interest 1 1 0 B 2 1 K3 2 0 2 2 C 0 K4 3 2 1 1 1 0
Is that what you mean?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here are two different ways. The first uses a data step and loops through the array, Every time it encounters a missing value it outputs that variable name to a data set and then you can remove duplicates after.
data singleStep;
set test;
array K(*) k1-k5;
do i=1 to dim(k);
if k(i) in (.B, .C) then
do;
variableName=vname(k(i));
output;
end;
end;
keep variableName;
run;
proc sort data=singleStep nodupkey;
by variableName;
run;
This is more of a dynamic approach, that uses a format to map the values of interest to one group and everything else to another group. Then you summarize and see which variables have the values of interest. The method above is probably simpler but this works as well and puts the list of variables into a comma separated macro variable at the end.
proc format;
value find_fmt
.B, .C='Found' other='Not of Interest';
run;
*summarize into missing/not missing/
ods select none;
ods output onewayfreqs=summary;
proc freq data=test;
table _all_ / missing;
format _numeric_ find_fmt.;
run;
ods select all;
*Format output - probably more than is needed by code that I already have;
data want;
length variable $32. variable_value $50.;
set summary;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable' variable_value='Variable Value';
run;
proc sql;
select distinct variable Into :var_list separated by ", " from want where
variable_value="Found";
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for the explanation. It worked.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There is nothing in @Reeza code that is limiting what numeric variables it considers. The only reference to K1-K6 was in the data step she used to make some sample data
PS If you provide your own example data as a data step in your questions then people trying to help you won't need to make their own.