Good day!
Hope all is well with you. This is my first time posting a question on the SAS blog so I apologize if a listed it under the wrong location. I thought of listing it to enterprise guide since I am working in enterprise guide 😀.
I have the following problem: I want to remove variables in a data set that have more than 90% of the following values:-1 , -2, -3, -4, -5 , -6 , -7. Stated otherwise, if a variable is made up of more than 90% of the above listed values, the variable should be removed from the data set. Below is an example of the data set I am working with.
| ID | Target Variable | Var1 | Var2 | ... | Var675 |
| 001 | 1 | -2 | -2 | .. | -3 |
| 002 | 1 | -2 | -2 | ... | -2 |
| 003 | 0 | 24 | 1000 | ... | 100 |
| 004 | 0 | -2 | -4 | ... | -4 |
| 005 | 1 | -3 | -5 | ... | -3 |
I have not yet written any code. But I got the feeling that I will probably have to use a loop (to loop through all the columns). So therefore I created the following macro definition that will perform this process for all the numeric columns in any given data set. All the columns from Var1 - Var675 are numeric.
%macro control(dsn, method = Quantile);
%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
%let method = %upcase(&method);
/* Creatign macro's for each column in the data set */
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
call symputx('col'||strip(_n_),strip(name));
if eof then call symputx('max',strip(_n_));
run;
%do i = 1 %to &max;
I presume here is where the code will go for my problem.
%end;
%mend controlThanks in advance!
Thank you @Norman21 fro the help.
I managed to write the following code and it seems to work/provide the desired output.
Firstly, I created an empty table with the necessary specification before the macro as follows:
proc sql;
Create table Datasets.Special_Perc_&Split
(Variable char length = 23 format=$50.,
Sum_Special num length = 8 format = best12.,
Sum_positive num length = 8 format = best12.,
Percentage num length = 8 format = Percent6.2);
quit; Secondly, I wrote a macro to calculate the percentage of special values for each column in any given data set. This is then added to the table that I created above for each iteration of the loop.
%macro Special(dsn);
%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
/* Creatign macro's for each column in the data set */
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
call symputx('col'||strip(_n_),strip(name));
if eof then call symputx('max',strip(_n_));
run;
%do i = 1 %to &max;
proc sql noprint;
select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
into :SN, :SP, :Perc
from &lib..&dsn;
quit;
data b;
Variable = "&&col&i";
Sum_Special = &sn;
Sum_positive = &sp;
Percentage = &Perc;
run;
proc append base = Datasets.Special_Perc_&split data = b force;
run;
proc sql;
drop table b;
quit;
%end;
%mend special;
%special(Datasets.Longevity_Numeric_&split);Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%.
It may not be the best way to do it, but it got the job done.
Welcome to the Community!
You might find the following helpful:
Thank you @Norman21 fro the help.
I managed to write the following code and it seems to work/provide the desired output.
Firstly, I created an empty table with the necessary specification before the macro as follows:
proc sql;
Create table Datasets.Special_Perc_&Split
(Variable char length = 23 format=$50.,
Sum_Special num length = 8 format = best12.,
Sum_positive num length = 8 format = best12.,
Percentage num length = 8 format = Percent6.2);
quit; Secondly, I wrote a macro to calculate the percentage of special values for each column in any given data set. This is then added to the table that I created above for each iteration of the loop.
%macro Special(dsn);
%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
/* Creatign macro's for each column in the data set */
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
call symputx('col'||strip(_n_),strip(name));
if eof then call symputx('max',strip(_n_));
run;
%do i = 1 %to &max;
proc sql noprint;
select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
into :SN, :SP, :Perc
from &lib..&dsn;
quit;
data b;
Variable = "&&col&i";
Sum_Special = &sn;
Sum_positive = &sp;
Percentage = &Perc;
run;
proc append base = Datasets.Special_Perc_&split data = b force;
run;
proc sql;
drop table b;
quit;
%end;
%mend special;
%special(Datasets.Longevity_Numeric_&split);Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%.
It may not be the best way to do it, but it got the job done.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.