Good day!
Hope all is well with you. This is my first time posting a question on the SAS blog so I apologize if a listed it under the wrong location. I thought of listing it to enterprise guide since I am working in enterprise guide 😀.
I have the following problem: I want to remove variables in a data set that have more than 90% of the following values:-1 , -2, -3, -4, -5 , -6 , -7. Stated otherwise, if a variable is made up of more than 90% of the above listed values, the variable should be removed from the data set. Below is an example of the data set I am working with.
ID | Target Variable | Var1 | Var2 | ... | Var675 |
001 | 1 | -2 | -2 | .. | -3 |
002 | 1 | -2 | -2 | ... | -2 |
003 | 0 | 24 | 1000 | ... | 100 |
004 | 0 | -2 | -4 | ... | -4 |
005 | 1 | -3 | -5 | ... | -3 |
I have not yet written any code. But I got the feeling that I will probably have to use a loop (to loop through all the columns). So therefore I created the following macro definition that will perform this process for all the numeric columns in any given data set. All the columns from Var1 - Var675 are numeric.
%macro control(dsn, method = Quantile);
%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
%let method = %upcase(&method);
/* Creatign macro's for each column in the data set */
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
call symputx('col'||strip(_n_),strip(name));
if eof then call symputx('max',strip(_n_));
run;
%do i = 1 %to &max;
I presume here is where the code will go for my problem.
%end;
%mend control
Thanks in advance!
Thank you @Norman21 fro the help.
I managed to write the following code and it seems to work/provide the desired output.
Firstly, I created an empty table with the necessary specification before the macro as follows:
proc sql;
Create table Datasets.Special_Perc_&Split
(Variable char length = 23 format=$50.,
Sum_Special num length = 8 format = best12.,
Sum_positive num length = 8 format = best12.,
Percentage num length = 8 format = Percent6.2);
quit;
Secondly, I wrote a macro to calculate the percentage of special values for each column in any given data set. This is then added to the table that I created above for each iteration of the loop.
%macro Special(dsn);
%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
/* Creatign macro's for each column in the data set */
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
call symputx('col'||strip(_n_),strip(name));
if eof then call symputx('max',strip(_n_));
run;
%do i = 1 %to &max;
proc sql noprint;
select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
into :SN, :SP, :Perc
from &lib..&dsn;
quit;
data b;
Variable = "&&col&i";
Sum_Special = &sn;
Sum_positive = &sp;
Percentage = &Perc;
run;
proc append base = Datasets.Special_Perc_&split data = b force;
run;
proc sql;
drop table b;
quit;
%end;
%mend special;
%special(Datasets.Longevity_Numeric_&split);
Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%.
It may not be the best way to do it, but it got the job done.
Welcome to the Community!
You might find the following helpful:
Thank you @Norman21 fro the help.
I managed to write the following code and it seems to work/provide the desired output.
Firstly, I created an empty table with the necessary specification before the macro as follows:
proc sql;
Create table Datasets.Special_Perc_&Split
(Variable char length = 23 format=$50.,
Sum_Special num length = 8 format = best12.,
Sum_positive num length = 8 format = best12.,
Percentage num length = 8 format = Percent6.2);
quit;
Secondly, I wrote a macro to calculate the percentage of special values for each column in any given data set. This is then added to the table that I created above for each iteration of the loop.
%macro Special(dsn);
%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
/* Creatign macro's for each column in the data set */
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
call symputx('col'||strip(_n_),strip(name));
if eof then call symputx('max',strip(_n_));
run;
%do i = 1 %to &max;
proc sql noprint;
select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
into :SN, :SP, :Perc
from &lib..&dsn;
quit;
data b;
Variable = "&&col&i";
Sum_Special = &sn;
Sum_positive = &sp;
Percentage = &Perc;
run;
proc append base = Datasets.Special_Perc_&split data = b force;
run;
proc sql;
drop table b;
quit;
%end;
%mend special;
%special(Datasets.Longevity_Numeric_&split);
Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%.
It may not be the best way to do it, but it got the job done.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: