Good day!
It seems that my first post was somehow marked as SPAM so I apologize if you are seeing this post a second time.
I have the following problem: I want to remove variables in a data set that contain more than 90% of the following values: -1,-2,-3,-4,-5,-6,-7. Stated otherwise, If a variable (of which there are 675) is made up of more than 90% of the above listed values, it should be removed from the data set. Below is an example of the data set I am working with.
ID | Target Variable | Var1 | Var2 | ... | Var675 |
001 | 1 | -2 | -2 | ... | -2 |
002 | 1 | -2 | -2 | ... | -3 |
003 | 0 | -2 | -2 | ... | 1500 |
004 | 1 | 5 | -3 | ... | -3 |
005 | 0 | -3 | 18 | ... | -2 |
Since there is a large number of columns in the data set I figured that I will probably need a do loop to do this. Therefore, I created the following macro to remove variables for any given data set that contain more than 90% of these special values.
%macro control(dsn, method = Quantile);
%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
%let method = %upcase(&method);
/* Creatign macro's for each column in the data set */
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
call symputx('col'||strip(_n_),strip(name));
if eof then call symputx('max',strip(_n_));
run;
%do i = 1 %to &max;
I assume that this is were the code will be written to determine the percentage of special
values in a variable and then drop it from the original data set if it exceeds 90%.
%mend control;
Thank you in advance!
Thank you @sbxkoenk for the assistance.
I also managed to write the following code and it seems to work/provide the desired output.
Firstly, I created an empty table with the necessary specification before the macro as follows:
proc sql;
Create table Datasets.Special_Perc_&Split
(Variable char length = 23 format=$50.,
Sum_Special num length = 8 format = best12.,
Sum_positive num length = 8 format = best12.,
Percentage num length = 8 format = Percent6.2);
quit;
Secondly, I wrote a macro to calculate the percentage of special values for each column in any given data set. This is then added to the table that I created above for each iteration of the loop.
%macro Special(dsn);
%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
/* Creatign macro's for each column in the data set */
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
call symputx('col'||strip(_n_),strip(name));
if eof then call symputx('max',strip(_n_));
run;
%do i = 1 %to &max;
proc sql noprint;
select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
into :SN, :SP, :Perc
from &lib..&dsn;
quit;
data b;
Variable = "&&col&i";
Sum_Special = &sn;
Sum_positive = &sp;
Percentage = &Perc;
run;
proc append base = Datasets.Special_Perc_&split data = b force;
run;
proc sql;
drop table b;
quit;
%end;
%mend special;
%special(Datasets.Longevity_Numeric_&split);
Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%.
It may not be the best way to do it, but it got the job done. 😀
Hello @EduanR ,
Something like the below :
data work.have;
input ID $ Target_Variable Var1 Var2 Var3;
cards;
001 1 -2 -2 -2
002 1 -2 -2 -3
003 0 -2 -2 1500
004 1 5 -3 -3
005 0 -3 18 -2
;
run;
data work.have1(drop=i);
set work.have;
array myvars{3} Var1- Var3;
array countvar{3} c_Var1-c_Var3;
do i=1 to dim(myvars);
if myvars(i) in (-1,-2,-3,-4,-5,-6,-7) then countvar(i)=1; else countvar(i)=0;
end;
run;
ods trace off;
ods output OneWayFreqs=work.OneWayFreqs;
PROC FREQ data=work.have1 /*noprint*/;
tables c_Var1-c_Var3 / list /* missing out=work.myds */;
run;
data work.vars_to_throw_away;
set work.OneWayFreqs;
where c_Var1=1 OR c_Var2=1 OR c_Var3=1;
where also Percent >= 80;
run;
PROC SQL noprint;
select scan(Table,2,'_') into :Lijst separated by ' '
from work.vars_to_throw_away;
QUIT;
%PUT &=Lijst;
data work.want;
set work.have(drop=&Lijst.);
run;
/* end of program */
Koen
Thank you @sbxkoenk for the assistance.
I also managed to write the following code and it seems to work/provide the desired output.
Firstly, I created an empty table with the necessary specification before the macro as follows:
proc sql;
Create table Datasets.Special_Perc_&Split
(Variable char length = 23 format=$50.,
Sum_Special num length = 8 format = best12.,
Sum_positive num length = 8 format = best12.,
Percentage num length = 8 format = Percent6.2);
quit;
Secondly, I wrote a macro to calculate the percentage of special values for each column in any given data set. This is then added to the table that I created above for each iteration of the loop.
%macro Special(dsn);
%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
/* Creatign macro's for each column in the data set */
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
call symputx('col'||strip(_n_),strip(name));
if eof then call symputx('max',strip(_n_));
run;
%do i = 1 %to &max;
proc sql noprint;
select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
into :SN, :SP, :Perc
from &lib..&dsn;
quit;
data b;
Variable = "&&col&i";
Sum_Special = &sn;
Sum_positive = &sp;
Percentage = &Perc;
run;
proc append base = Datasets.Special_Perc_&split data = b force;
run;
proc sql;
drop table b;
quit;
%end;
%mend special;
%special(Datasets.Longevity_Numeric_&split);
Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%.
It may not be the best way to do it, but it got the job done. 😀
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.