Hi All,
I am looking for a macro code which needs to identify missing values percentage for each variable and drop the variables with missing values greater than 80% automatically.
If anyone have macro code for this, plz share with us.
Thanks in advance.
Regards
Ramesh
OK. No problem.
data have;
set sashelp.class;
if _n_=1 then call missing(sex,age,name);
if _n_ in (2:16) then call missing(name,age);
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select count(*) into : total from have ;
select catx(' ','nmiss(',_name_,")/&total as ",_name_) into : vnames separated by ','
from temp;
create table nmissing as
select &vnames from have;
quit;
proc transpose data=nmissing out=x;
run;
proc sql noprint;
select _name_ into : drop separated by ',' from x where col1>0.8;
alter table have
drop &drop;
quit;
I haven't seen a macro, but see this thread
From there, it shouldn't be too hard to build a macro
Why would you need a macro? You can simply do a proc freq on the data to get counts and missing counts, then a datastep to work out which ones you want to drop, then call execute a datastep to drop the given variables. Its three steps.
proc format;
value fmt
._-.z='missing'
other='nonmiss';
value $fmt
' '='missing'
other='nonmiss';
run;
proc freq data=have;
table _all_/missing;
format _numeric_ fmt. _character_ $fmt.;
run;
If we have less number of variables, it's easy to use code mentioned above. In my dataset, I have around 300 variables and to check each and every variable, will take more time.
My requirement is to have one macro code which will read all the missing values and drop the variables with greater than 80% missing values.
Plz share with me if you have any code like this.
Thanks in advance.
Or using IML if you have. Check IML's function countmiss() .
IF you don't have big table ,try IML code.
Otherwise ,I would write some SQL code.
data have;
set sashelp.class;
call missing(sex,age);
run;
proc iml;
use have nobs nobs;
read all var _num_ into x1[c=vname1];
read all var _char_ into x2[c=vname2];
close;
idx1=loc(countmiss(x1,'col')/nobs >0.8);
idx2=loc(countmiss(x2,'col')/nobs >0.8);
drop=vname1[idx1]||vname2[idx2];
submit drop;
data want;
set have;
drop &drop;
run;
endsubmit;
quit;
I have a large dataset. So, if possible plz write some sql code to fulfill my requirement. Thanks in advance.
OK. No problem.
data have;
set sashelp.class;
if _n_=1 then call missing(sex,age,name);
if _n_ in (2:16) then call missing(name,age);
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select count(*) into : total from have ;
select catx(' ','nmiss(',_name_,")/&total as ",_name_) into : vnames separated by ','
from temp;
create table nmissing as
select &vnames from have;
quit;
proc transpose data=nmissing out=x;
run;
proc sql noprint;
select _name_ into : drop separated by ',' from x where col1>0.8;
alter table have
drop &drop;
quit;
Thanks for the quick response Ksharp.
I have some other requirement like as follows. We need to have single or multiple SAS macro codes which will need to perform below tasks.
1. It should check all missing values in the dataset and then drop the variables with high missing values and at the same time impute those numeric and character missing values based on some conditions like in model development we used to follow something like that.
2. It should check for any outliers in the dataset and it has to treat those outliers.
3. It should create dummy variables like we have "Dummies" package in R which will create dummy variables and drop original variables after creating dummy variables.
If you have any macros to fulfill the above requirements, please share with me.
Thanks in advance.
You are asking too many question.I suggest you to start a new session to talk about these.
1. It should check all missing values in the dataset and then drop the variables with high missing values and at the same time impute those numeric and character missing values based on some conditions like in model development we used to follow something like that.
PROC MI or PROC PLS can impute missing value.
2. It should check for any outliers in the dataset and it has to treat those outliers.
PROC PLS or PROC TRANREG
3. It should create dummy variables like we have "Dummies" package in R which will create dummy variables and drop original variables after creating dummy variables.
I remember @Rick_SAS has written a blog about it .
I will post each question separately....
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.