Desktop productivity for business analysts and programmers

Macro Code to Identify Missing values and drop the variables with greater than 80% missing values.

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

Macro Code to Identify Missing values and drop the variables with greater than 80% missing values.

Hi All,

 

I am looking for a macro code which needs to identify missing values percentage for each variable and drop the variables with missing values greater than 80% automatically.

 

If anyone have macro code for this, plz share with us.

 

Thanks in advance.

 

Regards

 

Ramesh

 

 


Accepted Solutions
Solution
‎01-30-2018 01:05 AM
Super User
Posts: 10,626

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

OK. No problem.

 

data have;
 set sashelp.class;
 if _n_=1 then call missing(sex,age,name);
 if _n_ in (2:16) then call missing(name,age);
run;


proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select count(*) into : total from have ;
select catx(' ','nmiss(',_name_,")/&total as ",_name_) into : vnames separated by ','
 from temp;
 
create table nmissing as
 select &vnames from have;
quit;

proc transpose data=nmissing out=x;
run;
proc sql noprint;
select _name_ into : drop separated by ',' from x where col1>0.8;

alter table have
 drop &drop;
quit;

View solution in original post


All Replies
PROC Star
Posts: 1,193

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

I haven't seen a macro, but see this thread

 

https://communities.sas.com/t5/Base-SAS-Programming/Is-there-any-macro-to-check-percentage-of-missin...

 

From there, it shouldn't be too hard to build a macro

Super User
Super User
Posts: 9,227

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Why would you need a macro?  You can simply do a proc freq on the data to get counts and missing counts, then a datastep to work out which ones you want to drop, then call execute a datastep to drop the given variables.  Its three steps.

Super User
Posts: 10,626

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value


proc format;
 value fmt
  ._-.z='missing'
  other='nonmiss';
 value $fmt
  ' '='missing'
  other='nonmiss';
run;
proc freq data=have;
table _all_/missing;
format _numeric_ fmt. _character_ $fmt.;
run;
Occasional Contributor
Posts: 12

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

[ Edited ]

If we have less number of variables, it's easy to use code mentioned above. In my dataset, I have around 300 variables and to check each and every variable, will take more time.

 

My requirement is to have one macro code which will read all the missing values and drop the variables with greater than 80% missing values.

 

Plz share with me if you have any code like this.

 

Thanks in advance.

Super User
Posts: 10,626

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Or using IML if you have.

Check IML's function   countmiss() .

Super User
Posts: 10,626

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

IF you don't have big table ,try IML code.

Otherwise ,I would write some SQL code.

 

data have;
 set sashelp.class;
 call missing(sex,age);
run;


proc iml;
use have nobs nobs;
read all var _num_ into x1[c=vname1];
read all var _char_ into x2[c=vname2];
close;

idx1=loc(countmiss(x1,'col')/nobs >0.8);
idx2=loc(countmiss(x2,'col')/nobs >0.8);
drop=vname1[idx1]||vname2[idx2]; 
submit drop;
 data want;
  set have;
  drop &drop;
 run;
endsubmit;
quit;

Occasional Contributor
Posts: 12

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

I have a large dataset. So, if possible plz write some sql code to fulfill my requirement. Thanks in advance.

Solution
‎01-30-2018 01:05 AM
Super User
Posts: 10,626

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

OK. No problem.

 

data have;
 set sashelp.class;
 if _n_=1 then call missing(sex,age,name);
 if _n_ in (2:16) then call missing(name,age);
run;


proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select count(*) into : total from have ;
select catx(' ','nmiss(',_name_,")/&total as ",_name_) into : vnames separated by ','
 from temp;
 
create table nmissing as
 select &vnames from have;
quit;

proc transpose data=nmissing out=x;
run;
proc sql noprint;
select _name_ into : drop separated by ',' from x where col1>0.8;

alter table have
 drop &drop;
quit;
Occasional Contributor
Posts: 12

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Thanks for the quick response Ksharp.

 

I have some other requirement like as follows. We need to have single or multiple SAS macro codes which will need to perform below tasks.

1. It should check all missing values in the dataset and then drop the variables with high missing values and at the same time impute those numeric and character missing values based on some conditions like in model development we used to follow something like that.

2. It should check for any outliers in the dataset and it has to treat those outliers.

3. It should create dummy variables like we have "Dummies" package in R which will create dummy variables and drop original variables after creating dummy variables.

 

If you have any macros to fulfill the above requirements, please share with me.

 

Thanks in advance.

Super User
Posts: 10,626

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

You are asking too many question.I suggest you to start a new session to talk about these.

 

1. It should check all missing values in the dataset and then drop the variables with high missing values and at the same time impute those numeric and character missing values based on some conditions like in model development we used to follow something like that.

 

PROC MI  or  PROC PLS can impute missing value.

 

2. It should check for any outliers in the dataset and it has to treat those outliers.

 

PROC PLS or PROC TRANREG

 

3. It should create dummy variables like we have "Dummies" package in R which will create dummy variables and drop original variables after creating dummy variables.

 

I remember @Rick_SAS has written a blog about it . 

Occasional Contributor
Posts: 12

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

I will post each question separately....

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 313 views
  • 2 likes
  • 4 in conversation