BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dsp1
Calcite | Level 5

Hi All,

 

I am looking for a macro code which needs to identify missing values percentage for each variable and drop the variables with missing values greater than 80% automatically.

 

If anyone have macro code for this, plz share with us.

 

Thanks in advance.

 

Regards

 

Ramesh

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OK. No problem.

 

data have;
 set sashelp.class;
 if _n_=1 then call missing(sex,age,name);
 if _n_ in (2:16) then call missing(name,age);
run;


proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select count(*) into : total from have ;
select catx(' ','nmiss(',_name_,")/&total as ",_name_) into : vnames separated by ','
 from temp;
 
create table nmissing as
 select &vnames from have;
quit;

proc transpose data=nmissing out=x;
run;
proc sql noprint;
select _name_ into : drop separated by ',' from x where col1>0.8;

alter table have
 drop &drop;
quit;

View solution in original post

11 REPLIES 11
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Why would you need a macro?  You can simply do a proc freq on the data to get counts and missing counts, then a datastep to work out which ones you want to drop, then call execute a datastep to drop the given variables.  Its three steps.

Ksharp
Super User

proc format;
 value fmt
  ._-.z='missing'
  other='nonmiss';
 value $fmt
  ' '='missing'
  other='nonmiss';
run;
proc freq data=have;
table _all_/missing;
format _numeric_ fmt. _character_ $fmt.;
run;
dsp1
Calcite | Level 5

If we have less number of variables, it's easy to use code mentioned above. In my dataset, I have around 300 variables and to check each and every variable, will take more time.

 

My requirement is to have one macro code which will read all the missing values and drop the variables with greater than 80% missing values.

 

Plz share with me if you have any code like this.

 

Thanks in advance.

Ksharp
Super User
Or using IML if you have.

Check IML's function   countmiss() .

Ksharp
Super User

IF you don't have big table ,try IML code.

Otherwise ,I would write some SQL code.

 

data have;
 set sashelp.class;
 call missing(sex,age);
run;


proc iml;
use have nobs nobs;
read all var _num_ into x1[c=vname1];
read all var _char_ into x2[c=vname2];
close;

idx1=loc(countmiss(x1,'col')/nobs >0.8);
idx2=loc(countmiss(x2,'col')/nobs >0.8);
drop=vname1[idx1]||vname2[idx2]; 
submit drop;
 data want;
  set have;
  drop &drop;
 run;
endsubmit;
quit;

dsp1
Calcite | Level 5

I have a large dataset. So, if possible plz write some sql code to fulfill my requirement. Thanks in advance.

Ksharp
Super User

OK. No problem.

 

data have;
 set sashelp.class;
 if _n_=1 then call missing(sex,age,name);
 if _n_ in (2:16) then call missing(name,age);
run;


proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select count(*) into : total from have ;
select catx(' ','nmiss(',_name_,")/&total as ",_name_) into : vnames separated by ','
 from temp;
 
create table nmissing as
 select &vnames from have;
quit;

proc transpose data=nmissing out=x;
run;
proc sql noprint;
select _name_ into : drop separated by ',' from x where col1>0.8;

alter table have
 drop &drop;
quit;
dsp1
Calcite | Level 5

Thanks for the quick response Ksharp.

 

I have some other requirement like as follows. We need to have single or multiple SAS macro codes which will need to perform below tasks.

1. It should check all missing values in the dataset and then drop the variables with high missing values and at the same time impute those numeric and character missing values based on some conditions like in model development we used to follow something like that.

2. It should check for any outliers in the dataset and it has to treat those outliers.

3. It should create dummy variables like we have "Dummies" package in R which will create dummy variables and drop original variables after creating dummy variables.

 

If you have any macros to fulfill the above requirements, please share with me.

 

Thanks in advance.

Ksharp
Super User

You are asking too many question.I suggest you to start a new session to talk about these.

 

1. It should check all missing values in the dataset and then drop the variables with high missing values and at the same time impute those numeric and character missing values based on some conditions like in model development we used to follow something like that.

 

PROC MI  or  PROC PLS can impute missing value.

 

2. It should check for any outliers in the dataset and it has to treat those outliers.

 

PROC PLS or PROC TRANREG

 

3. It should create dummy variables like we have "Dummies" package in R which will create dummy variables and drop original variables after creating dummy variables.

 

I remember @Rick_SAS has written a blog about it . 

dsp1
Calcite | Level 5

I will post each question separately....

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 3241 views
  • 2 likes
  • 4 in conversation