Solved: Re: Macro Code to Identify Missing values and drop the variables with ...

dsp1 · Posted 01-24-2018 01:17 AM

Hi All,

I am looking for a macro code which needs to identify missing values percentage for each variable and drop the variables with missing values greater than 80% automatically.

If anyone have macro code for this, plz share with us.

Thanks in advance.

Regards

Ramesh

Ksharp · Posted 01-29-2018 07:34 AM

OK. No problem.

data have;
 set sashelp.class;
 if _n_=1 then call missing(sex,age,name);
 if _n_ in (2:16) then call missing(name,age);
run;


proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select count(*) into : total from have ;
select catx(' ','nmiss(',_name_,")/&total as ",_name_) into : vnames separated by ','
 from temp;
 
create table nmissing as
 select &vnames from have;
quit;

proc transpose data=nmissing out=x;
run;
proc sql noprint;
select _name_ into : drop separated by ',' from x where col1>0.8;

alter table have
 drop &drop;
quit;

View solution in original post

PeterClemmensen · Posted 01-24-2018 03:09 AM

I haven't seen a macro, but see this thread

https://communities.sas.com/t5/Base-SAS-Programming/Is-there-any-macro-to-check-percentage-of-missin...

From there, it shouldn't be too hard to build a macro

The DATA to DATA Step Macro
Blog: SASnrd

RW9 · Posted 01-24-2018 04:26 AM

Why would you need a macro? You can simply do a proc freq on the data to get counts and missing counts, then a datastep to work out which ones you want to drop, then call execute a datastep to drop the given variables. Its three steps.

Ksharp · Posted 01-24-2018 07:33 AM


proc format;
 value fmt
  ._-.z='missing'
  other='nonmiss';
 value $fmt
  ' '='missing'
  other='nonmiss';
run;
proc freq data=have;
table _all_/missing;
format _numeric_ fmt. _character_ $fmt.;
run;

dsp1 · Posted 01-29-2018 05:29 AM

If we have less number of variables, it's easy to use code mentioned above. In my dataset, I have around 300 variables and to check each and every variable, will take more time.

My requirement is to have one macro code which will read all the missing values and drop the variables with greater than 80% missing values.

Plz share with me if you have any code like this.

Thanks in advance.

Ksharp · Posted 01-24-2018 07:36 AM

Or using IML if you have.

Check IML's function   countmiss() .

Ksharp · Posted 01-29-2018 07:08 AM

IF you don't have big table ,try IML code.

Otherwise ,I would write some SQL code.

data have;
 set sashelp.class;
 call missing(sex,age);
run;


proc iml;
use have nobs nobs;
read all var _num_ into x1[c=vname1];
read all var _char_ into x2[c=vname2];
close;

idx1=loc(countmiss(x1,'col')/nobs >0.8);
idx2=loc(countmiss(x2,'col')/nobs >0.8);
drop=vname1[idx1]||vname2[idx2]; 
submit drop;
 data want;
  set have;
  drop &drop;
 run;
endsubmit;
quit;

dsp1 · Posted 01-29-2018 07:21 AM

I have a large dataset. So, if possible plz write some sql code to fulfill my requirement. Thanks in advance.

Ksharp · Posted 01-29-2018 07:34 AM

OK. No problem.

data have;
 set sashelp.class;
 if _n_=1 then call missing(sex,age,name);
 if _n_ in (2:16) then call missing(name,age);
run;


proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select count(*) into : total from have ;
select catx(' ','nmiss(',_name_,")/&total as ",_name_) into : vnames separated by ','
 from temp;
 
create table nmissing as
 select &vnames from have;
quit;

proc transpose data=nmissing out=x;
run;
proc sql noprint;
select _name_ into : drop separated by ',' from x where col1>0.8;

alter table have
 drop &drop;
quit;

dsp1 · Posted 02-01-2018 01:02 AM

Thanks for the quick response Ksharp.

I have some other requirement like as follows. We need to have single or multiple SAS macro codes which will need to perform below tasks.

1. It should check all missing values in the dataset and then drop the variables with high missing values and at the same time impute those numeric and character missing values based on some conditions like in model development we used to follow something like that.

2. It should check for any outliers in the dataset and it has to treat those outliers.

3. It should create dummy variables like we have "Dummies" package in R which will create dummy variables and drop original variables after creating dummy variables.

If you have any macros to fulfill the above requirements, please share with me.

Thanks in advance.

Ksharp · Posted 02-01-2018 07:17 AM

You are asking too many question.I suggest you to start a new session to talk about these.

1. It should check all missing values in the dataset and then drop the variables with high missing values and at the same time impute those numeric and character missing values based on some conditions like in model development we used to follow something like that.

PROC MI or PROC PLS can impute missing value.

2. It should check for any outliers in the dataset and it has to treat those outliers.

PROC PLS or PROC TRANREG

3. It should create dummy variables like we have "Dummies" package in R which will create dummy variables and drop original variables after creating dummy variables.

I remember @Rick_SAS has written a blog about it .

dsp1 · Posted 02-02-2018 12:08 AM

I will post each question separately....

Macro Code to Identify Missing values and drop the variables with greater than 80% missing values.

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

Re: Macro Code to Identify Missing values and drop the variables with greater than 80% missing value

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away