Help using Base SAS procedures

How to delete variable with large missing values

Accepted Solution Solved
Reply
Regular Contributor
Posts: 162
Accepted Solution

How to delete variable with large missing values

Hi all,

I understand how to delete variable using "drop" function. However, I have a large data set with many variables, it is time consuming for me to drop them one-by-one. Imagine that by merely copy and paste the variable names also take an hour.

I used proc means with an option of NMISS to view all the variables with their number of missing values. Some variables have very large number of missing values (even 100% missing). I would wish to delete these variables.

I wish to gather opinions here on how to write a program that can drop variables with conditions that if NMISS of variables is more than a certain number, for example, 69710.

Thank you.

Regards,

mspak


Accepted Solutions
Solution
‎03-17-2012 10:02 AM
Regular Contributor
Posts: 162

Re: How to delete variable with large missing values

Hi Haikuo,

The same answer generated even I replaced "eq" with ">=" As I have many variables with exact NMISS=69711, I don't think something wrong with "eq".

mspak.

View solution in original post


All Replies
Regular Contributor
Posts: 162

Re: How to delete variable with large missing values

I did it by copy and paste the NMISS report into excel, then I filter them according my criteria.  Lastly, I copy the list with the variables that to be deleted by using drop function. It works. But if there is an easy way via SAS program, will be great!!!

Thanks.

Regards,

mspak 

Super Contributor
Posts: 1,636

Re: How to delete variable with large missing values

Hi mspak,

Below is Ksharp's code to drop variables with all missing values. You can modify it to drop variables with certain number of missing values.

Note:  the libname and dataset name have to be  CAPITAL LETTERS in " where libname='WORK' and memname='HAVE';"

replace "&nobs" with  "your- number" in "  if _x{i} eq &nobs then _list=catx(' ',_list,vname(_x{i}));".

you need to increase the length of " _list" in "length _list $ 4000;".

data have;

infile cards missover;

input   Name  $   DOB  :$10.    (Add1  Add2 Addr3) (:$);

cards;

     John     10/10/80          data           data

     Alan      11/11/81          data

     Paul      10/10/79          data

;

run;

proc sql noprint;

select catx(' ','nmiss(',name,') as',name) into : list separated by ','

  from dictionary.columns

   where libname='WORK' and memname='HAVE';

select count(*) into : nobs from have;

create table temp as

  select &list from have;

quit;

data _null_;

set temp;

length _list $ 4000;

array _x{*} _numeric_;

do i=1 to dim(_x);

  if _x{i} eq &nobs then _list=catx(' ',_list,vname(_x{i}));

end;

call symputx('drop',_list);

run;

data want;

set have(drop=&drop);

run;

Linlin

Regular Contributor
Posts: 162

Re: How to delete variable with large missing values

Thanks Linlin,

I believe this is the way, but it seems more complicated than I obtained the list from excel. If there is a macro function to work out this steps, would be great!!!

mspak

Super Contributor
Posts: 1,636

Re: How to delete variable with large missing values

Hi mspak,

Have you tried the code? I don't understand why you think it is complicated to use the code. what you need to do is just replacing the libname and dataset with your own, then all the unwanted variables are dropped.

there are other ways from the link http://communities.sas.com/message/119802#119802

Good luck!

Regular Contributor
Posts: 162

Re: How to delete variable with large missing values

Hi again,

I couldn't get the answer. I replaced the "WORK" with my libname "GEOG" and the file name "HAVE" with "USA". The criteria if NMISS=69711, then variables to be deleted was set. There must some errors but tehre is no error message in log.

proc sql noprint;

select catx(' ','nmiss(',name,') as',name) into : list separated by ','

  from dictionary.columns

   where libname='GEOG' and memname='USA';

select count(*) into : nobs from GEOG.USA;

create table temp as

   select &list from GEOG.USA;

quit;

data _null_;

set geog.usa;

length _list $ 4000;

array _x{*} _numeric_;

do i=1 to dim(_x);

  if _x{i} eq 69711 then _list=catx(' ',_list,vname(_x{i}));

end;

call symputx('drop',_list);

run;

data want;

set GEOG.USA(drop=&drop);

run;

PROC MEANS DATA=WANT NMISS N; run;

I checked the answer again using proc means, I found there are not deleted.

Thank you.

mspak

Respected Advisor
Posts: 3,156

Re: How to delete variable with large missing values

If you criteria is 69711, then you should not use 'eq', which only took out those variablers with exact 69711 missing obs. Instead, you should use 'ge' or '>=',

if _x{i} >= 69711 then _list=catx(' ',_list,vname(_x{i}));

Also, take out the following statements, since you are not using the total number of obs as criteria, that will save you some time:

select count(*) into : nobs from GEOG.USA;

Haikuo

Solution
‎03-17-2012 10:02 AM
Regular Contributor
Posts: 162

Re: How to delete variable with large missing values

Hi Haikuo,

The same answer generated even I replaced "eq" with ">=" As I have many variables with exact NMISS=69711, I don't think something wrong with "eq".

mspak.

Respected Advisor
Posts: 3,156

Re: How to delete variable with large missing values

My bad, I have checked your code again, there was an error:

data _null_;

set geog.usa;

length _list $ 4000;

array _x{*} _numeric_;

it should be:

data _null_;

set temp;

length _list $ 4000;

array _x{*} _numeric_;

That is the whole purpose of temp.

Regular Contributor
Posts: 162

How to delete variable with large missing values

Thanks HaiKuo & Lilin,

So sorry that I replaced the file name wrongly. I got the answer now Smiley Happy.

mspak

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 362 views
  • 3 likes
  • 3 in conversation