turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- How to delete variable with large missing values

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-17-2012 04:59 AM

Hi all,

I understand how to delete variable using "drop" function. However, I have a large data set with many variables, it is time consuming for me to drop them one-by-one. Imagine that by merely copy and paste the variable names also take an hour.

I used proc means with an option of NMISS to view all the variables with their number of missing values. Some variables have very large number of missing values (even 100% missing). I would wish to delete these variables.

I wish to gather opinions here on how to write a program that can drop variables with conditions that if NMISS of variables is more than a certain number, for example, 69710.

Thank you.

Regards,

mspak

Accepted Solutions

Solution

03-17-2012
10:02 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-17-2012 10:02 AM

Hi Haikuo,

The same answer generated even I replaced "eq" with ">=" As I have many variables with exact NMISS=69711, I don't think something wrong with "eq".

mspak.

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-17-2012 06:04 AM

I did it by copy and paste the NMISS report into excel, then I filter them according my criteria. Lastly, I copy the list with the variables that to be deleted by using drop function. It works. But if there is an easy way via SAS program, will be great!!!

Thanks.

Regards,

mspak

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-17-2012 08:39 AM

Hi mspak,

Below is Ksharp's code to drop variables with all missing values. You can modify it to drop variables with certain number of missing values.

Note: the libname and dataset name have to be CAPITAL LETTERS in " where libname='WORK' and memname='HAVE';"

replace "&nobs" with "your- number" in " if _x{i} eq &nobs then _list=catx(' ',_list,vname(_x{i}));".

you need to increase the length of " _list" in "length _list $ 4000;".

data have;

infile cards missover;

input Name $ DOB :$10. (Add1 Add2 Addr3) (:$);

cards;

John 10/10/80 data data

Alan 11/11/81 data

Paul 10/10/79 data

;

run;

proc sql noprint;

select catx(' ','nmiss(',name,') as',name) into : list separated by ','

from dictionary.columns

where libname='WORK' and memname='HAVE';

select count(*) into : nobs from have;

create table temp as

select &list from have;

quit;

data _null_;

set temp;

length _list $ 4000;

array _x{*} _numeric_;

do i=1 to dim(_x);

if _x{i} eq &nobs then _list=catx(' ',_list,vname(_x{i}));

end;

call symputx('drop',_list);

run;

data want;

set have(drop=&drop);

run;

Linlin

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-17-2012 08:58 AM

Thanks Linlin,

I believe this is the way, but it seems more complicated than I obtained the list from excel. If there is a macro function to work out this steps, would be great!!!

mspak

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-17-2012 09:17 AM

Hi mspak,

Have you tried the code? I don't understand why you think it is complicated to use the code. what you need to do is just replacing the libname and dataset with your own, then all the unwanted variables are dropped.

there are other ways from the link http://communities.sas.com/message/119802#119802

Good luck!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-17-2012 09:44 AM

Hi again,

I couldn't get the answer. I replaced the "WORK" with my libname "GEOG" and the file name "HAVE" with "USA". The criteria if NMISS=69711, then variables to be deleted was set. There must some errors but tehre is no error message in log.

proc sql noprint;

select catx(' ','nmiss(',name,') as',name) into : list separated by ','

from dictionary.columns

where libname='GEOG' and memname='USA';

select count(*) into : nobs from GEOG.USA;

create table temp as

select &list from GEOG.USA;

quit;

data _null_;

set geog.usa;

length _list $ 4000;

array _x{*} _numeric_;

do i=1 to dim(_x);

if _x{i} eq 69711 then _list=catx(' ',_list,vname(_x{i}));

end;

call symputx('drop',_list);

run;

data want;

set GEOG.USA(drop=&drop);

run;

PROC MEANS DATA=WANT NMISS N; run;

I checked the answer again using proc means, I found there are not deleted.

Thank you.

mspak

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-17-2012 09:55 AM

If you criteria is 69711, then you should not use 'eq', which only took out those variablers with exact 69711 missing obs. Instead, you should use 'ge' or '>=',

if _x{i} >= 69711 then _list=catx(' ',_list,vname(_x{i}));

Also, take out the following statements, since you are not using the total number of obs as criteria, that will save you some time:

select count(*) into : nobs from GEOG.USA;

Haikuo

Solution

03-17-2012
10:02 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-17-2012 10:02 AM

Hi Haikuo,

The same answer generated even I replaced "eq" with ">=" As I have many variables with exact NMISS=69711, I don't think something wrong with "eq".

mspak.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-17-2012 10:07 AM

My bad, I have checked your code again, there was an error:

data _null_;

set geog.usa;

length _list $ 4000;

array _x{*} _numeric_;

it should be:

data _null_;

set temp;

length _list $ 4000;

array _x{*} _numeric_;

That is the whole purpose of temp.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-17-2012 10:15 AM

Thanks HaiKuo & Lilin,

So sorry that I replaced the file name wrongly. I got the answer now .

mspak