BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mspak
Quartz | Level 8

Hi all,

I understand how to delete variable using "drop" function. However, I have a large data set with many variables, it is time consuming for me to drop them one-by-one. Imagine that by merely copy and paste the variable names also take an hour.

I used proc means with an option of NMISS to view all the variables with their number of missing values. Some variables have very large number of missing values (even 100% missing). I would wish to delete these variables.

I wish to gather opinions here on how to write a program that can drop variables with conditions that if NMISS of variables is more than a certain number, for example, 69710.

Thank you.

Regards,

mspak

1 ACCEPTED SOLUTION

Accepted Solutions
mspak
Quartz | Level 8

Hi Haikuo,

The same answer generated even I replaced "eq" with ">=" As I have many variables with exact NMISS=69711, I don't think something wrong with "eq".

mspak.

View solution in original post

9 REPLIES 9
mspak
Quartz | Level 8

I did it by copy and paste the NMISS report into excel, then I filter them according my criteria.  Lastly, I copy the list with the variables that to be deleted by using drop function. It works. But if there is an easy way via SAS program, will be great!!!

Thanks.

Regards,

mspak 

Linlin
Lapis Lazuli | Level 10

Hi mspak,

Below is Ksharp's code to drop variables with all missing values. You can modify it to drop variables with certain number of missing values.

Note:  the libname and dataset name have to be  CAPITAL LETTERS in " where libname='WORK' and memname='HAVE';"

replace "&nobs" with  "your- number" in "  if _x{i} eq &nobs then _list=catx(' ',_list,vname(_x{i}));".

you need to increase the length of " _list" in "length _list $ 4000;".

data have;

infile cards missover;

input   Name  $   DOB  :$10.    (Add1  Add2 Addr3) (:$);

cards;

     John     10/10/80          data           data

     Alan      11/11/81          data

     Paul      10/10/79          data

;

run;

proc sql noprint;

select catx(' ','nmiss(',name,') as',name) into : list separated by ','

  from dictionary.columns

   where libname='WORK' and memname='HAVE';

select count(*) into : nobs from have;

create table temp as

  select &list from have;

quit;

data _null_;

set temp;

length _list $ 4000;

array _x{*} _numeric_;

do i=1 to dim(_x);

  if _x{i} eq &nobs then _list=catx(' ',_list,vname(_x{i}));

end;

call symputx('drop',_list);

run;

data want;

set have(drop=&drop);

run;

Linlin

mspak
Quartz | Level 8

Thanks Linlin,

I believe this is the way, but it seems more complicated than I obtained the list from excel. If there is a macro function to work out this steps, would be great!!!

mspak

Linlin
Lapis Lazuli | Level 10

Hi mspak,

Have you tried the code? I don't understand why you think it is complicated to use the code. what you need to do is just replacing the libname and dataset with your own, then all the unwanted variables are dropped.

there are other ways from the link http://communities.sas.com/message/119802#119802

Good luck!

mspak
Quartz | Level 8

Hi again,

I couldn't get the answer. I replaced the "WORK" with my libname "GEOG" and the file name "HAVE" with "USA". The criteria if NMISS=69711, then variables to be deleted was set. There must some errors but tehre is no error message in log.

proc sql noprint;

select catx(' ','nmiss(',name,') as',name) into : list separated by ','

  from dictionary.columns

   where libname='GEOG' and memname='USA';

select count(*) into : nobs from GEOG.USA;

create table temp as

   select &list from GEOG.USA;

quit;

data _null_;

set geog.usa;

length _list $ 4000;

array _x{*} _numeric_;

do i=1 to dim(_x);

  if _x{i} eq 69711 then _list=catx(' ',_list,vname(_x{i}));

end;

call symputx('drop',_list);

run;

data want;

set GEOG.USA(drop=&drop);

run;

PROC MEANS DATA=WANT NMISS N; run;

I checked the answer again using proc means, I found there are not deleted.

Thank you.

mspak

Haikuo
Onyx | Level 15

If you criteria is 69711, then you should not use 'eq', which only took out those variablers with exact 69711 missing obs. Instead, you should use 'ge' or '>=',

if _x{i} >= 69711 then _list=catx(' ',_list,vname(_x{i}));

Also, take out the following statements, since you are not using the total number of obs as criteria, that will save you some time:

select count(*) into : nobs from GEOG.USA;

Haikuo

mspak
Quartz | Level 8

Hi Haikuo,

The same answer generated even I replaced "eq" with ">=" As I have many variables with exact NMISS=69711, I don't think something wrong with "eq".

mspak.

Haikuo
Onyx | Level 15

My bad, I have checked your code again, there was an error:

data _null_;

set geog.usa;

length _list $ 4000;

array _x{*} _numeric_;

it should be:

data _null_;

set temp;

length _list $ 4000;

array _x{*} _numeric_;

That is the whole purpose of temp.

mspak
Quartz | Level 8

Thanks HaiKuo & Lilin,

So sorry that I replaced the file name wrongly. I got the answer now Smiley Happy.

mspak

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1376 views
  • 3 likes
  • 3 in conversation