Hi all,
I understand how to delete variable using "drop" function. However, I have a large data set with many variables, it is time consuming for me to drop them one-by-one. Imagine that by merely copy and paste the variable names also take an hour.
I used proc means with an option of NMISS to view all the variables with their number of missing values. Some variables have very large number of missing values (even 100% missing). I would wish to delete these variables.
I wish to gather opinions here on how to write a program that can drop variables with conditions that if NMISS of variables is more than a certain number, for example, 69710.
Thank you.
Regards,
mspak
Hi Haikuo,
The same answer generated even I replaced "eq" with ">=" As I have many variables with exact NMISS=69711, I don't think something wrong with "eq".
mspak.
I did it by copy and paste the NMISS report into excel, then I filter them according my criteria. Lastly, I copy the list with the variables that to be deleted by using drop function. It works. But if there is an easy way via SAS program, will be great!!!
Thanks.
Regards,
mspak
Hi mspak,
Below is Ksharp's code to drop variables with all missing values. You can modify it to drop variables with certain number of missing values.
Note: the libname and dataset name have to be CAPITAL LETTERS in " where libname='WORK' and memname='HAVE';"
replace "&nobs" with "your- number" in " if _x{i} eq &nobs then _list=catx(' ',_list,vname(_x{i}));".
you need to increase the length of " _list" in "length _list $ 4000;".
data have;
infile cards missover;
input Name $ DOB :$10. (Add1 Add2 Addr3) (:$);
cards;
John 10/10/80 data data
Alan 11/11/81 data
Paul 10/10/79 data
;
run;
proc sql noprint;
select catx(' ','nmiss(',name,') as',name) into : list separated by ','
from dictionary.columns
where libname='WORK' and memname='HAVE';
select count(*) into : nobs from have;
create table temp as
select &list from have;
quit;
data _null_;
set temp;
length _list $ 4000;
array _x{*} _numeric_;
do i=1 to dim(_x);
if _x{i} eq &nobs then _list=catx(' ',_list,vname(_x{i}));
end;
call symputx('drop',_list);
run;
data want;
set have(drop=&drop);
run;
Linlin
Thanks Linlin,
I believe this is the way, but it seems more complicated than I obtained the list from excel. If there is a macro function to work out this steps, would be great!!!
mspak
Hi mspak,
Have you tried the code? I don't understand why you think it is complicated to use the code. what you need to do is just replacing the libname and dataset with your own, then all the unwanted variables are dropped.
there are other ways from the link http://communities.sas.com/message/119802#119802
Good luck!
Hi again,
I couldn't get the answer. I replaced the "WORK" with my libname "GEOG" and the file name "HAVE" with "USA". The criteria if NMISS=69711, then variables to be deleted was set. There must some errors but tehre is no error message in log.
proc sql noprint;
select catx(' ','nmiss(',name,') as',name) into : list separated by ','
from dictionary.columns
where libname='GEOG' and memname='USA';
select count(*) into : nobs from GEOG.USA;
create table temp as
select &list from GEOG.USA;
quit;
data _null_;
set geog.usa;
length _list $ 4000;
array _x{*} _numeric_;
do i=1 to dim(_x);
if _x{i} eq 69711 then _list=catx(' ',_list,vname(_x{i}));
end;
call symputx('drop',_list);
run;
data want;
set GEOG.USA(drop=&drop);
run;
PROC MEANS DATA=WANT NMISS N; run;
I checked the answer again using proc means, I found there are not deleted.
Thank you.
mspak
If you criteria is 69711, then you should not use 'eq', which only took out those variablers with exact 69711 missing obs. Instead, you should use 'ge' or '>=',
if _x{i} >= 69711 then _list=catx(' ',_list,vname(_x{i}));
Also, take out the following statements, since you are not using the total number of obs as criteria, that will save you some time:
select count(*) into : nobs from GEOG.USA;
Haikuo
Hi Haikuo,
The same answer generated even I replaced "eq" with ">=" As I have many variables with exact NMISS=69711, I don't think something wrong with "eq".
mspak.
My bad, I have checked your code again, there was an error:
data _null_;
set geog.usa;
length _list $ 4000;
array _x{*} _numeric_;
it should be:
data _null_;
set temp;
length _list $ 4000;
array _x{*} _numeric_;
That is the whole purpose of temp.
Thanks HaiKuo & Lilin,
So sorry that I replaced the file name wrongly. I got the answer now .
mspak
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.