Hi All,
I have some variables in my dataset that doesn't have any data at all. can someone tell me how do I ignore them during the import stage or any other data step? Thanks.
PROC IMPORT DATAFILE="C:\DATA\havexlsx"
DBMS=xlsx replace
OUT=Apple.want;
SHEET='sheet1';
RUN;
Tedious but simple to do with a _null_ data step. An example with sashelp.cars
data test;
set sashelp.cars;
call missing (Origin, Cylinders); /* Make them all missing */
run;
data _null_;
set test end=done;
array _n _numeric_ _dumn;
array _c _character_ _dumc;
array _nn{2000} _temporary_;
array _nc{2000} _temporary_;
do i = 1 to dim(_n)-1;
if not _nn{i} and not missing(_n{i}) then _nn{i} = 1;
end;
do i = 1 to dim(_c)-1;
if not _nc{i} and not missing(_c{i}) then _nc{i} = 1;
end;
length _vars $1000;
if done then do;
do i = 1 to dim(_n)-1;
if not _nn{i} then _vars = catx(" ", _vars, vname(_n{i}));
end;
do i = 1 to dim(_c)-1;
if not _nc{i} then _vars = catx(" ", _vars, vname(_c{i}));
end;
call symputx("droplist", _vars);
end;
run;
data test2;
_dummy = 0; /* In case the drop list is empty */
set test;
drop &droplist _dummy;
run;
What do you mean by "ignore"?
Missing values are ignored when calculations, such a avverage, are made.
Or do you want want to remove the column from the data set? This is dangerous as there might be some data when you import refreshed versions of the file later on, and you then end up with different table structures, and with code must deal with columns that may exist or not.
I've used this in the past:
Sample 24612: Delete variables that have only missing values
http://support.sas.com/kb/24/612.html
See "Full Code" tab for code.
Many alternatives to find on the www (but make sure these programs do exactly what you want in all circumstances!).
Koen
Or a very crude:
data Apple.want;
set Apple.want;
drop VarB VarQ VarX;
run;
If you know the names of the variables you don't want put them in place of the VarB VarQ VarX.
Tedious but simple to do with a _null_ data step. An example with sashelp.cars
data test;
set sashelp.cars;
call missing (Origin, Cylinders); /* Make them all missing */
run;
data _null_;
set test end=done;
array _n _numeric_ _dumn;
array _c _character_ _dumc;
array _nn{2000} _temporary_;
array _nc{2000} _temporary_;
do i = 1 to dim(_n)-1;
if not _nn{i} and not missing(_n{i}) then _nn{i} = 1;
end;
do i = 1 to dim(_c)-1;
if not _nc{i} and not missing(_c{i}) then _nc{i} = 1;
end;
length _vars $1000;
if done then do;
do i = 1 to dim(_n)-1;
if not _nn{i} then _vars = catx(" ", _vars, vname(_n{i}));
end;
do i = 1 to dim(_c)-1;
if not _nc{i} then _vars = catx(" ", _vars, vname(_c{i}));
end;
call symputx("droplist", _vars);
end;
run;
data test2;
_dummy = 0; /* In case the drop list is empty */
set test;
drop &droplist _dummy;
run;
Like this?
data HAVE;
do i=1 to 12;
'VARlkj-- *&^*&^,'n = ' ';
output;
end;
run;
proc sql noprint;
select NAME into :var_list separated by '\'
from DICTIONARY.COLUMNS
where LIBNAME='WORK' and MEMNAME='HAVE';
select catx(' ',
%macro loop_vars;
%local i ;
%do i=1 %to %sysfunc(countw(%superq(var_list),\));
%if &i>1 %then ,;
missing(max("%scan(%superq(var_list),&i,\)"n))
%end;
%mend; %loop_vars
)
into :missing_flag_list
from HAVE
;
create table WANT as
select *
from HAVE(drop=
%macro loop_vars;
%local i ;
%do i=1 %to %sysfunc(countw(%superq(var_list),\));
%if %scan(&missing_flag_list,&i) %then "%scan(%superq(var_list),&i,\)"n ;
%end;
%mend; %loop_vars
)
;
quit;
Why is that? It sounds to me like you don't know your own data then. What does the data transfer agreement say? Does that allow missing columns? Don't have one of those, well that is the problem. Your trying to guess what the data is without knowing the file strcuture or the contents, fix the process (i.e. do it properly in a documented manner using appropriate file formats and processes) and these problems dissappear.
@RW9, I agree with your evaluation of spreadsheets as databases. But not everyone uses SAS within a process. Some are data buyers and consumers, but others are just gleaners. Research and journalism, for example, often have no control over the structure of the data that they gather.
It is probably easier to use the NLEVELS option on PROC FREQ than using the older macros that have been around for years.
data test;
set sashelp.class ;
no_number=.;
no_character=' ';
run;
ods exclude nlevels;
ods output nlevels=nlevels;
proc freq nlevels data=test;
tables _all_ / noprint;
run;
You can then get the list of variable names that are all missing from the NLEVELS dataset.
proc sql noprint;
select tablevar into :droplist separated by ' '
from nlevels
where nnonmisslevels=0
;
quit;
Or you could do the reserve and get the list of variables that have non-missing data.
Complete @Tom's code with the last step from my program up there.
Creating a new dataset without selected variables is just the simple case of writing a data step with a DROP statement in it. The dynamic part is to have a method where the list of variables does not need to be hard coded. That is what the macro variable is for.
data want ;
set have;
drop &droplist;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.