06-28-2012 02:33 PM
I have a file with millions of records( nearly 300 fields) and I would like to know whether the filed is populating or not.
I tried using PROC FREQ, due to space constraints in my server, jobs are failing.
Please suggest me is there any other way to find out whether the field is populated or not ? Am looking for only whether the particular field is populated or not.
Kindly give me some suggestions.
06-28-2012 02:55 PM
A simple query might be executed on the server :
select cats("count(", name, ") as N_", name) into :list separated by ","
where libname="SASHELP" and memname="CLASS"; /* use uppercase names */
create table temp as
select &list from sashelp.class;
proc transpose data=temp out=counts;
proc print data=counts; run;
06-28-2012 04:24 PM
Something like this may be useful:
' ' = 'Not Populated'
other = 'Populated'
. = 'Not Populated'
other = 'Populated'
proc freq data = sashelp.class;
format age populated.
table age name;
06-28-2012 04:43 PM
hi ... another idea (learned from data_null_ postings), maybe this would work for you ...
input a $1. w x y z;
a . 9 9 .a
. . 0 . .b
c . . 6 .c
d . 8 . .d
q . 7 7 .e
proc freq data=test nlevels;
ods select nlevels;
if no nonmissing levels, all values are missing (even counts different values of missing data, ._ through .z) ...
Number of Variable Levels
Variable Levels Levels Levels
a 5 1 4
w 1 1 0
x 5 1 4
y 4 1 3
z 5 5 0
bigger test ... PROC FREQ with only NLEVELS using the following (1 million observations, 100 variables) took about 8 seconds on my not-that-fast PC
do i = 1 to 1e6;
do j = 1 to 100;
if mod(j,15) eq 0 then x(j) = .;
else x(j) = ceil(10*ranuni(999));
drop i j;
670 proc freq data=test nlevels;
671 ods select nlevels;
NOTE: There were 1000000 observations read from the data set WORK.TEST.
NOTE: PROCEDURE FREQ used (Total process time):
real time 8.13 seconds
cpu time 8.14 seconds
06-29-2012 06:38 AM
If this is about a single variable then may be a simple datastep would do:
do _i=1 to 100000000;
if MyVar=0 then MyVar=.;
data check(keep=ObsInTable ObsMissing);
set have nobs=nobs end=last;
if missing(MyVar) then ObsMissing+1;
if last then
23 data check(keep=ObsInTable ObsMissing);
24 set have nobs=nobs end=last;
25 if missing(MyVar) then ObsMissing+1;
26 if last then
NOTE: There were 100000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.CHECK has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 6.79 seconds
cpu time 6.47 seconds