DATA Step, Macro, Functions and more

Need to find whether the field is populated or not in SAS File.

Reply
Contributor
Posts: 42

Need to find whether the field is populated or not in SAS File.


Dear All,

I have a file with millions of records( nearly 300 fields) and I would like to know whether the filed is populating or not.

I tried using PROC FREQ, due to space constraints in my server, jobs are failing.

Please suggest me is there any other way to find out whether the field is populated or not ? Am looking for only whether the particular field is populated or not.

Kindly give me some suggestions.

Regards,

S Ravuri.

Respected Advisor
Posts: 4,919

Re: Need to find whether the field is populated or not in SAS File.

Posted in reply to sambasiva_ravuri_tcs_com

A simple query might be executed on the server :


proc sql;

select cats("count(", name, ") as N_", name) into :list separated by ","

from dictionary.columns

where libname="SASHELP" and memname="CLASS"; /* use uppercase names */

create table temp as

select &list from sashelp.class;

quit;

proc transpose data=temp out=counts;

var _all_;

run;

proc print data=counts; run;

PG

PG
Super User
Posts: 3,250

Re: Need to find whether the field is populated or not in SAS File.

Something like this may be useful:

proc format;

  value $populated

  ' ' = 'Not Populated'

  other = 'Populated'

  ;

  value populated

  . = 'Not Populated'

  other = 'Populated'

  ;

quit;

proc freq data = sashelp.class;

  format age populated.

         name $populated.;

  table age name;

run;


Valued Guide
Posts: 765

Re: Need to find whether the field is populated or not in SAS File.

Posted in reply to sambasiva_ravuri_tcs_com

hi ... another idea (learned from data_null_ postings), maybe this would work for you  ...

data test;

input a $1. w x y z;

datalines;

a . 9 9 .a

. . 0 . .b

c . . 6 .c

d . 8 . .d

q . 7 7 .e

;

proc freq data=test nlevels;

ods select nlevels;

run;


if no nonmissing levels, all values are missing (even counts different values of missing data, ._ through .z) ...

          Number of Variable Levels

                         Missing    Nonmissing

Variable      Levels      Levels        Levels

a                  5           1             4

w                  1           1             0

x                  5           1             4

y                  4           1             3

z                  5           5             0

bigger test ... PROC FREQ with only NLEVELS using the following (1 million observations, 100 variables) took about  8 seconds on my not-that-fast PC

data test;

array x(100);

do i = 1 to 1e6;

do j = 1 to 100;

   if mod(j,15) eq 0 then x(j) = .;

   else x(j) = ceil(10*ranuni(999));

end;

output;

end;

drop i j;

run;

LOG ...

670  proc freq data=test nlevels;

671  ods select nlevels;

672  run;

NOTE: There were 1000000 observations read from the data set WORK.TEST.

NOTE: PROCEDURE FREQ used (Total process time):

      real time           8.13 seconds

      cpu time            8.14 seconds

Super User
Posts: 10,020

Re: Need to find whether the field is populated or not in SAS File.

Posted in reply to sambasiva_ravuri_tcs_com
Respected Advisor
Posts: 4,173

Re: Need to find whether the field is populated or not in SAS File.

Posted in reply to sambasiva_ravuri_tcs_com

If this is about a single variable then may be a simple datastep would do:

data have;
  do _i=1 to 100000000;
    MyVar=floor(ranuni(1)*1000);
    if MyVar=0 then MyVar=.;
    output;
  end;
run;

data check(keep=ObsInTable ObsMissing);
  set have nobs=nobs end=last;
  if missing(MyVar) then ObsMissing+1;
  if last then
  do;
    ObsInTable=nobs;
    ObsMissing=coalesce(ObsMissing,0);
    output;
  end;
run;


23         data check(keep=ObsInTable ObsMissing);
24           set have nobs=nobs end=last;
25           if missing(MyVar) then ObsMissing+1;
26           if last then
27           do;
28             ObsInTable=nobs;
29             ObsMissing=coalesce(ObsMissing,0);
30             output;
31           end;
32         run;

NOTE: There were 100000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.CHECK has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           6.79 seconds
      cpu time            6.47 seconds
     

Ask a Question
Discussion stats
  • 5 replies
  • 807 views
  • 0 likes
  • 6 in conversation