Help using Base SAS procedures

Missing report by year??

Reply
Contributor
Posts: 30

Missing report by year??

Hey all!

I am searching for missing data in very big data sets (200 mio observations) and many variables (around 200).

I have to make a missing report for each variable telling the amount of missing observations for each year.

I can do it manual, - making a PDF for each variable - by making proc freq with format (with the code below) for each variable, but I am searching for a program that can make a missing report for each variable by year, because doing the programming for each variable below is very tedious, working with many data sets.

----------------

code for the manual program :

data xxxxx1;

set xxxxx.xxxxx;

format      _numeric_ nm.

               _character_ $ch.;

format     year;

run;

ods pdf file=".....";

proc freq data= xxxxx1;

tables year*VARIABLE / nocol nopercent missing;

run;

osd pdf close;

---------------

I hope somebody have good ideas and help! Thanks. / Best regards Silas, Denmark, Aarhus University

Super User
Posts: 11,343

Re: Missing report by year??

Posted in reply to silasskovsbo

For the numerics this is easy.

Create your YEAR variable as string and on each record as in the following code it would be both a class and analysis variable and proc tabulate doesn't like that.

proc tabulate data= xxxxxx1;

     class year;

     var _numeric_;

     table year,

             _numeric_,

             nmiss;

run;

This will create an output table for each year with the numeric variables as the rows and a count of how many time each variable is missing.

An option for the character would be to assign the name or label of the character variable as the label to newly created numeric variables that hold the length of the character variables but set to missing if the length is 0 and use the above approach.

Contributor
Posts: 30

Re: Missing report by year??

Thanks for the fast reply. Smiley Happy

It was also helpful with a way to get a assembled table, but what I need is one PDF for each variable, and, if possible, a code which can both handle numeric and character-variables.??

Trusted Advisor
Posts: 1,019

Re: Missing report by year??

Posted in reply to silasskovsbo

Why not apply the proc freq to all vars at once, as in:

proc format;

   value nm  low-high='Valid'  other='Miss';
   value $cm   '  ' = 'Miss'   other='Valid';

run;

proc sql noprint;

  select name into :vars separated by ' '

  from dictionary.columns

  where libname='MYLIB' and memname='HAVE' and upcase(name)^='YEAR';

quit;

proc freq data=mylib.have;

  tables year * (&vars) / nocol nopercent missing;

  format _numeric_ nm.  _character_ $cm.  year 4.;

run;

It prints one page per variable, all in one pdf file.  Or were you trying to print it all in a single table?

Contributor
Posts: 30

Re: Missing report by year??

The best would be one PDF pr variable because there are many variables and the researchers often only need some missing statistics on specific variables.

One PDF with one page per variable would also be pretty good though! Smiley Happy

I have no experience with proc sql, but I have tried your code but there appears to be some problems (showed in the log):

- after 'proc freq data=mylib.have;' the log sais: 'ERROR: file mylib.have.data does not exist?

- after 'tables year*(&vars) / nocol nopercent missing; the log sais, referring to (&vars) 'ERROR no data set open to look up variables. WARNING: Apparent symbolic reference VARS not resolved'

But thanks very much for the fast answer and hope you can help me a bit more!

Trusted Advisor
Posts: 1,019

Re: Missing report by year??

Posted in reply to silasskovsbo

MYLIB and HAVE were meant to be replaced by you with the libname and data set name you want to process.  (In your initial post you had a libname of xxxxx and dataset name of xxxxx).

Now, if you want one pdf file per variable, I suspect you will also have to have one proc freq per variable.

So first, create a macro to do just one variable:

%macro one_pdf(varname);

  proc format;
    .... same as before ....;

  run;

  ods pdf file="...../&varname..pdf"

  proc freq data=xxxxx.xxxxx;

     tables year * &varname / nocol nopercent missing ;

  run;

  ods pdf close;

%mend;

Then make a sequence of sas statements, with one line per variable, to call the above macro, like:

%one_pdf(var1);

%one_pdf(var2);

I'd suggest using sashelp.vcolumn (analogous to dictionary.columns, but accessible in a DATA step).  It has one record for each variable in each data set.  So in your case you could get the variables of interest in xxxxx.xxxxx and use the "call execute" statement to tell SAS to take the text argument and stack it in the SAS script right after the current data step finishes, as in:

data _null_;

  set sashelp.vcolumn;

  where libname='XXXXX' and memname='XXXXX' and upcase(name) ^= 'YEAR';

  execute_text =   cats('%one_pdf',name,');');

  call execute(trim(execute_text));

run;

This should result in one macro call per variable, executed in sequence after the above data step.

Contributor
Posts: 30

Re: Missing report by year??

Thank you very much mkeintz.

I have now have succes in making the missing reports in one PDF which was also very helpfull to know that possibility.

The macro you have made to me does also function, but I do not fully understand the last part of your instruction. When I need to run the macro do I need to write

%one_pdf(var1);

for all the variables one by one or is the last a way to get rid of this? It was very great to find the data set sashelp.vcolumn btw.

When I copy the last part of the program

data _null_ ....

...

run;

, I get this error:

ERROR: File DB.VAREK.DATA does not exist.

ERROR: FIle DB.VARES.DATA does not exist.

??? and again, thx!! :smileygrin:

Super User
Posts: 10,028

Re: Missing report by year??

Posted in reply to silasskovsbo

What does your sample data look like ? and your report ?

data have;
do year=2001 to 2006;
 do month=1 to 12;
  stock='P'; amount=12;
  if ranuni(-1) lt 0.2 then call missing(stock);
   else if ranuni(-1) lt 0.4 then call missing(amount);
    else if ranuni(-1) lt 0.5 then call missing(month);  
  output;
end;
end;
run;

proc sql noprint;
select cats('nmiss(',name,') as nmiss_',name) into : list separated by ','
 from dictionary.columns
  where libname='WORK' and memname='HAVE' and name ne 'year';

create table want as
select year,&list from have group by year ;
quit;

Ksharp

Contributor
Posts: 30

Re: Missing report by year??

Hey Ksharp. Thank you for your attention.

My sample is a dataset with both categoric and numeric variables. For some variables most is missing and for other there is no missing or a bit.

My report just is a proc freq report.

I have tried to see your code to understand it.

Creating your sample was without problems, it is similar to mine data set. But when I do the last part there appears a warning and error as this :

WARNING: Apparent symbolic reference LIST not resolved.

ERROR 22-322: Syntax error .....

It appears by the '&list'

?? Smiley Happy

Super Contributor
Posts: 1,636

Re: Missing report by year??

Posted in reply to silasskovsbo

Hi,

The code below created macro variable &list:

proc sql noprint;

select cats('nmiss(',name,') as nmiss_',name) into : list separated by ','

from dictionary.columns

  where libname='WORK' and memname='HAVE' and name ne 'year';

Please make sure your libname and memname are using upcase. it may also help to change "name ne 'year' " to "upcase(name) ne 'YEAR' "

Super User
Posts: 10,028

Re: Missing report by year??

Posted in reply to silasskovsbo

There is no error for me . Did you try the recommendation from Linlin?

Ask a Question
Discussion stats
  • 10 replies
  • 320 views
  • 4 likes
  • 5 in conversation