BookmarkSubscribeRSS Feed
silasskovsbo
Calcite | Level 5

Hey all!

I am searching for missing data in very big data sets (200 mio observations) and many variables (around 200).

I have to make a missing report for each variable telling the amount of missing observations for each year.

I can do it manual, - making a PDF for each variable - by making proc freq with format (with the code below) for each variable, but I am searching for a program that can make a missing report for each variable by year, because doing the programming for each variable below is very tedious, working with many data sets.

----------------

code for the manual program :

data xxxxx1;

set xxxxx.xxxxx;

format      _numeric_ nm.

               _character_ $ch.;

format     year;

run;

ods pdf file=".....";

proc freq data= xxxxx1;

tables year*VARIABLE / nocol nopercent missing;

run;

osd pdf close;

---------------

I hope somebody have good ideas and help! Thanks. / Best regards Silas, Denmark, Aarhus University

10 REPLIES 10
ballardw
Super User

For the numerics this is easy.

Create your YEAR variable as string and on each record as in the following code it would be both a class and analysis variable and proc tabulate doesn't like that.

proc tabulate data= xxxxxx1;

     class year;

     var _numeric_;

     table year,

             _numeric_,

             nmiss;

run;

This will create an output table for each year with the numeric variables as the rows and a count of how many time each variable is missing.

An option for the character would be to assign the name or label of the character variable as the label to newly created numeric variables that hold the length of the character variables but set to missing if the length is 0 and use the above approach.

silasskovsbo
Calcite | Level 5

Thanks for the fast reply. Smiley Happy

It was also helpful with a way to get a assembled table, but what I need is one PDF for each variable, and, if possible, a code which can both handle numeric and character-variables.??

mkeintz
PROC Star

Why not apply the proc freq to all vars at once, as in:

proc format;

   value nm  low-high='Valid'  other='Miss';
   value $cm   '  ' = 'Miss'   other='Valid';

run;

proc sql noprint;

  select name into :vars separated by ' '

  from dictionary.columns

  where libname='MYLIB' and memname='HAVE' and upcase(name)^='YEAR';

quit;

proc freq data=mylib.have;

  tables year * (&vars) / nocol nopercent missing;

  format _numeric_ nm.  _character_ $cm.  year 4.;

run;

It prints one page per variable, all in one pdf file.  Or were you trying to print it all in a single table?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
silasskovsbo
Calcite | Level 5

The best would be one PDF pr variable because there are many variables and the researchers often only need some missing statistics on specific variables.

One PDF with one page per variable would also be pretty good though! Smiley Happy

I have no experience with proc sql, but I have tried your code but there appears to be some problems (showed in the log):

- after 'proc freq data=mylib.have;' the log sais: 'ERROR: file mylib.have.data does not exist?

- after 'tables year*(&vars) / nocol nopercent missing; the log sais, referring to (&vars) 'ERROR no data set open to look up variables. WARNING: Apparent symbolic reference VARS not resolved'

But thanks very much for the fast answer and hope you can help me a bit more!

mkeintz
PROC Star

MYLIB and HAVE were meant to be replaced by you with the libname and data set name you want to process.  (In your initial post you had a libname of xxxxx and dataset name of xxxxx).

Now, if you want one pdf file per variable, I suspect you will also have to have one proc freq per variable.

So first, create a macro to do just one variable:

%macro one_pdf(varname);

  proc format;
    .... same as before ....;

  run;

  ods pdf file="...../&varname..pdf"

  proc freq data=xxxxx.xxxxx;

     tables year * &varname / nocol nopercent missing ;

  run;

  ods pdf close;

%mend;

Then make a sequence of sas statements, with one line per variable, to call the above macro, like:

%one_pdf(var1);

%one_pdf(var2);

I'd suggest using sashelp.vcolumn (analogous to dictionary.columns, but accessible in a DATA step).  It has one record for each variable in each data set.  So in your case you could get the variables of interest in xxxxx.xxxxx and use the "call execute" statement to tell SAS to take the text argument and stack it in the SAS script right after the current data step finishes, as in:

data _null_;

  set sashelp.vcolumn;

  where libname='XXXXX' and memname='XXXXX' and upcase(name) ^= 'YEAR';

  execute_text =   cats('%one_pdf',name,');');

  call execute(trim(execute_text));

run;

This should result in one macro call per variable, executed in sequence after the above data step.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
silasskovsbo
Calcite | Level 5

Thank you very much mkeintz.

I have now have succes in making the missing reports in one PDF which was also very helpfull to know that possibility.

The macro you have made to me does also function, but I do not fully understand the last part of your instruction. When I need to run the macro do I need to write

%one_pdf(var1);

for all the variables one by one or is the last a way to get rid of this? It was very great to find the data set sashelp.vcolumn btw.

When I copy the last part of the program

data _null_ ....

...

run;

, I get this error:

ERROR: File DB.VAREK.DATA does not exist.

ERROR: FIle DB.VARES.DATA does not exist.

??? and again, thx!! :smileygrin:

Ksharp
Super User

What does your sample data look like ? and your report ?

data have;
do year=2001 to 2006;
 do month=1 to 12;
  stock='P'; amount=12;
  if ranuni(-1) lt 0.2 then call missing(stock);
   else if ranuni(-1) lt 0.4 then call missing(amount);
    else if ranuni(-1) lt 0.5 then call missing(month);  
  output;
end;
end;
run;

proc sql noprint;
select cats('nmiss(',name,') as nmiss_',name) into : list separated by ','
 from dictionary.columns
  where libname='WORK' and memname='HAVE' and name ne 'year';

create table want as
select year,&list from have group by year ;
quit;

Ksharp

silasskovsbo
Calcite | Level 5

Hey Ksharp. Thank you for your attention.

My sample is a dataset with both categoric and numeric variables. For some variables most is missing and for other there is no missing or a bit.

My report just is a proc freq report.

I have tried to see your code to understand it.

Creating your sample was without problems, it is similar to mine data set. But when I do the last part there appears a warning and error as this :

WARNING: Apparent symbolic reference LIST not resolved.

ERROR 22-322: Syntax error .....

It appears by the '&list'

?? Smiley Happy

Linlin
Lapis Lazuli | Level 10

Hi,

The code below created macro variable &list:

proc sql noprint;

select cats('nmiss(',name,') as nmiss_',name) into : list separated by ','

from dictionary.columns

  where libname='WORK' and memname='HAVE' and name ne 'year';

Please make sure your libname and memname are using upcase. it may also help to change "name ne 'year' " to "upcase(name) ne 'YEAR' "

Ksharp
Super User

There is no error for me . Did you try the recommendation from Linlin?

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1158 views
  • 4 likes
  • 5 in conversation