08-14-2012 09:23 AM
I am searching for missing data in very big data sets (200 mio observations) and many variables (around 200).
I have to make a missing report for each variable telling the amount of missing observations for each year.
I can do it manual, - making a PDF for each variable - by making proc freq with format (with the code below) for each variable, but I am searching for a program that can make a missing report for each variable by year, because doing the programming for each variable below is very tedious, working with many data sets.
code for the manual program :
format _numeric_ nm.
ods pdf file=".....";
proc freq data= xxxxx1;
tables year*VARIABLE / nocol nopercent missing;
osd pdf close;
I hope somebody have good ideas and help! Thanks. / Best regards Silas, Denmark, Aarhus University
08-14-2012 11:05 AM
For the numerics this is easy.
Create your YEAR variable as string and on each record as in the following code it would be both a class and analysis variable and proc tabulate doesn't like that.
proc tabulate data= xxxxxx1;
This will create an output table for each year with the numeric variables as the rows and a count of how many time each variable is missing.
An option for the character would be to assign the name or label of the character variable as the label to newly created numeric variables that hold the length of the character variables but set to missing if the length is 0 and use the above approach.
08-14-2012 12:02 PM
Thanks for the fast reply.
It was also helpful with a way to get a assembled table, but what I need is one PDF for each variable, and, if possible, a code which can both handle numeric and character-variables.??
08-14-2012 11:33 AM
Why not apply the proc freq to all vars at once, as in:
value nm low-high='Valid' other='Miss';
value $cm ' ' = 'Miss' other='Valid';
proc sql noprint;
select name into :vars separated by ' '
where libname='MYLIB' and memname='HAVE' and upcase(name)^='YEAR';
proc freq data=mylib.have;
tables year * (&vars) / nocol nopercent missing;
format _numeric_ nm. _character_ $cm. year 4.;
It prints one page per variable, all in one pdf file. Or were you trying to print it all in a single table?
08-14-2012 11:58 AM
The best would be one PDF pr variable because there are many variables and the researchers often only need some missing statistics on specific variables.
One PDF with one page per variable would also be pretty good though!
I have no experience with proc sql, but I have tried your code but there appears to be some problems (showed in the log):
- after 'proc freq data=mylib.have;' the log sais: 'ERROR: file mylib.have.data does not exist?
- after 'tables year*(&vars) / nocol nopercent missing; the log sais, referring to (&vars) 'ERROR no data set open to look up variables. WARNING: Apparent symbolic reference VARS not resolved'
But thanks very much for the fast answer and hope you can help me a bit more!
08-14-2012 01:45 PM
MYLIB and HAVE were meant to be replaced by you with the libname and data set name you want to process. (In your initial post you had a libname of xxxxx and dataset name of xxxxx).
Now, if you want one pdf file per variable, I suspect you will also have to have one proc freq per variable.
So first, create a macro to do just one variable:
.... same as before ....;
ods pdf file="...../&varname..pdf"
proc freq data=xxxxx.xxxxx;
tables year * &varname / nocol nopercent missing ;
ods pdf close;
Then make a sequence of sas statements, with one line per variable, to call the above macro, like:
I'd suggest using sashelp.vcolumn (analogous to dictionary.columns, but accessible in a DATA step). It has one record for each variable in each data set. So in your case you could get the variables of interest in xxxxx.xxxxx and use the "call execute" statement to tell SAS to take the text argument and stack it in the SAS script right after the current data step finishes, as in:
where libname='XXXXX' and memname='XXXXX' and upcase(name) ^= 'YEAR';
execute_text = cats('%one_pdf',name,');');
This should result in one macro call per variable, executed in sequence after the above data step.
08-22-2012 08:49 AM
Thank you very much mkeintz.
I have now have succes in making the missing reports in one PDF which was also very helpfull to know that possibility.
The macro you have made to me does also function, but I do not fully understand the last part of your instruction. When I need to run the macro do I need to write
for all the variables one by one or is the last a way to get rid of this? It was very great to find the data set sashelp.vcolumn btw.
When I copy the last part of the program
data _null_ ....
, I get this error:
ERROR: File DB.VAREK.DATA does not exist.
ERROR: FIle DB.VARES.DATA does not exist.
??? and again, thx!! :smileygrin:
08-14-2012 10:44 PM
What does your sample data look like ? and your report ?
data have; do year=2001 to 2006; do month=1 to 12; stock='P'; amount=12; if ranuni(-1) lt 0.2 then call missing(stock); else if ranuni(-1) lt 0.4 then call missing(amount); else if ranuni(-1) lt 0.5 then call missing(month); output; end; end; run; proc sql noprint; select cats('nmiss(',name,') as nmiss_',name) into : list separated by ',' from dictionary.columns where libname='WORK' and memname='HAVE' and name ne 'year'; create table want as select year,&list from have group by year ; quit;
08-22-2012 09:09 AM
Hey Ksharp. Thank you for your attention.
My sample is a dataset with both categoric and numeric variables. For some variables most is missing and for other there is no missing or a bit.
My report just is a proc freq report.
I have tried to see your code to understand it.
Creating your sample was without problems, it is similar to mine data set. But when I do the last part there appears a warning and error as this :
WARNING: Apparent symbolic reference LIST not resolved.
ERROR 22-322: Syntax error .....
It appears by the '&list'
08-22-2012 09:59 AM
The code below created macro variable &list:
proc sql noprint;
select cats('nmiss(',name,') as nmiss_',name) into : list separated by ','
where libname='WORK' and memname='HAVE' and name ne 'year';
Please make sure your libname and memname are using upcase. it may also help to change "name ne 'year' " to "upcase(name) ne 'YEAR' "