Hey all!
I am searching for missing data in very big data sets (200 mio observations) and many variables (around 200).
I have to make a missing report for each variable telling the amount of missing observations for each year.
I can do it manual, - making a PDF for each variable - by making proc freq with format (with the code below) for each variable, but I am searching for a program that can make a missing report for each variable by year, because doing the programming for each variable below is very tedious, working with many data sets.
----------------
code for the manual program :
data xxxxx1;
set xxxxx.xxxxx;
format _numeric_ nm.
_character_ $ch.;
format year;
run;
ods pdf file=".....";
proc freq data= xxxxx1;
tables year*VARIABLE / nocol nopercent missing;
run;
osd pdf close;
---------------
I hope somebody have good ideas and help! Thanks. / Best regards Silas, Denmark, Aarhus University
For the numerics this is easy.
Create your YEAR variable as string and on each record as in the following code it would be both a class and analysis variable and proc tabulate doesn't like that.
proc tabulate data= xxxxxx1;
class year;
var _numeric_;
table year,
_numeric_,
nmiss;
run;
This will create an output table for each year with the numeric variables as the rows and a count of how many time each variable is missing.
An option for the character would be to assign the name or label of the character variable as the label to newly created numeric variables that hold the length of the character variables but set to missing if the length is 0 and use the above approach.
Thanks for the fast reply.
It was also helpful with a way to get a assembled table, but what I need is one PDF for each variable, and, if possible, a code which can both handle numeric and character-variables.??
Why not apply the proc freq to all vars at once, as in:
proc format;
value nm low-high='Valid' other='Miss';
value $cm ' ' = 'Miss' other='Valid';
run;
proc sql noprint;
select name into :vars separated by ' '
from dictionary.columns
where libname='MYLIB' and memname='HAVE' and upcase(name)^='YEAR';
quit;
proc freq data=mylib.have;
tables year * (&vars) / nocol nopercent missing;
format _numeric_ nm. _character_ $cm. year 4.;
run;
It prints one page per variable, all in one pdf file. Or were you trying to print it all in a single table?
The best would be one PDF pr variable because there are many variables and the researchers often only need some missing statistics on specific variables.
One PDF with one page per variable would also be pretty good though!
I have no experience with proc sql, but I have tried your code but there appears to be some problems (showed in the log):
- after 'proc freq data=mylib.have;' the log sais: 'ERROR: file mylib.have.data does not exist?
- after 'tables year*(&vars) / nocol nopercent missing; the log sais, referring to (&vars) 'ERROR no data set open to look up variables. WARNING: Apparent symbolic reference VARS not resolved'
But thanks very much for the fast answer and hope you can help me a bit more!
MYLIB and HAVE were meant to be replaced by you with the libname and data set name you want to process. (In your initial post you had a libname of xxxxx and dataset name of xxxxx).
Now, if you want one pdf file per variable, I suspect you will also have to have one proc freq per variable.
So first, create a macro to do just one variable:
%macro one_pdf(varname);
proc format;
.... same as before ....;
run;
ods pdf file="...../&varname..pdf"
proc freq data=xxxxx.xxxxx;
tables year * &varname / nocol nopercent missing ;
run;
ods pdf close;
%mend;
Then make a sequence of sas statements, with one line per variable, to call the above macro, like:
%one_pdf(var1);
%one_pdf(var2);
I'd suggest using sashelp.vcolumn (analogous to dictionary.columns, but accessible in a DATA step). It has one record for each variable in each data set. So in your case you could get the variables of interest in xxxxx.xxxxx and use the "call execute" statement to tell SAS to take the text argument and stack it in the SAS script right after the current data step finishes, as in:
data _null_;
set sashelp.vcolumn;
where libname='XXXXX' and memname='XXXXX' and upcase(name) ^= 'YEAR';
execute_text = cats('%one_pdf',name,');');
call execute(trim(execute_text));
run;
This should result in one macro call per variable, executed in sequence after the above data step.
Thank you very much mkeintz.
I have now have succes in making the missing reports in one PDF which was also very helpfull to know that possibility.
The macro you have made to me does also function, but I do not fully understand the last part of your instruction. When I need to run the macro do I need to write
%one_pdf(var1);
for all the variables one by one or is the last a way to get rid of this? It was very great to find the data set sashelp.vcolumn btw.
When I copy the last part of the program
data _null_ ....
...
run;
, I get this error:
ERROR: File DB.VAREK.DATA does not exist.
ERROR: FIle DB.VARES.DATA does not exist.
??? and again, thx!! :smileygrin:
What does your sample data look like ? and your report ?
data have; do year=2001 to 2006; do month=1 to 12; stock='P'; amount=12; if ranuni(-1) lt 0.2 then call missing(stock); else if ranuni(-1) lt 0.4 then call missing(amount); else if ranuni(-1) lt 0.5 then call missing(month); output; end; end; run; proc sql noprint; select cats('nmiss(',name,') as nmiss_',name) into : list separated by ',' from dictionary.columns where libname='WORK' and memname='HAVE' and name ne 'year'; create table want as select year,&list from have group by year ; quit;
Ksharp
Hey Ksharp. Thank you for your attention.
My sample is a dataset with both categoric and numeric variables. For some variables most is missing and for other there is no missing or a bit.
My report just is a proc freq report.
I have tried to see your code to understand it.
Creating your sample was without problems, it is similar to mine data set. But when I do the last part there appears a warning and error as this :
WARNING: Apparent symbolic reference LIST not resolved.
ERROR 22-322: Syntax error .....
It appears by the '&list'
??
Hi,
The code below created macro variable &list:
proc sql noprint;
select cats('nmiss(',name,') as nmiss_',name) into : list separated by ','
from dictionary.columns
where libname='WORK' and memname='HAVE' and name ne 'year';
Please make sure your libname and memname are using upcase. it may also help to change "name ne 'year' " to "upcase(name) ne 'YEAR' "
There is no error for me . Did you try the recommendation from Linlin?
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.