02-08-2017 02:21 PM

Hi everyone,

I have a very large data set with many variables. I want to find the percentage of missing values for each variable, laid out by column. The first thing that came to mind was PROC TABULATE. I tried the following code, but it did not give my desired output. I prepared sample code and an image of the desired output, as below. Can somebody help me, please?

In addition, I have hundreds of variables in my data set. Is it possible to write macro code to get the missing percentage for all variables more easily?

```
Data Have;
Length Variable1 8 Variable2 8 Variable3 8 Variable4 8 Variable5 8 Variable6 8 Variable7 8 Variable8 8;
Infile Datalines Missover;
Input Variable1 Variable2 Variable3 Variable4 Variable5 Variable6 Variable7 Variable8;
Datalines;
1 2 3 4 5 6 7 8
. 2 3 4 5 6 7 8
. . 3 4 5 6 7 8
. . . 4 5 6 7 8
. . . . 5 6 7 8
. . . . . 6 7 8
. . . . . . 7 8
. . . . . . . 8
. . . . . . . .
;
Run;
PROC TABULATE DATA=HAVE;
VAR Variable1 Variable2 Variable3 Variable4 Variable5 Variable6 Variable7 Variable8;
TABLE /* Column Dimension */
Variable1*(N NMiss)
Variable2*(N NMiss)
Variable3*(N NMiss)
Variable4*(N NMiss)
Variable5*(N NMiss)
Variable6*(N NMiss)
Variable7*(N NMiss)
Variable8*(N NMiss)
;
RUN;
```

Thanks

Posted in reply to ertr

02-09-2017 05:50 AM

Any idea about this subject?

Posted in reply to ertr

02-09-2017 12:23 PM

PROC TABULATE will not use the result of one statistic to calculate another. You would need either a couple of passes through the data (PROC SUMMARY to get the N and NMISS, then a data step for the percent) or possibly PROC REPORT, which lets you use the results of statistics in calculations on the column results.
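A rough sketch of the two-pass idea, in case it helps. It uses PROC MEANS (the same engine as PROC SUMMARY) with the STACKODSOUTPUT option, which I am assuming is available (SAS 9.3 or later), so that the ODS Summary table comes out with one row per variable. The data set names `stats` and `want` and the variable `pct_missing` are made up for illustration. `_numeric_` picks up every numeric variable, so no macro should be needed for hundreds of variables; character variables would need a different approach.

```sas
/* Pass 1: N and NMISS for every numeric variable,           */
/* captured as one row per variable via ODS OUTPUT.          */
/* STACKODSOUTPUT is assumed available (SAS 9.3 or later).   */
proc means data=have n nmiss stackodsoutput;
  var _numeric_;
  ods output summary=stats;   /* hypothetical data set name */
run;

/* Pass 2: derive the missing percentage from the counts. */
data want;                    /* hypothetical data set name */
  set stats;
  /* N + NMISS is the total number of observations */
  pct_missing = nmiss / (n + nmiss);
  format pct_missing percent8.1;
run;

proc print data=want noobs;
  var variable n nmiss pct_missing;
run;
```

With the sample data above, Variable1 has 1 non-missing value out of 9 rows, so its missing share would be 8/9, and Variable8 would be 1/9.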

Posted in reply to ballardw

02-09-2017 12:47 PM

Or, you could do this in PROC REPORT. The code would be a bit verbose, but very do-able. It really depends on what you already know and whether you have the time to learn PROC REPORT if you don't already know it.
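To give an idea of what the PROC REPORT version might look like, here is a minimal sketch for the first two variables only; the same pattern would be repeated for the rest, which is where the verbosity comes from. The alias names (`v1_n`, `v1_nmiss`, `v1_pct`, and so on) are made up for illustration, and this is an untested outline rather than a finished program.

```sas
proc report data=have;
  /* Each variable is used twice under different aliases,   */
  /* once for N and once for NMISS, followed by a computed  */
  /* column. The quoted text is a spanning header.          */
  column ('Variable1' Variable1=v1_n Variable1=v1_nmiss v1_pct)
         ('Variable2' Variable2=v2_n Variable2=v2_nmiss v2_pct);

  define v1_n     / analysis n     'N';
  define v1_nmiss / analysis nmiss 'NMiss';
  define v1_pct   / computed format=percent8.1 'Pct Missing';

  define v2_n     / analysis n     'N';
  define v2_nmiss / analysis nmiss 'NMiss';
  define v2_pct   / computed format=percent8.1 'Pct Missing';

  /* A compute block can refer to columns to its left by alias. */
  compute v1_pct;
    v1_pct = v1_nmiss / (v1_n + v1_nmiss);
  endcomp;

  compute v2_pct;
    v2_pct = v2_nmiss / (v2_n + v2_nmiss);
  endcomp;
run;
```

Because there are no group or order variables, the report should collapse to a single summary row with the three columns side by side for each variable.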

cynthia
