02-08-2017 02:21 PM
I have a very large data set with many variables. I want to find the percentage of missing values for each variable, with the variables laid out as columns. The first thing that came to mind was PROC TABULATE. I tried the following code, but it did not give my desired output. I have prepared sample code and an image of the desired output below. Can somebody help me, please?
In addition, I have hundreds of variables in my data set. Is it possible to write macro code to get the missing percentage for all variables more easily?
Data Have;
  Length Variable1 8 Variable2 8 Variable3 8 Variable4 8
         Variable5 8 Variable6 8 Variable7 8 Variable8 8;
  Infile Datalines Missover;
  Input Variable1 Variable2 Variable3 Variable4
        Variable5 Variable6 Variable7 Variable8;
  Datalines;
1 2 3 4 5 6 7 8
. 2 3 4 5 6 7 8
. . 3 4 5 6 7 8
. . . 4 5 6 7 8
. . . . 5 6 7 8
. . . . . 6 7 8
. . . . . . 7 8
. . . . . . . 8
. . . . . . . .
;
Run;

PROC TABULATE DATA=HAVE;
  VAR Variable1 Variable2 Variable3 Variable4
      Variable5 Variable6 Variable7 Variable8;
  TABLE /* Column Dimension */
        Variable1*(N NMiss) Variable2*(N NMiss)
        Variable3*(N NMiss) Variable4*(N NMiss)
        Variable5*(N NMiss) Variable6*(N NMiss)
        Variable7*(N NMiss) Variable8*(N NMiss);
RUN;
02-09-2017 12:23 PM
Proc tabulate will not use the result of one statistic to calculate another. You have two options. One is a couple of passes through the data: proc summary to get the n and nmiss, then a data step to compute the percent. The other is possibly proc report, which allows you to use the results of statistics in calculations on the column results.
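The two-pass idea might look roughly like the sketch below. It uses PROC MEANS (which computes the same statistics as PROC SUMMARY) with the STACKODSOUTPUT option, so it assumes SAS 9.3 or later. The data set names Stats and MissingPct and the variable PctMissing are made up for illustration, and `_numeric_` only covers numeric variables, so character variables would need a separate approach.

```sas
/* Pass 1: count non-missing (N) and missing (NMISS) values for   */
/* every numeric variable. STACKODSOUTPUT gives one row per       */
/* variable; ODS EXCLUDE ALL suppresses the printed table while   */
/* ODS OUTPUT still captures it. Stats is a hypothetical name.    */
ods exclude all;
proc means data=Have n nmiss stackodsoutput;
   var _numeric_;
   ods output Summary=Stats;
run;
ods exclude none;

/* Pass 2: derive the percentage from the two counts.             */
/* N + NMiss is the total number of observations.                 */
data MissingPct;
   set Stats;
   PctMissing = 100 * NMiss / (N + NMiss);
   format PctMissing 6.2;
run;

proc print data=MissingPct noobs;
   var Variable N NMiss PctMissing;
run;
```

Because `_numeric_` picks up every numeric variable, no macro is needed to handle hundreds of columns. With the sample data above, Variable1 has 1 non-missing and 8 missing values out of 9 rows, so its missing percentage should come out at about 88.89.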
02-09-2017 12:47 PM