Hi everyone, I am trying to find an elegant way to collect some metadata for all columns in a data set. What I want to know is, for each and every variable in a data set, how many rows are populated? I would like the result of this query to be output into a single data set which contains one row per variable in the original data, and a column giving the number of non-missing rows for that variable. Here's some example data and output: data demo_data;
infile datalines dsd truncover;
input Make:$13. Model:$40. Type:$8. Origin:$6. DriveTrain:$5. MSRP:DOLLAR8. Invoice:DOLLAR8. EngineSize:32. Cylinders:32. Horsepower:32. MPG_City:32. MPG_Highway:32. Weight:32. Wheelbase:32. Length:32.;
datalines4;
Acura,,SUV,Asia,All,"$36,945","$33,337",3.5,6,,17,23,4451,106,189
Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820","$21,761",2,4,,24,31,2778,101,172
Acura,TSX 4dr,Sedan,Asia,Front,"$26,990","$24,647",2.4,4,,22,29,3230,105,183
Acura,TL 4dr,Sedan,Asia,Front,"$33,195","$30,299",,6,270,20,28,3575,108,186
Acura,3.5 RL 4dr,Sedan,Asia,Front,"$43,755","$39,014",,6,225,18,24,3880,115,197
;;;; Desired output: name num_not_missing Cylinders 5 DriveTrain 5 EngineSize 3 Horsepower 2 Invoice 5 Length 5 MPG_City 5 MPG_Highway 5 MSRP 5 Make 5 Model 4 Origin 5 Type 5 Weight 5 Wheelbase 5 ... where I have inserted missing values in the demo dataset and the resulting metadata shows that some columns contain missing values. I do have a solution to this problem already, but I would like to know whether there is a more elegant or SAS-like way to get the same results (if only to satisfy my own curiosity!). My current (inelegant) solution is as follows: proc contents data=demo_data out=cont_demo_data(keep=name varnum type length) noprint; run;
proc sql noprint;
select catx(" ", "count(", name, ") as ", name)
into :list separated by ', '
from cont_demo_data;
quit;
proc sql;
create table counts_demo_data as
select
&list.
from demo_data;
quit;
proc transpose
data = counts_demo_data
out = counts_demo_data_t(rename=(_NAME_=name COL1=num_not_missing));
run; So as you can see, it's somewhat inelegant but it does at least get to the correct answer. Other approaches I've tried that don't give me the result I need: Using proc means as this will only work for numeric variables proc freq with _NUMERIC_ or _CHAR_ in the tables statement - this gives a separate table for each variable, not all in the same table Thanks! Adam p.s. I'm using base SAS 9.2
... View more