Hi,
I'm trying to create a missing table in which one of variables have multiple subcategories. I used this code but it only gives me the overall missing count and I want to see how much is missing in each section including the count and percent . An example of the table I am trying to create is below. Thanks!
proc freq data=work.sleep;
tables educ*sleep /missing;
run;
| sleep hrs | ||
| 5 hrs | 6 hrs | |
| high school | count (%missing) | count (%missing) |
| college | count(%missing) | count (%missing) |
| graduate | count (%missing) | count (%missing) |
You can fully control the contents of a cell if you create a format.
For example:
data CARS;
set SASHELP.CARS;
if mod(_N_,5) = 0 then LENGTH=.;
keep ORIGIN TYPE LENGTH;
run;
proc sql;
create table CNT as
select a.ORIGIN
, b.TYPE as CARTYPE
, sum(LENGTH is missing) as M
, count(*) as N
, 'miss' as FMTNAME
, 'N' as TYPE
, monotonic() as OBS
, case when calculated M = calculated N
then 'No records'
else catx(' ', calculated M, 'missing out of' , calculated N, '=', put(calculated M /calculated N, percent.) )
end as LABEL
from (select unique ORIGIN from CARS) a
full outer join
(select unique TYPE from CARS) b
on 1
left join
CARS c
on a.ORIGIN = c.ORIGIN
and b.TYPE = c.TYPE
group by 1, 2;
quit;
proc format cntlin=CNT(rename=(OBS=START));
run;
proc report data=CNT;
columns CARTYPE ORIGIN, OBS ;
define CARTYPE / group ;
define ORIGIN / across;
define OBS / sum format=miss. ' ';
run;
Note:
case when calculated M = calculated N
should be
case when missing(c.ORIGIN) | missing(c.TYPE)
but this ruins the group by operation for some reason.
If someone knows why...
If someone knows why...
I answer my own question:
Testing c.ORIGIN and c.TYPE assesses observation-level values while testing calculated M and calculated N assesses summarised values.
Adding a presence flag to see whether a record is prsent computes the counts properly:
data HAVE;
retain FLG 1;
set SASHELP.CARS;
if mod(_N_,5) = 0 then LENGTH=.;
if ORIGIN='Asia' and TYPE='SUV' then LENGTH=.;
run;
proc sql;
create table COUNTS as
select a.ORIGIN
, b.TYPE as CARTYPE
, sum(LENGTH is missing) as M
, count(*) as N
, 'miss' as FMTNAME
, 'N' as TYPE
, monotonic() as OBS
, case when sum(FLG) < 1
then 'No records'
else catx(' ', calculated M, 'missing out of' , calculated N, '=', put(calculated M /calculated N, percent.) )
end as LABEL
from (select unique ORIGIN from HAVE) a
full outer join
(select unique TYPE from HAVE) b
on 1
left join
HAVE c
on a.ORIGIN = c.ORIGIN
and b.TYPE = c.TYPE
group by 1, 2;
quit;
proc format cntlin=COUNTS(rename=(OBS=START));
run;
proc report data=COUNTS;
columns CARTYPE ORIGIN, OBS;
define CARTYPE / group ;
define ORIGIN / across;
define OBS / sum format=miss. ' ';
run;
@coolbeans : If you absolutely have to have text in each cell like "count (%missing)", and if you absolutely have to have redundant titles over the columns, then your programming will be more complicated and hard, as shown by @ChrisNZ .
If instead you can produce a table that looks something like this (which contains the same information and in my opinion is easier to read)
then this is a lot easier to produce via PROC REPORT or PROC TABULATE without the complicated SQL. Please provide the information requested by @Ksharp so that we can provide example code.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.