Hi,
I'm trying to create a missing table in which one of variables have multiple subcategories. I used this code but it only gives me the overall missing count and I want to see how much is missing in each section including the count and percent . An example of the table I am trying to create is below. Thanks!
proc freq data=work.sleep;
tables educ*sleep /missing;
run;
sleep hrs | ||
5 hrs | 6 hrs | |
high school | count (%missing) | count (%missing) |
college | count(%missing) | count (%missing) |
graduate | count (%missing) | count (%missing) |
You can fully control the contents of a cell if you create a format.
For example:
data CARS;
set SASHELP.CARS;
if mod(_N_,5) = 0 then LENGTH=.;
keep ORIGIN TYPE LENGTH;
run;
proc sql;
create table CNT as
select a.ORIGIN
, b.TYPE as CARTYPE
, sum(LENGTH is missing) as M
, count(*) as N
, 'miss' as FMTNAME
, 'N' as TYPE
, monotonic() as OBS
, case when calculated M = calculated N
then 'No records'
else catx(' ', calculated M, 'missing out of' , calculated N, '=', put(calculated M /calculated N, percent.) )
end as LABEL
from (select unique ORIGIN from CARS) a
full outer join
(select unique TYPE from CARS) b
on 1
left join
CARS c
on a.ORIGIN = c.ORIGIN
and b.TYPE = c.TYPE
group by 1, 2;
quit;
proc format cntlin=CNT(rename=(OBS=START));
run;
proc report data=CNT;
columns CARTYPE ORIGIN, OBS ;
define CARTYPE / group ;
define ORIGIN / across;
define OBS / sum format=miss. ' ';
run;
Note:
case when calculated M = calculated N
should be
case when missing(c.ORIGIN) | missing(c.TYPE)
but this ruins the group by
operation for some reason.
If someone knows why...
If someone knows why...
I answer my own question:
Testing c.ORIGIN
and c.TYPE
assesses observation-level values while testing calculated M
and calculated N
assesses summarised values.
Adding a presence flag to see whether a record is prsent computes the counts properly:
data HAVE;
retain FLG 1;
set SASHELP.CARS;
if mod(_N_,5) = 0 then LENGTH=.;
if ORIGIN='Asia' and TYPE='SUV' then LENGTH=.;
run;
proc sql;
create table COUNTS as
select a.ORIGIN
, b.TYPE as CARTYPE
, sum(LENGTH is missing) as M
, count(*) as N
, 'miss' as FMTNAME
, 'N' as TYPE
, monotonic() as OBS
, case when sum(FLG) < 1
then 'No records'
else catx(' ', calculated M, 'missing out of' , calculated N, '=', put(calculated M /calculated N, percent.) )
end as LABEL
from (select unique ORIGIN from HAVE) a
full outer join
(select unique TYPE from HAVE) b
on 1
left join
HAVE c
on a.ORIGIN = c.ORIGIN
and b.TYPE = c.TYPE
group by 1, 2;
quit;
proc format cntlin=COUNTS(rename=(OBS=START));
run;
proc report data=COUNTS;
columns CARTYPE ORIGIN, OBS;
define CARTYPE / group ;
define ORIGIN / across;
define OBS / sum format=miss. ' ';
run;
@coolbeans : If you absolutely have to have text in each cell like "count (%missing)", and if you absolutely have to have redundant titles over the columns, then your programming will be more complicated and hard, as shown by @ChrisNZ .
If instead you can produce a table that looks something like this (which contains the same information and in my opinion is easier to read)
then this is a lot easier to produce via PROC REPORT or PROC TABULATE without the complicated SQL. Please provide the information requested by @Ksharp so that we can provide example code.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.