Hi folks,
I'd like to create a table of a descriptive statistics to show endpoints (N), min, 25th, median, 75th percentiles and the NMISS.
However:
1. Sum, Q1, NMISS, Q3 statistics were not created in the output dataset.
2. N is misleading as shown in the table below, even if i wanted to use it in the place of SUM.
I appreciate your time and help.
Thanks in advance.
proc means data=E.SEER_CANCERS_COUNTY NMISS SUM MIN Q1 MEDIAN Q3 MAX MAXDEC=1;
CLASS CANCER_TYPE SEX;
var N_COUNTY;
OUTPUT OUT=E.SUMMARY_ALL_LONG(DROP=_FREQ_ _TYPE_ WHERE=(CANCER_TYPE NE " " AND SEX NE " "));
run;
proc report data=E.SUMMARY_ALL_LONG;
column CANCER_TYPE _STAT_,SEX,N_COUNTY;
define CANCER_TYPE/ ' ' group;
define SEX/ across 'Gender';
define _STAT_/ across ' ';
DEFINE N_COUNTY/ ' ' FORMAT=9.1;
run;
Why not use PROC TABULATE directly instead?
proc tabulate data=sashelp.class;
class sex;
var weight height;
table weight height, (max mean min N STD)*sex;
run;
@Cruise wrote:
Great. Thank you so much. I got it. Proc means worked out and now I'm stumbled with the following Proc Report. Would you please take a look at the image attached and see what I'm doing wrong in the code for the 'proc report' using the changed layout of the output data from proc means?
proc means data=E.SEER_CANCERS_COUNTY NMISS SUM MIN Q1 MEDIAN Q3 MAX MAXDEC=1; CLASS CANCER_TYPE SEX; var N_COUNTY; OUTPUT OUT=E.SUMMARY_ALL_LONG(DROP=_FREQ_ _TYPE_ WHERE=(CANCER_TYPE NE " " AND SEX NE " ")) min=amin q1=aq1 median=amedian q3=aq3 nmiss=amiss sum=asum max=amax; run; proc report data=E.SUMMARY_ALL_LONG; column CANCER_TYPE SEX,(AMIN AQ1 AMEDIAN AQ3 AMISS ASUM AMAX); define CANCER_TYPE/ ' ' group; define SEX/ across 'Gender'; DEFINE ASUM/ 'N CASES' FORMAT=9.1; DEFINE AMIN/ 'MIN' FORMAT=9.1; DEFINE AQ1/ 'MIN' FORMAT=9.1; DEFINE AMEDIAN/ 'MIN' FORMAT=9.1; DEFINE AQ3/ 'MIN' FORMAT=9.1; DEFINE AMISS/ 'MIN' FORMAT=9.1; DEFINE AMAX/ 'MAX' FORMAT=9.1; run;
The options that you place on the PROC MEANS statement (such as NMISS SUM Q1) affect the report printed by PROC MEANS. They have no impact on the contents of the output data set E.SUMMARY_ALL_LONG.
To control the contents of the output data set, you need to add a description of the statistics you would like, as part of the OUTPUT statement. Since you have just one analysis variable, a simple set of additions would be sufficient, such as:
OUTPUT OUT=E.SUMMARY_ALL_LONG(DROP=_FREQ_ _TYPE_ WHERE=(CANCER_TYPE NE " " AND SEX NE " "))
q1=q1 median=median q3=q3;
Whatever statistics you want in the output data set, you must name and assign a variable name to, as part of the OUTPUT statement.
Great. Thank you so much. I got it. Proc means worked out and now I'm stumbled with the following Proc Report. Would you please take a look at the image attached and see what I'm doing wrong in the code for the 'proc report' using the changed layout of the output data from proc means?
proc means data=E.SEER_CANCERS_COUNTY NMISS SUM MIN Q1 MEDIAN Q3 MAX MAXDEC=1;
CLASS CANCER_TYPE SEX;
var N_COUNTY;
OUTPUT OUT=E.SUMMARY_ALL_LONG(DROP=_FREQ_ _TYPE_ WHERE=(CANCER_TYPE NE " " AND SEX NE " "))
min=amin q1=aq1 median=amedian q3=aq3 nmiss=amiss sum=asum max=amax;
run;
proc report data=E.SUMMARY_ALL_LONG;
column CANCER_TYPE SEX,(AMIN AQ1 AMEDIAN AQ3 AMISS ASUM AMAX);
define CANCER_TYPE/ ' ' group;
define SEX/ across 'Gender';
DEFINE ASUM/ 'N CASES' FORMAT=9.1;
DEFINE AMIN/ 'MIN' FORMAT=9.1;
DEFINE AQ1/ 'MIN' FORMAT=9.1;
DEFINE AMEDIAN/ 'MIN' FORMAT=9.1;
DEFINE AQ3/ 'MIN' FORMAT=9.1;
DEFINE AMISS/ 'MIN' FORMAT=9.1;
DEFINE AMAX/ 'MAX' FORMAT=9.1;
run;
Why not use PROC TABULATE directly instead?
proc tabulate data=sashelp.class;
class sex;
var weight height;
table weight height, (max mean min N STD)*sex;
run;
@Cruise wrote:
Great. Thank you so much. I got it. Proc means worked out and now I'm stumbled with the following Proc Report. Would you please take a look at the image attached and see what I'm doing wrong in the code for the 'proc report' using the changed layout of the output data from proc means?
proc means data=E.SEER_CANCERS_COUNTY NMISS SUM MIN Q1 MEDIAN Q3 MAX MAXDEC=1; CLASS CANCER_TYPE SEX; var N_COUNTY; OUTPUT OUT=E.SUMMARY_ALL_LONG(DROP=_FREQ_ _TYPE_ WHERE=(CANCER_TYPE NE " " AND SEX NE " ")) min=amin q1=aq1 median=amedian q3=aq3 nmiss=amiss sum=asum max=amax; run; proc report data=E.SUMMARY_ALL_LONG; column CANCER_TYPE SEX,(AMIN AQ1 AMEDIAN AQ3 AMISS ASUM AMAX); define CANCER_TYPE/ ' ' group; define SEX/ across 'Gender'; DEFINE ASUM/ 'N CASES' FORMAT=9.1; DEFINE AMIN/ 'MIN' FORMAT=9.1; DEFINE AQ1/ 'MIN' FORMAT=9.1; DEFINE AMEDIAN/ 'MIN' FORMAT=9.1; DEFINE AQ3/ 'MIN' FORMAT=9.1; DEFINE AMISS/ 'MIN' FORMAT=9.1; DEFINE AMAX/ 'MAX' FORMAT=9.1; run;
@Reeza , why directly using proc tabulate didn't work for me was that I needed bunch of stats that proc tabulate doesn't produce. However, using it in series with proc means did the job for me. Just in case, if this helps somebody, posted below is what worked.
proc means data=E.SEER_CANCERS_COUNTY NMISS SUM MIN Q1 MEDIAN Q3 MAX MAXDEC=1 NOPRINT;
CLASS CANCER_TYPE SEX;
var N_COUNTY;
OUTPUT OUT=E.SUMMARY_ALL_LONG(DROP=_FREQ_ _TYPE_ WHERE=(CANCER_TYPE NE " " AND SEX NE " "))
min=MIN q1=Q1 median=MEDIAN q3=Q3 nmiss=NOCASE sum=N max=MAX;
run;
proc tabulate data=E.SUMMARY_ALL_LONG;
CLASS CANCER_TYPE SEX;
var MIN Q1 MEDIAN Q3 NOCASE N MAX;
table CANCER_TYPE=' ',(N='N CASE' NOCASE='NO CASE' MIN Q1 MEDIAN Q3 MAX)*F=9.*SUM=' '*SEX='Gender';
FORMAT SEX $SEX.;
run;
Yes it does, you do need to specify it correctly by indicating MISSING/MISS to allow for missing to be included but in general, it can definitely do those calculations as well.
proc tabulate data=sashelp.class;
class sex;
var weight height;
table weight height, (N NMISS Q1 Median Q3)*sex;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.