Solved: Number of observations and percent in a table

Garpe · Posted 06-23-2020 06:11 AM

Hello,

I would like to build a table which would display the number of people in my table (ID), the number of people who started the race (START), and those who finished it (END) by country (COUNTRY) and by month (MONTH). A second column would give the pourcent of people who started the race compared to the total number of people and the pourcent of people who ended the race compared to the total number of people.

The expected results are on attachment.

Here are some data:

DATA test;
input id start end country $ time $;
datalines;
1 3 6 FR FEB
2 5 8 US FEB
3 3 . FR FEB
4 . . FR JAN
5 . . US JAN
RUN;

(the number in START and END are the departure and arrival times)

I tried with PROC REPORT and PROC FREQ but that gives me the numbers by modalities and not the total number. With the PROC MEANS, I can't calculate a percent. Any ideas ?

Many thanks in advance,

Garpe

EDIT :

RichardDeVen · Posted 06-25-2020 10:11 AM

Consider sample data with a little more JAN2020, AND an extra variable HALF, between START and END. These extras will demonstrate some of the problems you will encounter.

A common first try would be TABULATE, however the percentages you want are from n per variable (and n is not representing mutually exclusive conditions), so the COLPCTN concept won't apply -- neither data as is or data in transpose.

First look at SUMMARY counts and examine how the percentages could be calculated from that.

DATA have;
input ID START HALF END country $ time monyy7.;
format id start half end 4.;
attrib time format=monyy7. label='Month';
datalines;
1 3 5 6 FR FEB2020
2 5 7 8 US FEB2020
3 3 . . FR FEB2020
4 . . . FR JAN2020
5 . . . US JAN2020
6 4 6 . FR JAN2020
;

proc summary noprint data=have;
   class time country;
   var start half end id;
   output out=counts(index=(order=(time country))) n=/autoname;
run;

Summary output

Arrows showing numerators and common denominator that would be used to compute each original variables fraction (aka percentage) within SUMMARY row. I.e. START_N / ID_N is a fractional value [0..1] that can be displayed with format PERCENTn.d

For your desired output PRINT or REPORT procedures would require two groups of the summary variables (n and computed %) to be pivoted in parallel row-wise.

Transforming DATA step

The following DATA step (in spoiler) computes values and pivots rows. The column pivoted into, such as PERCENT, is also formatted here. An additional variable , N_PCT, is calculated as "nnn ( ##.#% )" and is for use in a slightly alternate report structure.

Spoiler

data counts2;
  set 
    counts(where=(_type_=3))
    counts(where=(_type_=2))  /* row wise want time totals after time>country totals, this is setup for a later ORDER=DATA */
  ;
  array counts   ID_N   START_N   HALF_N   END_N;
  array percents ID_PCT START_PCT HALF_PCT END_PCT;
  do _n_ = 1 to dim(percents);
    percents(_n_) = counts(_n_) / id_n;
  end;

  call missing(ID_PCT);

  if _type_ = 2 then country = 'Total';

  length _NAME_ $32;

  * loop for array based transpose (i.e. pivot) of summary data and computed values;
  do _n_ = 1 to dim(percents);
    _NAME_ = scan(vname(percents(_n_)),1,'_');
    COUNT   = counts  (_n_);
    PERCENT = percents(_n_);

    ************************************** ;
    * compute values "<n> ( <pct> )" for also reporting in a single columns under country;
    ************************************** ;
    length _NAME2_ $32 N_PCT $25;
    if _NAME_ ne 'ID' 
      then _NAME2_ = _NAME_; 
      else _NAME2_ = 'ID (n)'; 

    if not missing(PERCENT) 
      then N_PCT = vvalue(COUNT) || ' ( ' || vvalue(PERCENT) || ' )';
      else N_PCT = vvalue(COUNT);
    **************************************;

    output;
  end;

  keep time country _name_ _name2_ COUNT PERCENT N_PCT;

  format COUNT comma9.;
  format PERCENT id_pct start_pct half_pct end_pct percent8.1;
run;

data counts2; set counts(where=(_type_=3)) counts(where=(_type_=2)) /* row wise want time totals after time>country totals, this is setup for a later ORDER=DATA */ ; array counts ID_N START_N HALF_N END_N; array percents ID_PCT START_PCT HALF_PCT END_PCT; do _n_ = 1 to dim(percents); percents(_n_) = counts(_n_) / id_n; end; call missing(ID_PCT); if _type_ = 2 then country = 'Total'; length _NAME_ $32; * loop for array based transpose (i.e. pivot) of summary data and computed values; do _n_ = 1 to dim(percents); _NAME_ = scan(vname(percents(_n_)),1,'_'); COUNT = counts (_n_); PERCENT = percents(_n_); ************************************** ; * compute values "<n> ( <pct> )" for also reporting in a single columns under country; ************************************** ; length _NAME2_ $32 N_PCT $25; if _NAME_ ne 'ID' then _NAME2_ = _NAME_; else _NAME2_ = 'ID (n)'; if not missing(PERCENT) then N_PCT = vvalue(COUNT) || ' ( ' || vvalue(PERCENT) || ' )'; else N_PCT = vvalue(COUNT); **************************************; output; end; keep time country _name_ _name2_ COUNT PERCENT N_PCT; format COUNT comma9.; format PERCENT id_pct start_pct half_pct end_pct percent8.1; run;

Summary data pivoted with additional computed values

The data is now in a categorical layout. The time, country and original variable names can be used in TABULATE (CLASS) or REPORT (stacked /ACROSS or /GROUP).

REPORT

Two columns under each country, one for N, one for %. Pretty standard reporting. The header stacking (aka column dimension hierarchy) is specified with time, country, (COUNT PERCENT)

options missing = ' ';

proc report data=counts2;
  title 'Counts and computed %''s REPORT, time&country across';
  columns _name_ time, country, (COUNT PERCENT);
  define time    / ' ' across order=internal;
  define country / ' ' across order=data style=[cellwidth=5em textalign=center] ;
  define _name_  / ' ' group order=data;
  define COUNT   / 'n' format=comma9.;
  define PERCENT / '%' ;
run;

Output

REPORT #2

One column under each country, nnn ( ##.#% ). This report code is more complicated and uses some tricks.

dummy column Z / noprint
custom row format for ID row

/* Trick: 
 * dummy column, Z, needed to display character variables under across variables
 * N_PCT is precomputed character value
 */

proc report data=counts2;
  title 'Counts and computed %''s REPORT, time&country across';
  columns _name2_ time,country,(N_PCT) Z;
  define time    / ' ' across order=internal;
  define country / ' ' across order=data style=[cellwidth=4em textalign=center] ;
  define _name2_ / ' ' group order=data;
  define N_PCT   / 'n (%)' ;
  define Z / computed noprint;   /* trick */
  compute _name2_;
    if _name2_ =: 'ID' then 
      call define (_ROW_, 'STYLE', 'style=[textalign=center]');
  endcomp;
run;

Output

View solution in original post

PaigeMiller · Posted 06-23-2020 09:45 AM

I tried with PROC REPORT and PROC FREQ but that gives me the numbers by modalities and not the total number. With the PROC MEANS, I can't calculate a percent. Any ideas ?

Show us the code you used and the output. Paste the code (as text) into the box that appears when you click on the running man icon.

The expected results are on attachment.

Most of us will not download or open Microsoft Office documents as they can be security threats.

--
Paige Miller

Garpe · Posted 06-23-2020 10:38 AM

Thanks for your advices. Here is a photo of the expected table:

With PROC MEANS:

PROC SORT data = test;
by COUNTRY TIME;
RUN; 

PROC MEANS data = test n;
  var ID START END;
  by COUNTRY TIME;
RUN;

With this code, the results are a table for each combination of month and country, and no percent...

The other solutions that I have considered (with a "simple" proc freq, or a proc report) display the numbers for each modality. That's why I'm confused with the strategy to adopt.

Garpe

RichardDeVen · Posted 06-25-2020 10:11 AM

Consider sample data with a little more JAN2020, AND an extra variable HALF, between START and END. These extras will demonstrate some of the problems you will encounter.

A common first try would be TABULATE, however the percentages you want are from n per variable (and n is not representing mutually exclusive conditions), so the COLPCTN concept won't apply -- neither data as is or data in transpose.

First look at SUMMARY counts and examine how the percentages could be calculated from that.

DATA have;
input ID START HALF END country $ time monyy7.;
format id start half end 4.;
attrib time format=monyy7. label='Month';
datalines;
1 3 5 6 FR FEB2020
2 5 7 8 US FEB2020
3 3 . . FR FEB2020
4 . . . FR JAN2020
5 . . . US JAN2020
6 4 6 . FR JAN2020
;

proc summary noprint data=have;
   class time country;
   var start half end id;
   output out=counts(index=(order=(time country))) n=/autoname;
run;

Summary output

Arrows showing numerators and common denominator that would be used to compute each original variables fraction (aka percentage) within SUMMARY row. I.e. START_N / ID_N is a fractional value [0..1] that can be displayed with format PERCENTn.d

For your desired output PRINT or REPORT procedures would require two groups of the summary variables (n and computed %) to be pivoted in parallel row-wise.

Transforming DATA step

The following DATA step (in spoiler) computes values and pivots rows. The column pivoted into, such as PERCENT, is also formatted here. An additional variable , N_PCT, is calculated as "nnn ( ##.#% )" and is for use in a slightly alternate report structure.

Spoiler

data counts2;
  set 
    counts(where=(_type_=3))
    counts(where=(_type_=2))  /* row wise want time totals after time>country totals, this is setup for a later ORDER=DATA */
  ;
  array counts   ID_N   START_N   HALF_N   END_N;
  array percents ID_PCT START_PCT HALF_PCT END_PCT;
  do _n_ = 1 to dim(percents);
    percents(_n_) = counts(_n_) / id_n;
  end;

  call missing(ID_PCT);

  if _type_ = 2 then country = 'Total';

  length _NAME_ $32;

  * loop for array based transpose (i.e. pivot) of summary data and computed values;
  do _n_ = 1 to dim(percents);
    _NAME_ = scan(vname(percents(_n_)),1,'_');
    COUNT   = counts  (_n_);
    PERCENT = percents(_n_);

    ************************************** ;
    * compute values "<n> ( <pct> )" for also reporting in a single columns under country;
    ************************************** ;
    length _NAME2_ $32 N_PCT $25;
    if _NAME_ ne 'ID' 
      then _NAME2_ = _NAME_; 
      else _NAME2_ = 'ID (n)'; 

    if not missing(PERCENT) 
      then N_PCT = vvalue(COUNT) || ' ( ' || vvalue(PERCENT) || ' )';
      else N_PCT = vvalue(COUNT);
    **************************************;

    output;
  end;

  keep time country _name_ _name2_ COUNT PERCENT N_PCT;

  format COUNT comma9.;
  format PERCENT id_pct start_pct half_pct end_pct percent8.1;
run;

data counts2; set counts(where=(_type_=3)) counts(where=(_type_=2)) /* row wise want time totals after time>country totals, this is setup for a later ORDER=DATA */ ; array counts ID_N START_N HALF_N END_N; array percents ID_PCT START_PCT HALF_PCT END_PCT; do _n_ = 1 to dim(percents); percents(_n_) = counts(_n_) / id_n; end; call missing(ID_PCT); if _type_ = 2 then country = 'Total'; length _NAME_ $32; * loop for array based transpose (i.e. pivot) of summary data and computed values; do _n_ = 1 to dim(percents); _NAME_ = scan(vname(percents(_n_)),1,'_'); COUNT = counts (_n_); PERCENT = percents(_n_); ************************************** ; * compute values "<n> ( <pct> )" for also reporting in a single columns under country; ************************************** ; length _NAME2_ $32 N_PCT $25; if _NAME_ ne 'ID' then _NAME2_ = _NAME_; else _NAME2_ = 'ID (n)'; if not missing(PERCENT) then N_PCT = vvalue(COUNT) || ' ( ' || vvalue(PERCENT) || ' )'; else N_PCT = vvalue(COUNT); **************************************; output; end; keep time country _name_ _name2_ COUNT PERCENT N_PCT; format COUNT comma9.; format PERCENT id_pct start_pct half_pct end_pct percent8.1; run;

Summary data pivoted with additional computed values

The data is now in a categorical layout. The time, country and original variable names can be used in TABULATE (CLASS) or REPORT (stacked /ACROSS or /GROUP).

REPORT

Two columns under each country, one for N, one for %. Pretty standard reporting. The header stacking (aka column dimension hierarchy) is specified with time, country, (COUNT PERCENT)

options missing = ' ';

proc report data=counts2;
  title 'Counts and computed %''s REPORT, time&country across';
  columns _name_ time, country, (COUNT PERCENT);
  define time    / ' ' across order=internal;
  define country / ' ' across order=data style=[cellwidth=5em textalign=center] ;
  define _name_  / ' ' group order=data;
  define COUNT   / 'n' format=comma9.;
  define PERCENT / '%' ;
run;

Output

REPORT #2

One column under each country, nnn ( ##.#% ). This report code is more complicated and uses some tricks.

dummy column Z / noprint
custom row format for ID row

/* Trick: 
 * dummy column, Z, needed to display character variables under across variables
 * N_PCT is precomputed character value
 */

proc report data=counts2;
  title 'Counts and computed %''s REPORT, time&country across';
  columns _name2_ time,country,(N_PCT) Z;
  define time    / ' ' across order=internal;
  define country / ' ' across order=data style=[cellwidth=4em textalign=center] ;
  define _name2_ / ' ' group order=data;
  define N_PCT   / 'n (%)' ;
  define Z / computed noprint;   /* trick */
  compute _name2_;
    if _name2_ =: 'ID' then 
      call define (_ROW_, 'STYLE', 'style=[textalign=center]');
  endcomp;
run;

Output

Garpe · Posted 07-27-2020 05:56 AM

Thank you so much for your explanations (and sorry for the delay in responding). I will try this solution with my full data but it seems to match perfectly.

Garpe

Number of observations and percent in a table

Re: Number of observations and percent in a table

Re: Number of observations and percent in a table

Re: Number of observations and percent in a table

Re: Number of observations and percent in a table

Re: Number of observations and percent in a table

Catch up on SAS Innovate 2026

Number of observations and percent in a table

Re: Number of observations and percent in a table

Re: Number of observations and percent in a table

Re: Number of observations and percent in a table

Re: Number of observations and percent in a table

Re: Number of observations and percent in a table

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away