BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Garpe
Obsidian | Level 7

Hello, 

I would like to build a table which would display the number of people in my table (ID), the number of people who started the race (START), and those who finished it (END) by country (COUNTRY) and by month (MONTH). A second column would give the pourcent of people who started the race compared to the total number of people and  the pourcent of people who ended the race compared to the total number of people.

 

The expected results are on attachment.

 

Here are some data: 

 
DATA test;
input id start end country $ time $;
datalines;
1 3 6 FR FEB
2 5 8 US FEB
3 3 . FR FEB
4 . . FR JAN
5 . . US JAN
RUN;
(the number in START and END are the departure and arrival times)
 

I tried with PROC REPORT and PROC FREQ but that gives me the numbers by modalities and not the total number. With the PROC MEANS, I can't calculate a percent. Any ideas ?

 

Many thanks in advance, 

Garpe

 

EDIT : 

Capture.PNG

1 ACCEPTED SOLUTION

Accepted Solutions
RichardDeVen
Barite | Level 11

Consider sample data with a little more JAN2020, AND an extra variable HALF, between START and END.  These extras will demonstrate some of the problems you will encounter. 

 

A common first try would be TABULATE, however the percentages you want are from n per variable (and n is not representing mutually exclusive conditions), so the COLPCTN concept won't apply -- neither data as is or data in transpose.

 

First look at SUMMARY counts and examine how the percentages could be calculated from that.

DATA have;
input ID START HALF END country $ time monyy7.;
format id start half end 4.;
attrib time format=monyy7. label='Month';
datalines;
1 3 5 6 FR FEB2020
2 5 7 8 US FEB2020
3 3 . . FR FEB2020
4 . . . FR JAN2020
5 . . . US JAN2020
6 4 6 . FR JAN2020
;

proc summary noprint data=have;
   class time country;
   var start half end id;
   output out=counts(index=(order=(time country))) n=/autoname;
run;

Summary output

Arrows showing numerators and common denominator that would be used to compute each original variables fraction (aka percentage) within SUMMARY row.  I.e. START_N / ID_N is a fractional value [0..1] that can be displayed with format PERCENTn.d 

summary percentages.png

For your desired output PRINT or REPORT procedures would require two groups of the summary variables (n and computed %) to be pivoted in parallel row-wise.

 

Transforming DATA step

The following DATA step (in spoiler) computes values and pivots rows.  The column pivoted into, such as PERCENT, is also formatted here.  An additional variable , N_PCT, is calculated as "nnn ( ##.#% )" and is for use in a slightly alternate report structure.

 

Spoiler
data counts2;
  set 
    counts(where=(_type_=3))
    counts(where=(_type_=2))  /* row wise want time totals after time>country totals, this is setup for a later ORDER=DATA */
  ;
  array counts   ID_N   START_N   HALF_N   END_N;
  array percents ID_PCT START_PCT HALF_PCT END_PCT;
  do _n_ = 1 to dim(percents);
    percents(_n_) = counts(_n_) / id_n;
  end;

  call missing(ID_PCT);

  if _type_ = 2 then country = 'Total';

  length _NAME_ $32;

  * loop for array based transpose (i.e. pivot) of summary data and computed values;
do _n_ = 1 to dim(percents); _NAME_ = scan(vname(percents(_n_)),1,'_'); COUNT = counts (_n_); PERCENT = percents(_n_); ************************************** ;
* compute values "<n> ( <pct> )" for also reporting in a single columns under country; ************************************** ;
length _NAME2_ $32 N_PCT $25; if _NAME_ ne 'ID' then _NAME2_ = _NAME_; else _NAME2_ = 'ID (n)'; if not missing(PERCENT) then N_PCT = vvalue(COUNT) || ' ( ' || vvalue(PERCENT) || ' )'; else N_PCT = vvalue(COUNT);
**************************************; output; end; keep time country _name_ _name2_ COUNT PERCENT N_PCT; format COUNT comma9.; format PERCENT id_pct start_pct half_pct end_pct percent8.1; run;

Summary data pivoted with additional computed values

The data is now in a categorical layout.  The time, country and original variable names can be used in TABULATE (CLASS) or REPORT (stacked /ACROSS or /GROUP).

summary percentages pivoted.png

 

REPORT

Two columns under each country, one for N, one for %.  Pretty standard reporting.  The header stacking (aka column dimension hierarchy) is specified with time, country, (COUNT PERCENT)

 

options missing = ' ';

proc report data=counts2;
  title 'Counts and computed %''s REPORT, time&country across';
  columns _name_ time, country, (COUNT PERCENT);
  define time    / ' ' across order=internal;
  define country / ' ' across order=data style=[cellwidth=5em textalign=center] ;
  define _name_  / ' ' group order=data;
  define COUNT   / 'n' format=comma9.;
  define PERCENT / '%' ;
run;

 

Output

summary percentages pivoted report.png

 

REPORT #2

One column under each country, nnn ( ##.#% ). This report code is more complicated and uses some tricks.

  • dummy column Z / noprint
  • custom row format for ID row
/* Trick: 
 * dummy column, Z, needed to display character variables under across variables
 * N_PCT is precomputed character value
 */

proc report data=counts2;
  title 'Counts and computed %''s REPORT, time&country across';
  columns _name2_ time,country,(N_PCT) Z;
  define time    / ' ' across order=internal;
  define country / ' ' across order=data style=[cellwidth=4em textalign=center] ;
  define _name2_ / ' ' group order=data;
  define N_PCT   / 'n (%)' ;
  define Z / computed noprint;   /* trick */
  compute _name2_;
    if _name2_ =: 'ID' then 
      call define (_ROW_, 'STYLE', 'style=[textalign=center]');
  endcomp;
run;

Output

 summary percentages pivoted report#2.png

 

 

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

I tried with PROC REPORT and PROC FREQ but that gives me the numbers by modalities and not the total number. With the PROC MEANS, I can't calculate a percent. Any ideas ?

 

Show us the code you used and the output. Paste the code (as text) into the box that appears when you click on the running man icon. 

 

The expected results are on attachment.

 

Most of us will not download or open Microsoft Office documents as they can be security threats.

 

--
Paige Miller
Garpe
Obsidian | Level 7

Thanks for your advices. Here is a photo of the expected table: 

Capture.PNG

With PROC MEANS:

PROC SORT data = test;
by COUNTRY TIME;
RUN; 
PROC MEANS data = test n; var ID START END; by COUNTRY TIME; RUN;

With this code, the results are a table for each combination of month and country, and no percent...

 

The other solutions that I have considered (with a "simple" proc freq, or a proc report) display the numbers for each modality. That's why I'm confused with the strategy to adopt. 

 

Garpe

RichardDeVen
Barite | Level 11

Consider sample data with a little more JAN2020, AND an extra variable HALF, between START and END.  These extras will demonstrate some of the problems you will encounter. 

 

A common first try would be TABULATE, however the percentages you want are from n per variable (and n is not representing mutually exclusive conditions), so the COLPCTN concept won't apply -- neither data as is or data in transpose.

 

First look at SUMMARY counts and examine how the percentages could be calculated from that.

DATA have;
input ID START HALF END country $ time monyy7.;
format id start half end 4.;
attrib time format=monyy7. label='Month';
datalines;
1 3 5 6 FR FEB2020
2 5 7 8 US FEB2020
3 3 . . FR FEB2020
4 . . . FR JAN2020
5 . . . US JAN2020
6 4 6 . FR JAN2020
;

proc summary noprint data=have;
   class time country;
   var start half end id;
   output out=counts(index=(order=(time country))) n=/autoname;
run;

Summary output

Arrows showing numerators and common denominator that would be used to compute each original variables fraction (aka percentage) within SUMMARY row.  I.e. START_N / ID_N is a fractional value [0..1] that can be displayed with format PERCENTn.d 

summary percentages.png

For your desired output PRINT or REPORT procedures would require two groups of the summary variables (n and computed %) to be pivoted in parallel row-wise.

 

Transforming DATA step

The following DATA step (in spoiler) computes values and pivots rows.  The column pivoted into, such as PERCENT, is also formatted here.  An additional variable , N_PCT, is calculated as "nnn ( ##.#% )" and is for use in a slightly alternate report structure.

 

Spoiler
data counts2;
  set 
    counts(where=(_type_=3))
    counts(where=(_type_=2))  /* row wise want time totals after time>country totals, this is setup for a later ORDER=DATA */
  ;
  array counts   ID_N   START_N   HALF_N   END_N;
  array percents ID_PCT START_PCT HALF_PCT END_PCT;
  do _n_ = 1 to dim(percents);
    percents(_n_) = counts(_n_) / id_n;
  end;

  call missing(ID_PCT);

  if _type_ = 2 then country = 'Total';

  length _NAME_ $32;

  * loop for array based transpose (i.e. pivot) of summary data and computed values;
do _n_ = 1 to dim(percents); _NAME_ = scan(vname(percents(_n_)),1,'_'); COUNT = counts (_n_); PERCENT = percents(_n_); ************************************** ;
* compute values "<n> ( <pct> )" for also reporting in a single columns under country; ************************************** ;
length _NAME2_ $32 N_PCT $25; if _NAME_ ne 'ID' then _NAME2_ = _NAME_; else _NAME2_ = 'ID (n)'; if not missing(PERCENT) then N_PCT = vvalue(COUNT) || ' ( ' || vvalue(PERCENT) || ' )'; else N_PCT = vvalue(COUNT);
**************************************; output; end; keep time country _name_ _name2_ COUNT PERCENT N_PCT; format COUNT comma9.; format PERCENT id_pct start_pct half_pct end_pct percent8.1; run;

Summary data pivoted with additional computed values

The data is now in a categorical layout.  The time, country and original variable names can be used in TABULATE (CLASS) or REPORT (stacked /ACROSS or /GROUP).

summary percentages pivoted.png

 

REPORT

Two columns under each country, one for N, one for %.  Pretty standard reporting.  The header stacking (aka column dimension hierarchy) is specified with time, country, (COUNT PERCENT)

 

options missing = ' ';

proc report data=counts2;
  title 'Counts and computed %''s REPORT, time&country across';
  columns _name_ time, country, (COUNT PERCENT);
  define time    / ' ' across order=internal;
  define country / ' ' across order=data style=[cellwidth=5em textalign=center] ;
  define _name_  / ' ' group order=data;
  define COUNT   / 'n' format=comma9.;
  define PERCENT / '%' ;
run;

 

Output

summary percentages pivoted report.png

 

REPORT #2

One column under each country, nnn ( ##.#% ). This report code is more complicated and uses some tricks.

  • dummy column Z / noprint
  • custom row format for ID row
/* Trick: 
 * dummy column, Z, needed to display character variables under across variables
 * N_PCT is precomputed character value
 */

proc report data=counts2;
  title 'Counts and computed %''s REPORT, time&country across';
  columns _name2_ time,country,(N_PCT) Z;
  define time    / ' ' across order=internal;
  define country / ' ' across order=data style=[cellwidth=4em textalign=center] ;
  define _name2_ / ' ' group order=data;
  define N_PCT   / 'n (%)' ;
  define Z / computed noprint;   /* trick */
  compute _name2_;
    if _name2_ =: 'ID' then 
      call define (_ROW_, 'STYLE', 'style=[textalign=center]');
  endcomp;
run;

Output

 summary percentages pivoted report#2.png

 

 

Garpe
Obsidian | Level 7
Thank you so much for your explanations (and sorry for the delay in responding). I will try this solution with my full data but it seems to match perfectly.

Garpe

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1675 views
  • 6 likes
  • 3 in conversation