Hello,
I would like to build a table which would display the number of people in my table (ID), the number of people who started the race (START), and those who finished it (END) by country (COUNTRY) and by month (MONTH). A second column would give the pourcent of people who started the race compared to the total number of people and the pourcent of people who ended the race compared to the total number of people.
The expected results are on attachment.
Here are some data:
DATA test;
input id start end country $ time $;
datalines;
1 3 6 FR FEB
2 5 8 US FEB
3 3 . FR FEB
4 . . FR JAN
5 . . US JAN
RUN;
I tried with PROC REPORT and PROC FREQ but that gives me the numbers by modalities and not the total number. With the PROC MEANS, I can't calculate a percent. Any ideas ?
Many thanks in advance,
Garpe
EDIT :
Consider sample data with a little more JAN2020, AND an extra variable HALF, between START and END. These extras will demonstrate some of the problems you will encounter.
A common first try would be TABULATE, however the percentages you want are from n per variable (and n is not representing mutually exclusive conditions), so the COLPCTN concept won't apply -- neither data as is or data in transpose.
First look at SUMMARY counts and examine how the percentages could be calculated from that.
DATA have; input ID START HALF END country $ time monyy7.; format id start half end 4.; attrib time format=monyy7. label='Month'; datalines; 1 3 5 6 FR FEB2020 2 5 7 8 US FEB2020 3 3 . . FR FEB2020 4 . . . FR JAN2020 5 . . . US JAN2020 6 4 6 . FR JAN2020 ; proc summary noprint data=have; class time country; var start half end id; output out=counts(index=(order=(time country))) n=/autoname; run;
Summary output
Arrows showing numerators and common denominator that would be used to compute each original variables fraction (aka percentage) within SUMMARY row. I.e. START_N / ID_N is a fractional value [0..1] that can be displayed with format PERCENTn.d
For your desired output PRINT or REPORT procedures would require two groups of the summary variables (n and computed %) to be pivoted in parallel row-wise.
Transforming DATA step
The following DATA step (in spoiler) computes values and pivots rows. The column pivoted into, such as PERCENT, is also formatted here. An additional variable , N_PCT, is calculated as "nnn ( ##.#% )" and is for use in a slightly alternate report structure.
data counts2; set counts(where=(_type_=3)) counts(where=(_type_=2)) /* row wise want time totals after time>country totals, this is setup for a later ORDER=DATA */ ; array counts ID_N START_N HALF_N END_N; array percents ID_PCT START_PCT HALF_PCT END_PCT; do _n_ = 1 to dim(percents); percents(_n_) = counts(_n_) / id_n; end; call missing(ID_PCT); if _type_ = 2 then country = 'Total'; length _NAME_ $32; * loop for array based transpose (i.e. pivot) of summary data and computed values;
do _n_ = 1 to dim(percents); _NAME_ = scan(vname(percents(_n_)),1,'_'); COUNT = counts (_n_); PERCENT = percents(_n_); ************************************** ;
* compute values "<n> ( <pct> )" for also reporting in a single columns under country; ************************************** ;
length _NAME2_ $32 N_PCT $25; if _NAME_ ne 'ID' then _NAME2_ = _NAME_; else _NAME2_ = 'ID (n)'; if not missing(PERCENT) then N_PCT = vvalue(COUNT) || ' ( ' || vvalue(PERCENT) || ' )'; else N_PCT = vvalue(COUNT);
**************************************; output; end; keep time country _name_ _name2_ COUNT PERCENT N_PCT; format COUNT comma9.; format PERCENT id_pct start_pct half_pct end_pct percent8.1; run;
Summary data pivoted with additional computed values
The data is now in a categorical layout. The time, country and original variable names can be used in TABULATE (CLASS) or REPORT (stacked /ACROSS or /GROUP).
REPORT
Two columns under each country, one for N, one for %. Pretty standard reporting. The header stacking (aka column dimension hierarchy) is specified with time, country, (COUNT PERCENT)
options missing = ' ';
proc report data=counts2;
title 'Counts and computed %''s REPORT, time&country across';
columns _name_ time, country, (COUNT PERCENT);
define time / ' ' across order=internal;
define country / ' ' across order=data style=[cellwidth=5em textalign=center] ;
define _name_ / ' ' group order=data;
define COUNT / 'n' format=comma9.;
define PERCENT / '%' ;
run;
Output
REPORT #2
One column under each country, nnn ( ##.#% ). This report code is more complicated and uses some tricks.
/* Trick:
* dummy column, Z, needed to display character variables under across variables
* N_PCT is precomputed character value
*/
proc report data=counts2;
title 'Counts and computed %''s REPORT, time&country across';
columns _name2_ time,country,(N_PCT) Z;
define time / ' ' across order=internal;
define country / ' ' across order=data style=[cellwidth=4em textalign=center] ;
define _name2_ / ' ' group order=data;
define N_PCT / 'n (%)' ;
define Z / computed noprint; /* trick */
compute _name2_;
if _name2_ =: 'ID' then
call define (_ROW_, 'STYLE', 'style=[textalign=center]');
endcomp;
run;
Output
I tried with PROC REPORT and PROC FREQ but that gives me the numbers by modalities and not the total number. With the PROC MEANS, I can't calculate a percent. Any ideas ?
Show us the code you used and the output. Paste the code (as text) into the box that appears when you click on the running man icon.
The expected results are on attachment.
Most of us will not download or open Microsoft Office documents as they can be security threats.
Thanks for your advices. Here is a photo of the expected table:
With PROC MEANS:
PROC SORT data = test;
by COUNTRY TIME;
RUN;
PROC MEANS data = test n;
var ID START END;
by COUNTRY TIME;
RUN;
With this code, the results are a table for each combination of month and country, and no percent...
The other solutions that I have considered (with a "simple" proc freq, or a proc report) display the numbers for each modality. That's why I'm confused with the strategy to adopt.
Garpe
Consider sample data with a little more JAN2020, AND an extra variable HALF, between START and END. These extras will demonstrate some of the problems you will encounter.
A common first try would be TABULATE, however the percentages you want are from n per variable (and n is not representing mutually exclusive conditions), so the COLPCTN concept won't apply -- neither data as is or data in transpose.
First look at SUMMARY counts and examine how the percentages could be calculated from that.
DATA have; input ID START HALF END country $ time monyy7.; format id start half end 4.; attrib time format=monyy7. label='Month'; datalines; 1 3 5 6 FR FEB2020 2 5 7 8 US FEB2020 3 3 . . FR FEB2020 4 . . . FR JAN2020 5 . . . US JAN2020 6 4 6 . FR JAN2020 ; proc summary noprint data=have; class time country; var start half end id; output out=counts(index=(order=(time country))) n=/autoname; run;
Summary output
Arrows showing numerators and common denominator that would be used to compute each original variables fraction (aka percentage) within SUMMARY row. I.e. START_N / ID_N is a fractional value [0..1] that can be displayed with format PERCENTn.d
For your desired output PRINT or REPORT procedures would require two groups of the summary variables (n and computed %) to be pivoted in parallel row-wise.
Transforming DATA step
The following DATA step (in spoiler) computes values and pivots rows. The column pivoted into, such as PERCENT, is also formatted here. An additional variable , N_PCT, is calculated as "nnn ( ##.#% )" and is for use in a slightly alternate report structure.
data counts2; set counts(where=(_type_=3)) counts(where=(_type_=2)) /* row wise want time totals after time>country totals, this is setup for a later ORDER=DATA */ ; array counts ID_N START_N HALF_N END_N; array percents ID_PCT START_PCT HALF_PCT END_PCT; do _n_ = 1 to dim(percents); percents(_n_) = counts(_n_) / id_n; end; call missing(ID_PCT); if _type_ = 2 then country = 'Total'; length _NAME_ $32; * loop for array based transpose (i.e. pivot) of summary data and computed values;
do _n_ = 1 to dim(percents); _NAME_ = scan(vname(percents(_n_)),1,'_'); COUNT = counts (_n_); PERCENT = percents(_n_); ************************************** ;
* compute values "<n> ( <pct> )" for also reporting in a single columns under country; ************************************** ;
length _NAME2_ $32 N_PCT $25; if _NAME_ ne 'ID' then _NAME2_ = _NAME_; else _NAME2_ = 'ID (n)'; if not missing(PERCENT) then N_PCT = vvalue(COUNT) || ' ( ' || vvalue(PERCENT) || ' )'; else N_PCT = vvalue(COUNT);
**************************************; output; end; keep time country _name_ _name2_ COUNT PERCENT N_PCT; format COUNT comma9.; format PERCENT id_pct start_pct half_pct end_pct percent8.1; run;
Summary data pivoted with additional computed values
The data is now in a categorical layout. The time, country and original variable names can be used in TABULATE (CLASS) or REPORT (stacked /ACROSS or /GROUP).
REPORT
Two columns under each country, one for N, one for %. Pretty standard reporting. The header stacking (aka column dimension hierarchy) is specified with time, country, (COUNT PERCENT)
options missing = ' ';
proc report data=counts2;
title 'Counts and computed %''s REPORT, time&country across';
columns _name_ time, country, (COUNT PERCENT);
define time / ' ' across order=internal;
define country / ' ' across order=data style=[cellwidth=5em textalign=center] ;
define _name_ / ' ' group order=data;
define COUNT / 'n' format=comma9.;
define PERCENT / '%' ;
run;
Output
REPORT #2
One column under each country, nnn ( ##.#% ). This report code is more complicated and uses some tricks.
/* Trick:
* dummy column, Z, needed to display character variables under across variables
* N_PCT is precomputed character value
*/
proc report data=counts2;
title 'Counts and computed %''s REPORT, time&country across';
columns _name2_ time,country,(N_PCT) Z;
define time / ' ' across order=internal;
define country / ' ' across order=data style=[cellwidth=4em textalign=center] ;
define _name2_ / ' ' group order=data;
define N_PCT / 'n (%)' ;
define Z / computed noprint; /* trick */
compute _name2_;
if _name2_ =: 'ID' then
call define (_ROW_, 'STYLE', 'style=[textalign=center]');
endcomp;
run;
Output
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.