Solved: Re: Stacked bar chart with percentages

Quentin · Posted 02-23-2024 01:35 PM

Maybe I'm just having a bad Friday, but I'm struggling to make a stacked bar chart, and it feels like the kind of struggling that maybe I'm doing something dumb...

I have data with one record per person, with people from two states, and two years. For each person, I have a boolean

variable that indicates pass/fail. Data like:

data have ;
  input year state $2. Fail ;
  cards ;
2023 MA 1
2023 MA 1
2023 MA 0
2023 MA 0
2023 MA 0
2023 MA 0
2023 MA 0
2023 MA 0
2023 RI 1
2023 RI 0
2024 MA 1
2024 MA 1
2024 MA 1
2024 MA 1
2024 MA 0
2024 MA 0
2024 MA 0
2024 MA 0
2024 RI 1
2024 RI 1
;
run ;

So there are 10 people in each year. My goal is to make a bar chart that shows that 30% failed in 2023 and 60% failed in 2024. I want it to be a stacked bar chart, with state as the group. It should show 30% failed in 2023 (20% MA, 10%RI) and 60%failed in 2024 (40% MA, 20%RI).

As I couldn't make the chart I want directly from this data (but would be happy if you can), I first thought I should calculate the percentages I want myself. Even that feels overly complex, because I need to get the sum of the failures for each state-year, and then divide that by the count of rows for the year. I can get what I want with SQL, but it feels like it should be easier...

proc sql ;
  create table want as
  select a.year,a.state,sum(fail)/n as pct
  from have as a
      ,(select year,n(fail) as n from have group by year) as b
  where a.year=b.year
  group by a.year,state
  ;
quit ;

That gives:

Obs    year    state    pct

 1     2023     MA      0.2
 2     2023     RI      0.1
 3     2024     MA      0.4
 4     2024     RI      0.2

Then to get my stacked bar chart:

proc sgplot data=want ;
  vbar  year/ response=pct group=state;
  format pct percent. ;
run;

Which gives me what I want:

Is there a better/easier way to do this?

The point is not to compare the failure rate for MA vs RI. I'm not calculating that percentage. The goal is to compare the failure rate by year. And in each year, see how many of those failures came from MA vs RI.

yabwon · Posted 02-23-2024 03:03 PM

Small update on "missing class" colour:

data have2;
  set have have(in=i drop=state);
  if i then Fail = NOT Fail;
run;

data MyAttrMap;
  ID='grpMiss';
  value=' '; /* missing group */
  fillcolor='Black';
  filltransparency=1;
  output;
run;

proc sgplot 
  data=have2(where=(Fail)) 
  dattrmap=MyAttrMap 
  PCTLEVEL=group 
;
  vbar  year / group=state response=fail 
    STAT=PERCENT 
    GROUPORDER=DATA
    NOOUTLINE 
    MISSING 
    ATTRID=grpMiss
  ;
  yaxis max=0.6;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

View solution in original post

Quentin · Posted 02-23-2024 01:59 PM

I usually stay away from stacked bar charts. Maybe the line chart version is easier to read(and make):

data want ;
  set have ;
  FailMa=(State='MA' and Fail) ;
  FailRI=(State='RI' and Fail) ;
run ;

proc sgplot data=want ;
  vline year /response=fail stat=mean ;
  vline year /response=failMA stat=mean ;
  vline year /response=failRI stat=mean ;
run ;

yabwon · Posted 02-23-2024 02:36 PM

data have ;
  input year state $2. Fail ;
  cards ;
2023 MA 1
2023 MA 1
2023 MA 0
2023 MA 0
2023 MA 0
2023 MA 0
2023 MA 0
2023 MA 0
2023 RI 1
2023 RI 0
2024 MA 1
2024 MA 1
2024 MA 1
2024 MA 1
2024 MA 0
2024 MA 0
2024 MA 0
2024 MA 0
2024 RI 1
2024 RI 1
;
run ;

data have2;
  set have;
  state=".."; /* "extra" state to get 100% */
  Fail = NOT Fail;
run;

data have3;
  set have have2;
run;

ods graphics / reset attrpriority=color;
proc sgplot data=have3(where=(Fail)) PCTLEVEL=group ;
  styleattrs datacolors=(red blue white); /* last value white to hide in background */
  vbar  year / group=state 
  RESPONSE=fail 
  STAT=PERCENT 
  GROUPORDER=DATA /* to keep "extra" group "on top" */
  NOOUTLINE;
  yaxis max=0.6; /* to scale (optionally)*/
run;

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

yabwon · Posted 02-23-2024 03:03 PM

Small update on "missing class" colour:

data have2;
  set have have(in=i drop=state);
  if i then Fail = NOT Fail;
run;

data MyAttrMap;
  ID='grpMiss';
  value=' '; /* missing group */
  fillcolor='Black';
  filltransparency=1;
  output;
run;

proc sgplot 
  data=have2(where=(Fail)) 
  dattrmap=MyAttrMap 
  PCTLEVEL=group 
;
  vbar  year / group=state response=fail 
    STAT=PERCENT 
    GROUPORDER=DATA
    NOOUTLINE 
    MISSING 
    ATTRID=grpMiss
  ;
  yaxis max=0.6;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Quentin · Posted 02-23-2024 04:09 PM

Very creative solution @yabwon , I would have never thought of adding fake data to let vbar calculate the percentage I want via percentage of sum. As Paul D. would say, thanks 1E6!

FreelanceReinh · Posted 02-23-2024 04:12 PM

@Quentin wrote:

... I should calculate the percentages I want myself. Even that feels overly complex, ...

The percentages could also be calculated with PROC FREQ:

proc freq data=have noprint;
by year;
tables state*fail / out=want1(where=(fail));
run;

But then a format such as

proc format;
picture pctfmt low-high='009%';
run;

would be needed to get the percent signs into the tick mark labels. Also, the label of variable PERCENT ("Percent of Total Frequency") becomes the new default y-axis label.

Quentin · Posted 02-23-2024 04:20 PM

Thanks @FreelanceReinh . I had mucked about with PROC TABULATE for a bit, but I'm trash at calculating percentages with PROC TABULATE. I never even thought to try PROC FREQ with a BY statement. Much better than my SQL step.

jeanleid · Posted 03-07-2024 02:36 PM

You could also make some small modifications to your proc sql step and then use VBARPARM:

proc sql ;
create table want as
select a.year,a.state,sum(fail)/n as pct "% failed" FORMAT=PERCENT8.
from have as a
,(select year,n(fail) as n from have group by year) as b
where a.year=b.year
group by a.year,state
;
quit ;

PROC SGPLOT DATA= WANT NOBORDER ;
styleattrs datacolors=(darkred darkblue) datacontrastcolors=(black) ;
vbarparm category= year response= pct / group= state seglabel seglabelattrs=(color=white weight=bold);
run ;

This gives you a stacked bar chart with each bar segment labelled with the percentage values from PROC SQL.

(Sorry- I'm new to the SAS community and not quite sure how to paste in the resulting graph.)

Registration is open

SAS Training: Just a Click Away