Dear all,
I need to plot a bar chart with PROC SGPLOT showing the relative frequency of each group per year, for a sample dataset like this:
Year of Diagnosis | Group | Number of Cases | Relative Frequencies per Year |
2015 | A | 10 | |
2015 | B | 20 | |
2016 | A | 15 | |
2016 | B | 5 | |
2017 | A | 16 | |
2017 | B | 30 | |
2017 | C | 50 | |
2018 | B | 13 | |
2018 | C | 5 | |
My first question: can I use PROC SQL or PROC MEANS to calculate the relative frequencies per year? If so, how?
PROC FREQ gives me percentages over all years combined, but I need the relative frequency of each group within each year. For example, for 2015 the relative frequency should be 10/30*100 = 33.33 for group A and 20/30*100 = 66.67 for group B.
The plot should look like the sample image (not reproduced here); in my case the bar segments will be groups A, B and C.
I am not quite sure what you are asking, because the relative frequency (as you define it) is just each group's percentage of the year's total, so the stacks will always add up to 100%. Anyway, here is my attempt:
data have;
infile datalines;
input year $ group $ nbr_of_cases;
datalines;
2015 A 10
2015 B 20
2016 A 15
2016 B 5
2017 A 16
2017 B 30
2017 C 50
2018 B 13
2018 C 5
;
run;
/* Total number of cases per year (input is HAVE; WANT does not exist yet) */
proc means data=have noprint nway;
class year;
var nbr_of_cases;
output out=have_sum (drop=_TYPE_ _FREQ_) sum= / autoname;
run;
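/* Attach the yearly total to every row (both data sets are sorted by year)
   and compute each group's share of that year, in percent */
data want;
merge have have_sum;
by year;
format rel_freq 8.2;
rel_freq = 100 * divide(nbr_of_cases, nbr_of_cases_sum);
drop nbr_of_cases nbr_of_cases_sum;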
run;
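Since the question also asks about PROC SQL: the PROC MEANS and MERGE steps above can likely be collapsed into a single query, because PROC SQL remerges a summary statistic computed with GROUP BY back onto the detail rows (the log shows a NOTE about remerging). A minimal sketch, assuming the HAVE data set from the data step above; the output name WANT_SQL is my own choice.

```sas
/* One-step alternative to PROC MEANS + MERGE:
   SUM(nbr_of_cases) is computed per year and remerged onto each row */
proc sql;
    create table want_sql as
    select *,
           100 * nbr_of_cases / sum(nbr_of_cases) as rel_freq format=8.2
    from have
    group by year
    order by year;
quit;
```

PROC SGPLOT can then be pointed at WANT_SQL instead of WANT; the VBAR statement stays the same.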
proc sgplot data=want;
vbar year / response=rel_freq group=group;
run;
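Regarding the PROC FREQ part of the question: PROC FREQ can probably give the per-year percentages directly if the case counts are used as weights and the OUTPCT option is added to a YEAR*GROUP table request. The OUT= data set then carries PCT_ROW, the percentage of each cell within its YEAR row, which matches the definition 10/30*100 for 2015, group A. This is a sketch under those assumptions, reusing the HAVE data set from above; FREQ_PCT is a hypothetical output name.

```sas
/* WEIGHT counts each row nbr_of_cases times;
   OUTPCT adds PCT_ROW (percent of the year's total) to OUT= */
proc freq data=have noprint;
    weight nbr_of_cases;
    tables year*group / out=freq_pct outpct;
run;

/* Stacked bars of the within-year percentages */
proc sgplot data=freq_pct;
    vbar year / response=pct_row group=group;
run;
```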