Hello!
I have a large case dataset. I would like to create a barchart showing case counts by month for 2019 (y-axis is frequency), with an average line overlayed on top. The line would represent the 5-year average case count by month. I am using the following code to create the barchart:
ods html file="P:\SASDATA\DescriptiveStats" file="campy.html";
proc gchart data=work.camp;
vbar datamonth/ discrete type=freq;
where datayear=2019;
run;
quit;
ods html close;
How can I create and add a 5-year average line to this barchart? Thanks for any thoughts!
Jessica
You have to calculate the 5 year average before you draw the plot, then store the value in a macro variable. Then use the REF= option of the VBAR statement.
Provide us with a portion of the actual data as SAS data step code, following these instructions: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Many people (including me) will not download Excel files because they are a security risk.
You can type or paste the de-identified data into your reply, click on the {i} icon and then type or paste the data into that window.
Thanks!
Attached and below is a sample of my dataset, de-identified. The sample contains cases from 2015 (the entire dataset spans 2015-2019).
Is this format ok? I just attached (and copied) the downloaded excel file of the dataset.
Obs | CaseID | Status | Effective_From_Date | Create_Date | DISEASE | DISEASE_STATUS | EVENT_DATE | EVENT_TYPE | STATE_HIDDEN | CASE_HOSPITALIZED_0 | SS_BLOODY_DIARRHEA | SS_DIARRHEA | SS_DIARRHEA_BLOODY | SS_DIARRHEA_DURATION | SS_DIARRHEA_ONSET_DATE | SS_DIARRHEA_ONSET_TIME | SS_FEVER | SS_THROMBOCYTOPENIA | SS_VOMITING | SS_VOMITING_ONSET_DATE | SS_VOMITING_ONSET_TIME | STILL_SYMPTOMATIC | SYMPTOMS | SYMPTOM_ONSET_DATE | SYMPTOM_ONSET_TIME | WORK_RELATED_INJURY | AGE_YEARS | COUNTRY | COUNTY | GENDER | HISPANIC | RACE_0 | DtSpec | Pathogen | TravelInt | DtSpec2 | datayear | datamonth | datamonth05yr | datamonth10yr | MMWR2 |
1 | 100515896 | Closed | 7-Jan-15 | 12-Jan-15 | CAMP | CONFIRMED | 7-Jan-15 | SYMPTOM_ONSET_DATE | CT | YES | YES | NO | 8 | 7-Jan-15 | YES | NO | YES | 7-Jan-15 | No | 64 | USA | New Haven County | MALE | NO | WHITE | 1/9/2015 | Campylobacter | No | 1/9/2015 | 2015 | 1 | 1.2 | 1.2 | 15_01 | |||||||
2 | 100516106 | . | . | . | . | . | . | . | 1/3/2015 | Campylobacter | No | 1/3/2015 | 2015 | 1 | 1 | 1.1111 | |||||||||||||||||||||||||
3 | 100516107 | Closed | 5-Jan-15 | 13-Jan-15 | CAMP | CONFIRMED | 5-Jan-15 | LAB_TEST_DATE | CT | NO | YES | NO | . | . | NO | YES | YES | . | 58 | USA | Fairfield County | FEMALE | NO | WHITE | 1/5/2015 | Campylobacter | No | 1/5/2015 | 2015 | 1 | 1 | 1.1 | 15_01 | ||||||||
4 | 100516116 | Closed | 2-Jan-15 | 13-Jan-15 | CAMP | CONFIRMED | 2-Jan-15 | LAB_TEST_DATE | CT | NO | . | . | UNKNOWN | . | 62 | USA | Fairfield County | MALE | NO | WHITE | 1/2/2015 | Campylobacter | Unknown | 1/2/2015 | 2015 | 1 | 1 | 1 | |||||||||||||
5 | 100516476 | Closed | 2-Jan-15 | 15-Jan-15 | CAMP | CONFIRMED | 2-Jan-15 | SYMPTOM_ONSET_DATE | CT | NO | YES | NO | 5 | 2-Jan-15 | am | NO | NO | YES | 2-Jan-15 | am | 56 | USA | New Haven County | FEMALE | NO | BLACK_AFRICAN_AMERICAN | 1/7/2015 | Campylobacter | No | 1/7/2015 | 2015 | 1 | 1 | 1 | 15_01 | ||||||
6 | 100516481 | Closed | 5-Jan-15 | 15-Jan-15 | CAMP | CONFIRMED | 5-Jan-15 | SYMPTOM_ONSET_DATE | CT | NO | YES | NO | 2 | 5-Jan-15 | 9:00 PM | YES | NO | YES | 5-Jan-15 | 31 | USA | New Haven County | FEMALE | NO | WHITE | 1/7/2015 | Campylobacter | Yes | 1/7/2015 | 2015 | 1 | 1 | 1 | 15_01 | |||||||
7 | 100516484 | Closed | 8-Jan-15 | 15-Jan-15 | CAMP | CONFIRMED | 8-Jan-15 | LAB_TEST_DATE | CT | NO | . | . | UNKNOWN | . | 57 | USA | New London County | FEMALE | UNKNOWN | BLACK_AFRICAN_AMERICAN | 1/8/2015 | Campylobacter | Unknown | 1/8/2015 | 2015 | 1 | 1 | 1 | 15_01 | ||||||||||||
8 | 100516485 | Closed | 4-Jan-15 | 15-Jan-15 | CAMP | CONFIRMED | 4-Jan-15 | SYMPTOM_ONSET_DATE | CT | YES | YES | NO | 8 | 6-Jan-15 | NO | NO | YES | 4-Jan-15 | 61 | USA | New Haven County | MALE | NO | WHITE | 1/11/2015 | Campylobacter | No | 1/11/2015 | 2015 | 1 | 1 | 1 | 15_02 | ||||||||
9 | 100516544 | Closed | 7-Jan-15 | 16-Jan-15 | CAMP | CONFIRMED | 7-Jan-15 | LAB_TEST_DATE | CT | NO | . | . | UNKNOWN | . | 94 | USA | Hartford County | FEMALE | UNKNOWN | Unknown | 1/7/2015 | Campylobacter | Unknown | 1/7/2015 | 2015 | 1 | 1 | 1 | 15_01Obs CaseID Status Effective_From_Date Create_Date DISEASE DISEASE_STATUS EVENT_DATE EVENT_TYPE STATE_HIDDEN CASE_HOSPITALIZED_0 SS_BLOODY_DIARRHEA SS_DIARRHEA SS_DIARRHEA_BLOODY SS_DIARRHEA_DURATION SS_DIARRHEA_ONSET_DATE SS_DIARRHEA_ONSET_TIME SS_FEVER SS_THROMBOCYTOPENIA SS_VOMITING SS_VOMITING_ONSET_DATE SS_VOMITING_ONSET_TIME STILL_SYMPTOMATIC SYMPTOMS SYMPTOM_ONSET_DATE SYMPTOM_ONSET_TIME WORK_RELATED_INJURY AGE_YEARS COUNTRY COUNTY GENDER HISPANIC RACE_0 DtSpec Pathogen TravelInt DtSpec2 datayear datamonth datamonth05yr datamonth10yr MMWR2 1 100515896 Closed 7-Jan-15 12-Jan-15 CAMP CONFIRMED 7-Jan-15 SYMPTOM_ONSET_DATE CT YES YES NO 8 7-Jan-15 YES NO YES 7-Jan-15 No 64 USA New Haven County MALE NO WHITE 1/9/2015 Campylobacter No 1/9/2015 2015 1 1.2 1.2 15_01 2 100516106 . . . . . . . 1/3/2015 Campylobacter No 1/3/2015 2015 1 1 1.1111 3 100516107 Closed 5-Jan-15 13-Jan-15 CAMP CONFIRMED 5-Jan-15 LAB_TEST_DATE CT NO YES NO . . NO YES YES . 58 USA Fairfield County FEMALE NO WHITE 1/5/2015 Campylobacter No 1/5/2015 2015 1 1 1.1 15_01 4 100516116 Closed 2-Jan-15 13-Jan-15 CAMP CONFIRMED 2-Jan-15 LAB_TEST_DATE CT NO . . UNKNOWN . 62 USA Fairfield County MALE NO WHITE 1/2/2015 Campylobacter Unknown 1/2/2015 2015 1 1 1 5 100516476 Closed 2-Jan-15 15-Jan-15 CAMP CONFIRMED 2-Jan-15 SYMPTOM_ONSET_DATE CT NO YES NO 5 2-Jan-15 am NO NO YES 2-Jan-15 am 56 USA New Haven County FEMALE NO BLACK_AFRICAN_AMERICAN 1/7/2015 Campylobacter No 1/7/2015 2015 1 1 1 15_01 6 100516481 Closed 5-Jan-15 15-Jan-15 CAMP CONFIRMED 5-Jan-15 SYMPTOM_ONSET_DATE CT NO YES NO 2 5-Jan-15 9:00 PM YES NO YES 5-Jan-15 31 USA New Haven County FEMALE NO WHITE 1/7/2015 Campylobacter Yes 1/7/2015 2015 1 1 1 15_01 7 100516484 Closed 8-Jan-15 15-Jan-15 CAMP CONFIRMED 8-Jan-15 LAB_TEST_DATE CT NO . . UNKNOWN . 57 USA New London County FEMALE UNKNOWN BLACK_AFRICAN_AMERICAN 1/8/2015 Campylobacter Unknown 1/8/2015 2015 1 1 1 15_01 8 100516485 Closed 4-Jan-15 15-Jan-15 CAMP CONFIRMED 4-Jan-15 SYMPTOM_ONSET_DATE CT YES YES NO 8 6-Jan-15 NO NO YES 4-Jan-15 61 USA New Haven County MALE NO WHITE 1/11/2015 Campylobacter No 1/11/2015 2015 1 1 1 15_02 9 100516544 Closed 7-Jan-15 16-Jan-15 CAMP CONFIRMED 7-Jan-15 LAB_TEST_DATE CT NO . . UNKNOWN . 94 USA Hartford County FEMALE UNKNOWN Unknown 1/7/2015 Campylobacter Unknown 1/7/2015 2015 1 1 1 15_01 |
So as I understand things, you want to count cases in each month. All of these would count as cases in January 2015 except row 2?
Is it possible to have event_date different month from create_date different month from effective_From_date? In this case, which date do we use to determine counts in a month?
UNTESTED CODE
/* First obtain counts by month */
proc freq data=have;
table dtspec2/noprint out=_counts_;
format dtspec2 yymm6.;
run;
/* Next get average of 5 years */
proc summary data=_counts_ (where=(year(dtspec2)>=2015 and year(dtspec2)<=2019));
var count;
output out=_mean_ mean=mean_count;
run;
/* Create macro variable */
data _null_;
set _mean_;
call symputx('mean_count',mean_count);
run;
/* Create plot */
proc gchart data=work.camp;
vbar datamonth/ discrete type=freq ref=&mean_count;
where datayear=2019;
run;
quit;
Going back to the original problem statement
I would like to create a barchart showing case counts by month for 2019 (y-axis is frequency), with an average line overlayed on top. The line would represent the 5-year average case count by month.
This seems to refer to a single average and a single line, the word line does not appear as the plural word "lines". Now you are asking for different lines for each month? One line for January that just goes over the different January bars, and similarly a separate line for February and all the other months?
This would require a somewhat different approach, and although it would be possible in PROC GPLOT, you would have to use the ANNOTATE facility, and I no longer remember those details. Nor is it immediately obvious how to do this in the newer PROC SGPLOT. So at this time, I do not have an answer.
Doing this in SGPLOT, I would overlay a STEP plot on top of a VBARPARM.
Thinking outside the box, it would be relatively simple to produce dots for the average of each month over the last five years (one dot for the value in January, a different value for the dot for February, etc.) on the vertical bar plot, and then connect the dots, so the plot shows the bars and the sequence of dots. Even better yet, in my opinion, instead of bars for the monthly counts, you have one line (let's say red) that shows the counts by month, and another line (let's say blue) that shows the average counts in each month. This would visually get the information across and again in my opinion, would be a better plot layout than bars plus dots, or bars plus lines.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.