BookmarkSubscribeRSS Feed
jbrock
Fluorite | Level 6

Hello!

 

I have a large case dataset.  I would like to create a barchart showing case counts by month for 2019 (y-axis is frequency), with an average line overlayed on top.   The line would represent the 5-year average case count by month.   I am using the following code to create the barchart:

 

ods html file="P:\SASDATA\DescriptiveStats" file="campy.html";
proc gchart data=work.camp;
vbar datamonth/ discrete type=freq;
where datayear=2019;
run;
quit;
ods html close;

 

How can I create and add a 5-year average line to this barchart?   Thanks for any thoughts!

 

Jessica

 

 

17 REPLIES 17
PaigeMiller
Diamond | Level 26

You have to calculate the 5 year average before you draw the plot, then store the value in a macro variable. Then use the REF= option of the VBAR statement.

--
Paige Miller
jbrock
Fluorite | Level 6
Thanks!


Could you advise on calculating the average count variable? My dataset contains case counts for 2015-2019, so I would like to average the monthly counts across all years.


?
PaigeMiller
Diamond | Level 26

Provide us with a portion of the actual data as SAS data step code, following these instructions: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

--
Paige Miller
jbrock
Fluorite | Level 6
?Hi Paige, the code is not working quite right (I think), and my dataset has a lot of confidential fields. Would it work if I attached a few de-identified observations in excel format?

PaigeMiller
Diamond | Level 26

Many people (including me) will not download Excel files because they are a security risk.

 

You can type or paste the de-identified data into your reply, click on the {i} icon and then type or paste the data into that window.

--
Paige Miller
jbrock
Fluorite | Level 6

Thanks! 

 

Attached and below is a sample of my dataset, de-identified.  The sample contains cases from 2015 (the entire dataset spans 2015-2019).

 

Is this format ok?  I just attached (and copied) the downloaded excel file of the dataset.

 

 

ObsCaseIDStatusEffective_From_DateCreate_DateDISEASEDISEASE_STATUSEVENT_DATEEVENT_TYPESTATE_HIDDENCASE_HOSPITALIZED_0SS_BLOODY_DIARRHEASS_DIARRHEASS_DIARRHEA_BLOODYSS_DIARRHEA_DURATIONSS_DIARRHEA_ONSET_DATESS_DIARRHEA_ONSET_TIMESS_FEVERSS_THROMBOCYTOPENIASS_VOMITINGSS_VOMITING_ONSET_DATESS_VOMITING_ONSET_TIMESTILL_SYMPTOMATICSYMPTOMSSYMPTOM_ONSET_DATESYMPTOM_ONSET_TIMEWORK_RELATED_INJURYAGE_YEARSCOUNTRYCOUNTYGENDERHISPANICRACE_0DtSpecPathogenTravelIntDtSpec2datayeardatamonthdatamonth05yrdatamonth10yrMMWR2
1100515896Closed7-Jan-1512-Jan-15CAMPCONFIRMED7-Jan-15SYMPTOM_ONSET_DATECTYES YESNO87-Jan-15 YES NO   YES7-Jan-15 No64USANew Haven CountyMALENOWHITE1/9/2015CampylobacterNo1/9/2015201511.21.215_01
2100516106 ..  .      ..        .  .     1/3/2015CampylobacterNo1/3/20152015111.1111 
3100516107Closed5-Jan-1513-Jan-15CAMPCONFIRMED5-Jan-15LAB_TEST_DATECTNO YESNO.. NO YES   YES.  58USAFairfield CountyFEMALENOWHITE1/5/2015CampylobacterNo1/5/20152015111.115_01
4100516116Closed2-Jan-1513-Jan-15CAMPCONFIRMED2-Jan-15LAB_TEST_DATECTNO   ..       UNKNOWN.  62USAFairfield CountyMALENOWHITE1/2/2015CampylobacterUnknown1/2/20152015111 
5100516476Closed2-Jan-1515-Jan-15CAMPCONFIRMED2-Jan-15SYMPTOM_ONSET_DATECTNO YESNO52-Jan-15amNO NO   YES2-Jan-15am 56USANew Haven CountyFEMALENOBLACK_AFRICAN_AMERICAN1/7/2015CampylobacterNo1/7/2015201511115_01
6100516481Closed5-Jan-1515-Jan-15CAMPCONFIRMED5-Jan-15SYMPTOM_ONSET_DATECTNO YESNO25-Jan-159:00 PMYES NO   YES5-Jan-15  31USANew Haven CountyFEMALENOWHITE1/7/2015CampylobacterYes1/7/2015201511115_01
7100516484Closed8-Jan-1515-Jan-15CAMPCONFIRMED8-Jan-15LAB_TEST_DATECTNO   ..       UNKNOWN.  57USANew London CountyFEMALEUNKNOWNBLACK_AFRICAN_AMERICAN1/8/2015CampylobacterUnknown1/8/2015201511115_01
8100516485Closed4-Jan-1515-Jan-15CAMPCONFIRMED4-Jan-15SYMPTOM_ONSET_DATECTYES YESNO86-Jan-15 NO NO   YES4-Jan-15  61USANew Haven CountyMALENOWHITE1/11/2015CampylobacterNo1/11/2015201511115_02
9100516544Closed7-Jan-1516-Jan-15CAMPCONFIRMED7-Jan-15LAB_TEST_DATECTNO   ..       UNKNOWN.  94USAHartford CountyFEMALEUNKNOWNUnknown1/7/2015CampylobacterUnknown1/7/2015201511115_01
Obs	CaseID	Status	Effective_From_Date	Create_Date	DISEASE	DISEASE_STATUS	EVENT_DATE	EVENT_TYPE	STATE_HIDDEN	CASE_HOSPITALIZED_0	SS_BLOODY_DIARRHEA	SS_DIARRHEA	SS_DIARRHEA_BLOODY	SS_DIARRHEA_DURATION	SS_DIARRHEA_ONSET_DATE	SS_DIARRHEA_ONSET_TIME	SS_FEVER	SS_THROMBOCYTOPENIA	SS_VOMITING	SS_VOMITING_ONSET_DATE	SS_VOMITING_ONSET_TIME	STILL_SYMPTOMATIC	SYMPTOMS	SYMPTOM_ONSET_DATE	SYMPTOM_ONSET_TIME	WORK_RELATED_INJURY	AGE_YEARS	COUNTRY	COUNTY	GENDER	HISPANIC	RACE_0	DtSpec	Pathogen	TravelInt	DtSpec2	datayear	datamonth	datamonth05yr	datamonth10yr	MMWR2
1	100515896	Closed	7-Jan-15	12-Jan-15	CAMP	CONFIRMED	7-Jan-15	SYMPTOM_ONSET_DATE	CT	YES		YES	NO	8	7-Jan-15		YES		NO				YES	7-Jan-15		No	64	USA	New Haven County	MALE	NO	WHITE	1/9/2015	Campylobacter	No	1/9/2015	2015	1	1.2	1.2	15_01
2	100516106		.	.			.							.	.									.			.						1/3/2015	Campylobacter	No	1/3/2015	2015	1	1	1.1111	
3	100516107	Closed	5-Jan-15	13-Jan-15	CAMP	CONFIRMED	5-Jan-15	LAB_TEST_DATE	CT	NO		YES	NO	.	.		NO		YES				YES	.			58	USA	Fairfield County	FEMALE	NO	WHITE	1/5/2015	Campylobacter	No	1/5/2015	2015	1	1	1.1	15_01
4	100516116	Closed	2-Jan-15	13-Jan-15	CAMP	CONFIRMED	2-Jan-15	LAB_TEST_DATE	CT	NO				.	.								UNKNOWN	.			62	USA	Fairfield County	MALE	NO	WHITE	1/2/2015	Campylobacter	Unknown	1/2/2015	2015	1	1	1	
5	100516476	Closed	2-Jan-15	15-Jan-15	CAMP	CONFIRMED	2-Jan-15	SYMPTOM_ONSET_DATE	CT	NO		YES	NO	5	2-Jan-15	am	NO		NO				YES	2-Jan-15	am		56	USA	New Haven County	FEMALE	NO	BLACK_AFRICAN_AMERICAN	1/7/2015	Campylobacter	No	1/7/2015	2015	1	1	1	15_01
6	100516481	Closed	5-Jan-15	15-Jan-15	CAMP	CONFIRMED	5-Jan-15	SYMPTOM_ONSET_DATE	CT	NO		YES	NO	2	5-Jan-15	9:00 PM	YES		NO				YES	5-Jan-15			31	USA	New Haven County	FEMALE	NO	WHITE	1/7/2015	Campylobacter	Yes	1/7/2015	2015	1	1	1	15_01
7	100516484	Closed	8-Jan-15	15-Jan-15	CAMP	CONFIRMED	8-Jan-15	LAB_TEST_DATE	CT	NO				.	.								UNKNOWN	.			57	USA	New London County	FEMALE	UNKNOWN	BLACK_AFRICAN_AMERICAN	1/8/2015	Campylobacter	Unknown	1/8/2015	2015	1	1	1	15_01
8	100516485	Closed	4-Jan-15	15-Jan-15	CAMP	CONFIRMED	4-Jan-15	SYMPTOM_ONSET_DATE	CT	YES		YES	NO	8	6-Jan-15		NO		NO				YES	4-Jan-15			61	USA	New Haven County	MALE	NO	WHITE	1/11/2015	Campylobacter	No	1/11/2015	2015	1	1	1	15_02
9	100516544	Closed	7-Jan-15	16-Jan-15	CAMP	CONFIRMED	7-Jan-15	LAB_TEST_DATE	CT	NO				.	.								UNKNOWN	.			94	USA	Hartford County	FEMALE	UNKNOWN	Unknown	1/7/2015	Campylobacter	Unknown	1/7/2015	2015	1	1	1	15_01

 

PaigeMiller
Diamond | Level 26

So as I understand things, you want to count cases in each month. All of these would count as cases in January 2015 except row 2?

 

Is it possible to have event_date different month from create_date different month from effective_From_date? In this case, which date do we use to determine counts in a month?

--
Paige Miller
jbrock
Fluorite | Level 6
?Yes, I would like to calculate the average of monthly case counts over the course of 5-years.


The resulting graph would show freqency of cases by month for 2019, with a 5-year average line (2015-2019) overlayed (if possible).


DtSpec2 is my anchor date for calculating frequencies.

PaigeMiller
Diamond | Level 26

UNTESTED CODE

 

/* First obtain counts by month */
proc freq data=have;
     table dtspec2/noprint out=_counts_;
     format dtspec2  yymm6.;
run;

/* Next get average of 5 years */
proc summary data=_counts_ (where=(year(dtspec2)>=2015 and year(dtspec2)<=2019));
    var count;
    output out=_mean_ mean=mean_count;
run;

/* Create macro variable */
data _null_;
    set _mean_;
    call symputx('mean_count',mean_count);
run;

/* Create plot */
proc gchart data=work.camp;
vbar datamonth/ discrete type=freq ref=&mean_count;
where datayear=2019;
run;
quit;


     

 

--
Paige Miller
jbrock
Fluorite | Level 6
Thanks again.


When I try that code, the resulting graph has a flat line running through all the points. It seems that one average value was created, which runs across all months (where as each month should have it's own average count over the span of 5 years).


Looking closer at the code, should the first section (obtain counts by month) reference "datamonth" rather than "dtspec2"? Then there would be 12 values in the resulting "counts" dataset.

PaigeMiller
Diamond | Level 26

Going back to the original problem statement

 

I would like to create a barchart showing case counts by month for 2019 (y-axis is frequency), with an average line overlayed on top. The line would represent the 5-year average case count by month.

 

This seems to refer to a single average and a single line, the word line does not appear as the plural word "lines". Now you are asking for different lines for each month? One line for January that just goes over the different January bars, and similarly a separate line for February and all the other months?

 

This would require a somewhat different approach, and although it would be possible in PROC GPLOT, you would have to use the ANNOTATE facility, and I no longer remember those details. Nor is it immediately obvious how to do this in the newer PROC SGPLOT. So at this time, I do not have an answer.

--
Paige Miller
DanH_sas
SAS Super FREQ

Doing this in SGPLOT, I would overlay a STEP plot on top of a VBARPARM.

PaigeMiller
Diamond | Level 26

Thinking outside the box, it would be relatively simple to produce dots for the average of each month over the last five years (one dot for the value in January, a different value for the dot for February, etc.) on the vertical bar plot, and then connect the dots, so the plot shows the bars and the sequence of dots. Even better yet, in my opinion, instead of bars for the monthly counts, you have one line (let's say red) that shows the counts by month, and another line (let's say blue) that shows the average counts in each month. This would visually get the information across and again in my opinion, would be a better plot layout than bars plus dots, or bars plus lines.

--
Paige Miller
jbrock
Fluorite | Level 6
?That sounds like a great strategy. Thanks for that suggestion. Could you possibly point me to a sample code for producing dots (representing average count per month over the course of 5-years) that I could pull into the chart?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 3092 views
  • 1 like
  • 3 in conversation