Hi - I would like a chart similar to the following, where the x-axis is shown in weeks (note the last bin with >53) (but data in days).
Here are my attempts so far:
1. Use SGPLOT with binwdith and xaxis when data is not restricted. Status: Bins are not for 7 days. Note that the data ranges from -377 to 277 in my example ( I need to work out how to set the seed ... ) An explanation of this would be great ...
2. Use SGPLOT restric the data from 0 to 365. Status: This gets me closer however I don't get the last bin ">53". Labels are also incorrect. Is this possible with SGPLOT without a custom format?
3. Try to create a custom format. Status: Doesn't format the data as expected
Test code is as follows:
/* Add data */
data Hist(drop=i);
label T = "Days";
call streaminit(1);
do i = 1 to 100;
T = rand("Normal", 120, 60); /* normal with mean 120 */
T = round(T, 1); /* round to nearest day */
output;
end;
run;
proc means data=hist;
run;
Title 'Data not restricted';
/*Attempt One: Use SGPLOT with no data restriction / bins not correct */
proc sgplot data=hist;
histogram T / binwidth=7 binstart=0 showbins;
xaxis values=(0 to 365 by 7);
run;
/*Attempt Two: When data is restricted, bins are correct*/
Title "Data restricted to 0-365";
proc sgplot data=hist(where=(0<=T<=365));
histogram T / binwidth=7 binstart=0 showbins;
xaxis values=(0 to 365 by 7);
run;
/*Start of attempt Three: Use a custom format - not yet working*/
data dayToWeek;
attrib fmtname informat=$10.
eexcl informat=$1.
hlo informat=$ 1.
label informat=$3.;
drop i;
retain fmtname '$dayToWeek' EEXCL 'Y' HLO '';
do i = 0 to 51;
start = i*7;
end = (i+1)*7;
label = i;
output;
end;
start=364;
label = '52+';
HLO = 'H';
output;
run;
proc format cntlin=dayToWeek;
run;
proc freq data=hist;
tables T;
format T daytoweek.;
run;
I ended up using the following code ...
/*Create Format*/
data dayToWeek;
attrib fmtname informat=$10.
hlo informat=$1.
label informat=$3.;
drop i;
retain fmtname 'dayToWeek' SEXCL 'N' EEXCL 'Y' HLO '' TYPE 'N' ;
do i = 0 to 51;
start = i*7;
end = (i+1)*7;
label = i;
output;
end;
start=364;
label = '52+';
HLO = 'H';
EEXCL = 'N';
output;
run;
proc format cntlin=dayToWeek;
run;
/*see all formats*/
PROC FORMAT CNTLOUT=fmtout;
RUN;
PROC PRINT DATA=fmtout;
RUN;
/*Create fake data*/
data Hist(drop=i);
label T = "Days";
call streaminit(1);
do i = 1 to 100;
T = rand("Normal", 220, 90); /* normal with mean 220 */
T = round(T, 1); /* round to nearest day */
output;
end;
run;
proc freq data=hist(where=(T>0)) NOPRINT;
tables T / out=work.hist_bin;
format T dayToWeek.;
run;
title 'Length of Travel';
proc sgplot data=work.hist_bin;
vbar T / response=percent;
xaxis label="Duration (Weeks)";
yaxis label="Percent";
run;
NB. Found this reference extremely handy: Creating a Format from Raw Data or a SAS® Dataset
How about using the WEEK function to create the week number out of a date and then use the VBAR statement of SGPLOT.
Below is a code sample.
/* Add data */
data Hist(drop=i);
label T = "Days";
call streaminit(1);
do i = 1 to 100;
T = rand("Normal", 120, 60); /* normal with mean 120 */
T = round(T, 1); /* round to nearest day */
week = week(t, "V");
output;
end;
run;
proc sgplot data=hist;
vbar week / stat=percent;
xaxis type=linear
min=1 max=53
values=( 4 to 52 by 4 )
valueshint
;
run;
Bruno
Thanks Bruno, that is almost there (and much simpler). Do you know how to get the last bin of > 53 as well? Do you also know my methods 1 and 3 didn't work?
For method 3, the format name used starts with a $ , so this creates a character format, where as your data is numeric. Also the format would only work for positive numbers. Also the logic would only cover a specific year.
I do not know of an easy way to label week 53, but there are only week numbers from 1 to 53. So week 53 is nothing special
Bruno
The real data is for duration in days and only has values >= 0. The last bin is supposed to be for all durations of 53 weeks and greater, i.e. durations of 53, 54, 55, 56, ... weeks.
Once you have a variable with the week number then you can use a custom format such as :
proc format;
value weeknum
53-high = '>53'
;
run;
Then use that format with the summary, if you presummarize data before the graph, and graph procedure associated with the week variable. The format will create the appropriate bin.
I ended up using the following code ...
/*Create Format*/
data dayToWeek;
attrib fmtname informat=$10.
hlo informat=$1.
label informat=$3.;
drop i;
retain fmtname 'dayToWeek' SEXCL 'N' EEXCL 'Y' HLO '' TYPE 'N' ;
do i = 0 to 51;
start = i*7;
end = (i+1)*7;
label = i;
output;
end;
start=364;
label = '52+';
HLO = 'H';
EEXCL = 'N';
output;
run;
proc format cntlin=dayToWeek;
run;
/*see all formats*/
PROC FORMAT CNTLOUT=fmtout;
RUN;
PROC PRINT DATA=fmtout;
RUN;
/*Create fake data*/
data Hist(drop=i);
label T = "Days";
call streaminit(1);
do i = 1 to 100;
T = rand("Normal", 220, 90); /* normal with mean 220 */
T = round(T, 1); /* round to nearest day */
output;
end;
run;
proc freq data=hist(where=(T>0)) NOPRINT;
tables T / out=work.hist_bin;
format T dayToWeek.;
run;
title 'Length of Travel';
proc sgplot data=work.hist_bin;
vbar T / response=percent;
xaxis label="Duration (Weeks)";
yaxis label="Percent";
run;
NB. Found this reference extremely handy: Creating a Format from Raw Data or a SAS® Dataset
I was thinking again about the labeling of the x axis to 52 and >=53, so I found a way of doing this using the VALUESFORMAT= option on the XAXIS statement.
I also had a closer look at your orginal graph and noticed that there is actually a bar between 52 and >=53
Anyway find below the code sample for the x axis labeling as it was in the original graph
/*Create Format*/
data dayToWeek;
attrib
fmtname informat=$10.
hlo informat=$1.
label informat=$3.
;
drop i;
retain
fmtname 'dayToWeek' SEXCL 'Y' EEXCL 'N' HLO '' TYPE 'N'
;
do i = 0 to 52;
start = i*7;
end = (i+1)*7;
label = put(i, z2.);
output;
end;
start=371;
label = '53';
HLO = 'H';
EEXCL = 'N';
output;
run;
proc format cntlin=dayToWeek;
run;
/*Create fake data*/
data Hist(drop=i);
label T = "Days";
call streaminit(1);
do i = 1 to 1000;
T = rand("Normal", 120, 90); /* normal with mean 220 */
T = round(T, 1); /* round to nearest day */
output;
end;
run;
proc freq data=hist( where = ( T>0 )) NOPRINT;
tables T / out=work.hist_bin;
format T dayToWeek.;
run;
*
* convert the bin week to the real week number
*;
data hist_bin2;
set hist_bin;
t2 = input( vvalue(t), 8.);
run;
*
* create an all weeks data set
*;
data allWeeks;
do t2 = 1 to 53;
output;
end;
run;
*
* merge data and all weeks
* convert t to char
*;
data hist_bin_all;
merge hist_bin2 allWeeks;
by t2;
tc = put(t2, z2.);
run;
*
* create a format for labeling the x axis
*;
proc format;
value $tc_display
"04" = "4"
"08" = "8"
"12" = "12"
"16" = "16"
"20" = "20"
"24" = "24"
"28" = "28"
"32" = "32"
"36" = "36"
"40" = "40"
"44" = "44"
"48" = "48"
"52" = "52"
"53" = "(*ESC*){unicode '2265'x}53"
other = " "
;
run;
title 'Length of Travel';
ods graphics / width=1600 height=800;
proc sgplot data=work.hist_bin_all;
vbar Tc / response=percent;
xaxis
label="Duration (Weeks)"
valuesformat=$tc_display.
;
yaxis label="Percent";
run;
Bruno
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.