Help using Base SAS procedures

Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 78
Accepted Solution

Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

Hi - I would like a chart similar to the following, where the x-axis is shown in weeks (note the last bin with >53) (but data in days).  

Capture.PNG

 

Here are my attempts so far:

1. Use SGPLOT with binwdith and xaxis when data is not restricted.  Status: Bins are not for 7 days.  Note that the data ranges from -377 to 277 in my example ( I need to work out how to set the seed ... )  An explanation of this would be great ... 

 

2. Use SGPLOT restric the data from 0 to 365.  Status: This gets me closer however I don't get the last bin ">53".  Labels are also incorrect. Is this possible with SGPLOT without a custom format?

 

3.  Try to create a custom format.  Status: Doesn't format the data as expected

 

Test code is as follows:

/* Add data */
data Hist(drop=i);
label T = "Days";
call streaminit(1);
do i = 1 to 100;
   T = rand("Normal", 120, 60); /* normal with mean 120  */
   T = round(T, 1);             /* round to nearest day */
   output;
end;
run;
proc means data=hist;
run;
Title 'Data not restricted';
/*Attempt One: Use SGPLOT with no data restriction / bins not correct */
proc sgplot data=hist;
histogram T / binwidth=7 binstart=0 showbins;
xaxis values=(0 to 365 by 7);
run;
/*Attempt Two: When data is restricted, bins are correct*/
Title "Data restricted to 0-365";
proc sgplot data=hist(where=(0<=T<=365));
histogram T / binwidth=7 binstart=0 showbins;
xaxis values=(0 to 365 by 7);
run;
/*Start of attempt Three: Use a custom format - not yet working*/
data dayToWeek;
attrib fmtname informat=$10. 
		eexcl informat=$1. 
		hlo informat=$ 1.
		label informat=$3.;
  drop i;
  retain fmtname '$dayToWeek' EEXCL 'Y' HLO '';

do i = 0 to 51;
  start = i*7;
  end = (i+1)*7;
  label = i;
  output;
  end;

  start=364;
  label = '52+';
  HLO = 'H';
	output; 
run;

proc format cntlin=dayToWeek; 
run;

proc freq data=hist;
tables T;
format T daytoweek.;
run;

 


Accepted Solutions
Solution
‎03-03-2016 08:46 PM
Frequent Contributor
Posts: 78

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

I ended up using the following code ... 

 

/*Create Format*/
data dayToWeek;
attrib fmtname informat=$10.
hlo informat=$1.
label informat=$3.;
drop i;
retain fmtname 'dayToWeek' SEXCL 'N' EEXCL 'Y' HLO '' TYPE 'N' ;

do i = 0 to 51;
start = i*7;
end = (i+1)*7;
label = i;
output;
end;

start=364;
label = '52+';
HLO = 'H';
EEXCL = 'N';
output;
run;
proc format cntlin=dayToWeek;
run;

/*see all formats*/
PROC FORMAT CNTLOUT=fmtout;
RUN;
PROC PRINT DATA=fmtout;
RUN;

/*Create fake data*/
data Hist(drop=i);
label T = "Days";
call streaminit(1);
do i = 1 to 100;
T = rand("Normal", 220, 90); /* normal with mean 220 */
T = round(T, 1); /* round to nearest day */
output;
end;
run;

proc freq data=hist(where=(T>0)) NOPRINT;
tables T / out=work.hist_bin;
format T dayToWeek.;
run;

title 'Length of Travel';
proc sgplot data=work.hist_bin;
vbar T / response=percent;
xaxis label="Duration (Weeks)";
yaxis label="Percent";
run;

NB. Found this reference extremely handy: Creating a Format from Raw Data or a SAS® Dataset 

View solution in original post


All Replies
SAS Super FREQ
Posts: 698

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

How about using the WEEK function to create the week number out of a date and then use the VBAR statement of SGPLOT.

 

Below is a code sample.

/* Add data */
data Hist(drop=i);
  label T = "Days";
  call streaminit(1);

  do i = 1 to 100;
    T = rand("Normal", 120, 60); /* normal with mean 120  */
    T = round(T, 1);             /* round to nearest day */
    week = week(t, "V");
    output;
  end;
run;

proc sgplot data=hist;
  vbar week / stat=percent;
  xaxis type=linear 
    min=1 max=53
    values=( 4 to 52 by 4 )
    valueshint
  ;
run;

Bruno

 

 

Frequent Contributor
Posts: 78

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

Thanks Bruno, that is almost there (and much simpler).  Do you know how to get the last bin of > 53 as well?  Do you also know my methods 1 and 3 didn't work?

SAS Super FREQ
Posts: 698

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

For method 3, the format name used starts with a $ , so this creates a character format, where as your data is numeric. Also the format would only work for positive numbers. Also the logic would only cover a specific year.

 

I do not know of an easy way to label week 53, but there are only week numbers from 1 to 53. So week 53 is nothing special

 

Bruno

Frequent Contributor
Posts: 78

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

The real data is for duration in days and only has values >= 0.  The last bin is supposed to be for all durations of 53 weeks and greater, i.e. durations of 53, 54, 55, 56, ... weeks.  

Super User
Posts: 10,833

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

Once you have a variable with the week number then you can use a custom format such as :

proc format;
   value weeknum
   53-high = '>53'
   ;
run;

Then use that format with the summary, if you presummarize data before the graph, and graph procedure associated with the week variable. The format will create the appropriate bin.

 

Solution
‎03-03-2016 08:46 PM
Frequent Contributor
Posts: 78

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

I ended up using the following code ... 

 

/*Create Format*/
data dayToWeek;
attrib fmtname informat=$10.
hlo informat=$1.
label informat=$3.;
drop i;
retain fmtname 'dayToWeek' SEXCL 'N' EEXCL 'Y' HLO '' TYPE 'N' ;

do i = 0 to 51;
start = i*7;
end = (i+1)*7;
label = i;
output;
end;

start=364;
label = '52+';
HLO = 'H';
EEXCL = 'N';
output;
run;
proc format cntlin=dayToWeek;
run;

/*see all formats*/
PROC FORMAT CNTLOUT=fmtout;
RUN;
PROC PRINT DATA=fmtout;
RUN;

/*Create fake data*/
data Hist(drop=i);
label T = "Days";
call streaminit(1);
do i = 1 to 100;
T = rand("Normal", 220, 90); /* normal with mean 220 */
T = round(T, 1); /* round to nearest day */
output;
end;
run;

proc freq data=hist(where=(T>0)) NOPRINT;
tables T / out=work.hist_bin;
format T dayToWeek.;
run;

title 'Length of Travel';
proc sgplot data=work.hist_bin;
vbar T / response=percent;
xaxis label="Duration (Weeks)";
yaxis label="Percent";
run;

NB. Found this reference extremely handy: Creating a Format from Raw Data or a SAS® Dataset 

SAS Super FREQ
Posts: 698

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

I was thinking again about the labeling of the x axis to 52 and >=53, so I found a way of doing this using the VALUESFORMAT= option on the XAXIS statement.

 

I also had a closer look at your orginal graph and noticed that there is actually a bar between 52 and >=53

 

Anyway find below the code sample for the x axis labeling as it was in the original graph

 

/*Create Format*/
data dayToWeek;
  attrib
    fmtname informat=$10.
    hlo informat=$1.
    label informat=$3.
  ;
  drop i;
  retain
    fmtname 'dayToWeek' SEXCL 'Y' EEXCL 'N' HLO '' TYPE 'N'
  ;

  do i = 0 to 52;
    start = i*7;
    end = (i+1)*7;

    label = put(i, z2.);
    output;
  end;

  start=371;
  label = '53';
  HLO = 'H';
  EEXCL = 'N';
  output;
run;

proc format cntlin=dayToWeek;
run;

/*Create fake data*/
data Hist(drop=i);
  label T = "Days";
  call streaminit(1);

  do i = 1 to 1000;
    T = rand("Normal", 120, 90); /* normal with mean 220 */
    T = round(T, 1); /* round to nearest day */
    output;
  end;
run;

proc freq data=hist( where = ( T>0 )) NOPRINT;
  tables T / out=work.hist_bin;
  format T dayToWeek.;
run;

*
* convert the bin week to the real week number
*;
data hist_bin2;
  set hist_bin;
  t2 = input( vvalue(t), 8.);
run;

*
* create an all weeks data set
*;
data allWeeks;
  do t2 = 1 to 53;
    output;
  end;
run;

*
* merge data and all weeks
* convert t to char
*;
data hist_bin_all;
  merge hist_bin2 allWeeks;
  by t2;
  tc = put(t2, z2.);
run;

*
* create a format for labeling the x axis
*;
proc format;
  value $tc_display
    "04" = "4"
    "08" = "8"
    "12" = "12"
    "16" = "16"
    "20" = "20"
    "24" = "24"
    "28" = "28"
    "32" = "32"
    "36" = "36"
    "40" = "40"
    "44" = "44"
    "48" = "48"
    "52" = "52"
    "53" = "(*ESC*){unicode '2265'x}53"
    other = " "
  ;
run;

title 'Length of Travel';
ods graphics / width=1600 height=800;

proc sgplot data=work.hist_bin_all;
  vbar Tc / response=percent;
  xaxis  
    label="Duration (Weeks)"
    valuesformat=$tc_display. 
  ;
  yaxis label="Percent";
run;

Bruno

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 593 views
  • 1 like
  • 3 in conversation