Help using Base SAS procedures

Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 85
Accepted Solution

Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

Hi - I would like a chart similar to the following, where the x-axis is shown in weeks (note the last bin with >53) (but data in days).  

Capture.PNG

 

Here are my attempts so far:

1. Use SGPLOT with binwdith and xaxis when data is not restricted.  Status: Bins are not for 7 days.  Note that the data ranges from -377 to 277 in my example ( I need to work out how to set the seed ... )  An explanation of this would be great ... 

 

2. Use SGPLOT restric the data from 0 to 365.  Status: This gets me closer however I don't get the last bin ">53".  Labels are also incorrect. Is this possible with SGPLOT without a custom format?

 

3.  Try to create a custom format.  Status: Doesn't format the data as expected

 

Test code is as follows:

/* Add data */
data Hist(drop=i);
label T = "Days";
call streaminit(1);
do i = 1 to 100;
   T = rand("Normal", 120, 60); /* normal with mean 120  */
   T = round(T, 1);             /* round to nearest day */
   output;
end;
run;
proc means data=hist;
run;
Title 'Data not restricted';
/*Attempt One: Use SGPLOT with no data restriction / bins not correct */
proc sgplot data=hist;
histogram T / binwidth=7 binstart=0 showbins;
xaxis values=(0 to 365 by 7);
run;
/*Attempt Two: When data is restricted, bins are correct*/
Title "Data restricted to 0-365";
proc sgplot data=hist(where=(0<=T<=365));
histogram T / binwidth=7 binstart=0 showbins;
xaxis values=(0 to 365 by 7);
run;
/*Start of attempt Three: Use a custom format - not yet working*/
data dayToWeek;
attrib fmtname informat=$10. 
		eexcl informat=$1. 
		hlo informat=$ 1.
		label informat=$3.;
  drop i;
  retain fmtname '$dayToWeek' EEXCL 'Y' HLO '';

do i = 0 to 51;
  start = i*7;
  end = (i+1)*7;
  label = i;
  output;
  end;

  start=364;
  label = '52+';
  HLO = 'H';
	output; 
run;

proc format cntlin=dayToWeek; 
run;

proc freq data=hist;
tables T;
format T daytoweek.;
run;

 


Accepted Solutions
Solution
‎03-03-2016 08:46 PM
Frequent Contributor
Posts: 85

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

I ended up using the following code ... 

 

/*Create Format*/
data dayToWeek;
attrib fmtname informat=$10.
hlo informat=$1.
label informat=$3.;
drop i;
retain fmtname 'dayToWeek' SEXCL 'N' EEXCL 'Y' HLO '' TYPE 'N' ;

do i = 0 to 51;
start = i*7;
end = (i+1)*7;
label = i;
output;
end;

start=364;
label = '52+';
HLO = 'H';
EEXCL = 'N';
output;
run;
proc format cntlin=dayToWeek;
run;

/*see all formats*/
PROC FORMAT CNTLOUT=fmtout;
RUN;
PROC PRINT DATA=fmtout;
RUN;

/*Create fake data*/
data Hist(drop=i);
label T = "Days";
call streaminit(1);
do i = 1 to 100;
T = rand("Normal", 220, 90); /* normal with mean 220 */
T = round(T, 1); /* round to nearest day */
output;
end;
run;

proc freq data=hist(where=(T>0)) NOPRINT;
tables T / out=work.hist_bin;
format T dayToWeek.;
run;

title 'Length of Travel';
proc sgplot data=work.hist_bin;
vbar T / response=percent;
xaxis label="Duration (Weeks)";
yaxis label="Percent";
run;

NB. Found this reference extremely handy: Creating a Format from Raw Data or a SAS® Dataset 

View solution in original post


All Replies
SAS Super FREQ
Posts: 820

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

How about using the WEEK function to create the week number out of a date and then use the VBAR statement of SGPLOT.

 

Below is a code sample.

/* Add data */
data Hist(drop=i);
  label T = "Days";
  call streaminit(1);

  do i = 1 to 100;
    T = rand("Normal", 120, 60); /* normal with mean 120  */
    T = round(T, 1);             /* round to nearest day */
    week = week(t, "V");
    output;
  end;
run;

proc sgplot data=hist;
  vbar week / stat=percent;
  xaxis type=linear 
    min=1 max=53
    values=( 4 to 52 by 4 )
    valueshint
  ;
run;

Bruno

 

 

Frequent Contributor
Posts: 85

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

Posted in reply to Bruno_SAS

Thanks Bruno, that is almost there (and much simpler).  Do you know how to get the last bin of > 53 as well?  Do you also know my methods 1 and 3 didn't work?

SAS Super FREQ
Posts: 820

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

For method 3, the format name used starts with a $ , so this creates a character format, where as your data is numeric. Also the format would only work for positive numbers. Also the logic would only cover a specific year.

 

I do not know of an easy way to label week 53, but there are only week numbers from 1 to 53. So week 53 is nothing special

 

Bruno

Frequent Contributor
Posts: 85

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

Posted in reply to Bruno_SAS

The real data is for duration in days and only has values >= 0.  The last bin is supposed to be for all durations of 53 weeks and greater, i.e. durations of 53, 54, 55, 56, ... weeks.  

Super User
Posts: 13,517

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

Once you have a variable with the week number then you can use a custom format such as :

proc format;
   value weeknum
   53-high = '>53'
   ;
run;

Then use that format with the summary, if you presummarize data before the graph, and graph procedure associated with the week variable. The format will create the appropriate bin.

 

Solution
‎03-03-2016 08:46 PM
Frequent Contributor
Posts: 85

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

I ended up using the following code ... 

 

/*Create Format*/
data dayToWeek;
attrib fmtname informat=$10.
hlo informat=$1.
label informat=$3.;
drop i;
retain fmtname 'dayToWeek' SEXCL 'N' EEXCL 'Y' HLO '' TYPE 'N' ;

do i = 0 to 51;
start = i*7;
end = (i+1)*7;
label = i;
output;
end;

start=364;
label = '52+';
HLO = 'H';
EEXCL = 'N';
output;
run;
proc format cntlin=dayToWeek;
run;

/*see all formats*/
PROC FORMAT CNTLOUT=fmtout;
RUN;
PROC PRINT DATA=fmtout;
RUN;

/*Create fake data*/
data Hist(drop=i);
label T = "Days";
call streaminit(1);
do i = 1 to 100;
T = rand("Normal", 220, 90); /* normal with mean 220 */
T = round(T, 1); /* round to nearest day */
output;
end;
run;

proc freq data=hist(where=(T>0)) NOPRINT;
tables T / out=work.hist_bin;
format T dayToWeek.;
run;

title 'Length of Travel';
proc sgplot data=work.hist_bin;
vbar T / response=percent;
xaxis label="Duration (Weeks)";
yaxis label="Percent";
run;

NB. Found this reference extremely handy: Creating a Format from Raw Data or a SAS® Dataset 

SAS Super FREQ
Posts: 820

Re: Histogram with days formatted as weeks (and one bin per week) and help with PROC FORMAT

I was thinking again about the labeling of the x axis to 52 and >=53, so I found a way of doing this using the VALUESFORMAT= option on the XAXIS statement.

 

I also had a closer look at your orginal graph and noticed that there is actually a bar between 52 and >=53

 

Anyway find below the code sample for the x axis labeling as it was in the original graph

 

/*Create Format*/
data dayToWeek;
  attrib
    fmtname informat=$10.
    hlo informat=$1.
    label informat=$3.
  ;
  drop i;
  retain
    fmtname 'dayToWeek' SEXCL 'Y' EEXCL 'N' HLO '' TYPE 'N'
  ;

  do i = 0 to 52;
    start = i*7;
    end = (i+1)*7;

    label = put(i, z2.);
    output;
  end;

  start=371;
  label = '53';
  HLO = 'H';
  EEXCL = 'N';
  output;
run;

proc format cntlin=dayToWeek;
run;

/*Create fake data*/
data Hist(drop=i);
  label T = "Days";
  call streaminit(1);

  do i = 1 to 1000;
    T = rand("Normal", 120, 90); /* normal with mean 220 */
    T = round(T, 1); /* round to nearest day */
    output;
  end;
run;

proc freq data=hist( where = ( T>0 )) NOPRINT;
  tables T / out=work.hist_bin;
  format T dayToWeek.;
run;

*
* convert the bin week to the real week number
*;
data hist_bin2;
  set hist_bin;
  t2 = input( vvalue(t), 8.);
run;

*
* create an all weeks data set
*;
data allWeeks;
  do t2 = 1 to 53;
    output;
  end;
run;

*
* merge data and all weeks
* convert t to char
*;
data hist_bin_all;
  merge hist_bin2 allWeeks;
  by t2;
  tc = put(t2, z2.);
run;

*
* create a format for labeling the x axis
*;
proc format;
  value $tc_display
    "04" = "4"
    "08" = "8"
    "12" = "12"
    "16" = "16"
    "20" = "20"
    "24" = "24"
    "28" = "28"
    "32" = "32"
    "36" = "36"
    "40" = "40"
    "44" = "44"
    "48" = "48"
    "52" = "52"
    "53" = "(*ESC*){unicode '2265'x}53"
    other = " "
  ;
run;

title 'Length of Travel';
ods graphics / width=1600 height=800;

proc sgplot data=work.hist_bin_all;
  vbar Tc / response=percent;
  xaxis  
    label="Duration (Weeks)"
    valuesformat=$tc_display. 
  ;
  yaxis label="Percent";
run;

Bruno

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 750 views
  • 1 like
  • 3 in conversation