Hi everyone,
I have noticed some weird behavior with the SGPLOT procedure, and I'm hoping someone can explain why it behaves the way it does.
In short, I've found two things:
1. Even if you use a FORMAT statement in SGPLOT, the unformatted values are plotted.
2. Axis table values are determined using the formatted values but are plotted at the unformatted values.
My conclusion is that the SGPLOT procedure first plots the unformatted values then assigns the formats to them. Is this correct?
I created a toy example to show you what I mean. In the test dataset I have two y variables: y1 ranges from 0 to 1 by 0.1, and y2 is y1 rounded to the nearest whole number, 0 or 1 (using a format instead of the ROUND function).
data test;
input x y1 ylab $;
datalines;
0 0 zero
1 .1 one
2 .2 two
3 .3 three
4 .4 four
5 .5 five
6 .6 six
7 .7 seven
8 .8 eight
9 .9 nine
10 1 ten
;
data test;
set test;
y2=input(put(y1,1.),1.); *round values to nearest whole number 0 or 1;
run;
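As a quick sanity check, the formatted value of y1 can be written out with PUT and printed next to y2. This is only a minimal sketch; the dataset name check and the variable y1_fmt are made-up names for illustration.

```sas
/* Sketch: store the 1.-formatted value of y1 as text so it can be
   compared with y2. "check" and "y1_fmt" are illustrative names. */
data check;
  set test;
  length y1_fmt $1;
  y1_fmt = put(y1, 1.);  /* formatted value as a character string */
run;

proc print data=check noobs;
  var x y1 y1_fmt y2 ylab;
run;
```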
A basic scatter plot of y1 gives the expected result.
proc sgplot data=test;
scatter x=x y=y1;
xaxis values=(0 to 10 by 1);
yaxis values=(0 to 1 by .1);
run;
Now here is where my first question comes in. If we apply a 1. format to y1, then any y1 value less than 0.5 will be formatted to 0 and any value greater than or equal to 0.5 will be formatted to 1. This is confirmed if you look at variable y2 in the test dataset.
However, if I run the SGPLOT procedure with a FORMAT statement, it does not plot the formatted values. Why? (Question 1)
proc sgplot data=test;
scatter x=x y=y1;
xaxis values=(0 to 10 by 1);
yaxis values=(0 to 1 by .1);
format y1 1.;
run;
As you can see, the unformatted value of y1 is plotted instead of the formatted value. Is this the typical behavior?
On the other hand, if I plot y2 (which is pre-formatted in the dataset), then I get the expected plot. Should we get the same plot if we use the 'format y1 1.' statement?
proc sgplot data=test;
scatter x=x y=y2;
xaxis values=(0 to 10 by 1);
yaxis values=(0 to 1 by .1);
run;
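One experiment that might help narrow down Question 1 is to switch the y axis to TYPE=DISCRETE and drop the numeric VALUES= list. This is an unverified sketch. It assumes that a discrete axis places markers by formatted value rather than by the underlying number, which is an assumption to test, not a confirmed explanation.

```sas
/* Sketch: same scatter plot, but with a discrete y axis.
   Assumption: on a discrete axis the categories come from the
   formatted values of y1, so the markers would fall on 0 and 1. */
proc sgplot data=test;
  scatter x=x y=y1;
  xaxis values=(0 to 10 by 1);
  yaxis type=discrete;
  format y1 1.;
run;
```

If the markers collapse onto two categories here but not on the default linear axis, that would suggest the format only changes how tick values are displayed on a linear axis.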
OK, that's my first question. Now to my second observation. I will use a YAXISTABLE statement to display the ylab text next to each y value.
proc sgplot data=test;
scatter x=x y=y1;
xaxis values=(0 to 10 by 1);
yaxis values=(0 to 1 by .1);
yaxistable ylab / position=right;
run;
As expected, each y1 value gets assigned its ylab text.
Now here is the weird part. If I add the format statement to the SGPLOT procedure then only two yaxistable labels are assigned (zero and five).
proc sgplot data=test;
scatter x=x y=y1;
xaxis values=(0 to 10 by 1);
yaxis values=(0 to 1 by .1);
yaxistable ylab / position=right;
format y1 1.;
run;
Based on my understanding of how axis tables work, if several observations share the same y-axis value, the axis table displays only the first label it encounters. If we look at the formatted values of y1, we see that 'zero' is the first label for formatted y1=0 and 'five' is the first label for formatted y1=1. But if you look at the placement of the labels, it is placing them next to the y-values of 0 and 0.5 (the unformatted values).
x     y1 formatted    ylab
0     0               zero
1     0               one
2     0               two
3     0               three
4     0               four
5     1               five
6     1               six
7     1               seven
8     1               eight
9     1               nine
10    1               ten
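The "first label per formatted value" reasoning can be reproduced outside the plot. The sketch below is a data-step imitation of that rule, not a description of what SGPLOT does internally; the dataset names labeled and first_label and the variable y1_fmt are made up for illustration.

```sas
/* Sketch: keep the first ylab within each formatted y1 value.
   "labeled", "first_label" and "y1_fmt" are illustrative names. */
data labeled;
  set test;
  length y1_fmt $1;
  y1_fmt = put(y1, 1.);     /* formatted value of y1 as text */
run;

data first_label;
  set labeled;
  by y1_fmt notsorted;      /* test is ordered by y1, so groups are contiguous */
  if first.y1_fmt;          /* keep only the first row of each formatted value */
run;

proc print data=first_label noobs;
  var y1 y1_fmt ylab;
run;
```

For the test data this should leave the rows labeled 'zero' and 'five', whose unformatted y1 values are 0 and 0.5, the same two positions where the labels appear in the plot.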
OK, so I get why it is using 'zero' and 'five', but why is it placing them at the unformatted values of y1? It seems to me that the procedure is using the formatted values of y1 to determine which label to use but plotting the labels at the unformatted values.
Lastly, if we plot the pre-formatted variable y2, then the scatter plot behaves as I would expect. It uses the expected labels and plots them at the correct y-values.
proc sgplot data=test;
scatter x=x y=y2;
xaxis values=(0 to 10 by 1);
yaxis values=(0 to 1 by .1);
yaxistable ylab / position=right;
run;