Hi,
I am trying to create a scatterplot with my variables rNPS and pNPS.
Unfortunately, the generated scatterplot looks very strange and I can't explain why.
This is my code and the output scatterplot:
ODS Graphics on;
proc sgplot data=xwrk.nps_ebenen_vollständig;
scatter x=pnps y=rnps;
run;
I already checked the correlation between those two variables and there is a positive and significant correlation.
Has anyone already had this problem and can help me?
Thanks a lot!
Both of your variables, rNPS and pNPS, take only integer values, which is why the plot looks like a regular grid of points. The positive correlation is not visible either: there may be 1,000 observations at a single position on the plot, but because they all have exactly the same integer coordinates, they are drawn on top of one another and each position looks like a single data point.
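To check how heavily the points overlap, you can count the observations at each grid position. A minimal sketch, assuming the data set and variable names from the question: the LIST option prints one row per (pNPS, rNPS) combination together with its frequency.

```sas
/* Count how many observations fall on each (pNPS, rNPS) grid point. */
/* Large counts mean many markers are drawn on exactly the same spot. */
proc freq data=xwrk.nps_ebenen_vollständig;
   tables pNPS*rNPS / list nocum;
run;
```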
A potential improvement is to use the JITTER option in the SCATTER statement in PROC SGPLOT (and, if needed, the JITTERWIDTH= option to control the amount of jitter).
Try this:
proc sgplot data=xwrk.nps_ebenen_vollständig;
scatter x=pnps y=rnps / jitter;
run;
The JITTER option will add a small random offset to the X and Y variables before plotting the marker. (It doesn't change your data, only the plot.)
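If you want explicit control over how far the markers move, you can also add the offset yourself in a DATA step and plot the jittered copies. This is a different technique from the built-in JITTER option, sketched under assumptions: the data set name NPS_Jitter, the variables pNPS_j and rNPS_j, the seed, and the maximum offset of 0.2 are all hypothetical choices. Because the offset stays below 0.5, each jittered marker remains closer to its original integer value than to any neighboring one.

```sas
data NPS_Jitter;                             /* hypothetical output data set   */
   set xwrk.nps_ebenen_vollständig;
   if _n_ = 1 then call streaminit(12345);   /* fixed seed: reproducible plot  */
   /* Add uniform noise in (-0.2, 0.2) to each integer value (hypothetical width) */
   pNPS_j = pNPS + 0.4*rand("Uniform") - 0.2;
   rNPS_j = rNPS + 0.4*rand("Uniform") - 0.2;
run;

proc sgplot data=NPS_Jitter;
   scatter x=pNPS_j y=rNPS_j;
   xaxis label="pNPS (jittered)";
   yaxis label="rNPS (jittered)";
run;
```

The original variables are left unchanged, so any statistics you compute should still use pNPS and rNPS, not the jittered copies.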
With the JITTER option I can now see small data points. I actually wanted to use the scatterplot to show a linear relationship between rNPS and pNPS, but because of the integer values the scatterplot is a bit confusing. Do you have any suggestions for showing the linear relationship graphically?
To me, this does show the relationship. There is the most "ink" on the diagonal, and much less off-diagonal.
But this is a limitation of plotting data that are all integers: the linear relationship can be hard to see.
Alright, thank you 🙂
Is it possible to change the colour of the data points to see the relationship a little better?
Yes, of course you can change the color, but may I recommend the TRANSPARENCY= option instead? With semi-transparent markers, spots with lots of data look darker than spots with only a little data. I don't know how well it will work with integer data (you would probably still need the JITTER option), but please give it a try.
Try it without the JITTER option as well. I have never plotted integer data this way, so trying several variations of the plot might turn up one that you really like.
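A sketch of both variations, assuming the data set from the question. The transparency value of 0.9, the filled circle symbol, and the color darkblue are hypothetical choices to experiment with. Without jitter, the overlapping markers sit exactly on top of one another, so the darkening may saturate once a position holds a few dozen observations.

```sas
/* Semi-transparent filled markers: where many jittered markers pile up, */
/* the overlapping color builds up and the area looks darker.           */
proc sgplot data=xwrk.nps_ebenen_vollständig;
   scatter x=pnps y=rnps / jitter transparency=0.9
                           markerattrs=(symbol=CircleFilled color=darkblue);
run;

/* The same plot without JITTER, for comparison */
proc sgplot data=xwrk.nps_ebenen_vollständig;
   scatter x=pnps y=rnps / transparency=0.9
                           markerattrs=(symbol=CircleFilled color=darkblue);
run;
```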
If you want a crude use of color, you can overlay the curve and regression line on a heat map of the density of the data. I don't have access to your data, but it might look something like this:
%let DSName = xwrk.nps_ebenen_vollständig;
%let XVar = pNPS;
%let YVar = rNPS;
proc sgplot data=&DSName;
heatmap x=&XVar y=&YVar / xbinsize=1 ybinsize=1 colormodel=TwoColorRamp;
reg x=&XVar y=&YVar / jitter;
run;
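In the program above, the REG statement also draws the jittered markers on top of the heat map cells. If you only want the lines over the colored cells, a variant is to add NOMARKERS. This sketch assumes that "the curve" means a loess smoother and draws it with the LOESS statement next to the straight regression line; the line thickness is a hypothetical choice.

```sas
%let DSName = xwrk.nps_ebenen_vollständig;
%let XVar = pNPS;
%let YVar = rNPS;

proc sgplot data=&DSName;
   /* One cell per integer pair; the cell color encodes the count */
   heatmap x=&XVar y=&YVar / xbinsize=1 ybinsize=1 colormodel=TwoColorRamp;
   /* Nonparametric smooth and straight-line fit, drawn without markers */
   loess x=&XVar y=&YVar / nomarkers;
   reg   x=&XVar y=&YVar / nomarkers lineattrs=(thickness=2);
run;
```

If the loess curve lies close to the straight line, that is visual support for a roughly linear relationship.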
Thank you so much, guys!
That helped a lot 😊
Use the REG statement, which draws the markers and overlays the fitted regression line:
proc sgplot data=xwrk.nps_ebenen_vollständig;
reg x=pnps y=rnps / jitter;
run;
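If the fitted line alone is not convincing enough, a confidence band and the fitted coefficients can help. A sketch, assuming the same data set: the CLM option adds a 95% confidence band for the mean, and PROC REG (with PLOTS=NONE to suppress its diagnostic graphs) reports the slope, intercept and R-square of the same straight-line fit.

```sas
/* Regression line with a 95% confidence band for the mean */
proc sgplot data=xwrk.nps_ebenen_vollständig;
   reg x=pnps y=rnps / jitter clm;
run;

/* Slope, intercept, and R-square for the same straight-line fit */
proc reg data=xwrk.nps_ebenen_vollständig plots=none;
   model rnps = pnps;
run;
quit;
```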
Do you have additional questions? If not, please mark this thread as SOLVED.
Depending on your data, I've found bubble plots helpful for showing that some positions hold more observations than others.
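One way to build such a plot is to aggregate the data first and then size each bubble by its count. A sketch, assuming the names from the question; NPS_Counts is a hypothetical name for the aggregated data set, and the bubble transparency is a hypothetical choice. Using the counts as FREQ= values gives the same regression line as a fit on the original, unaggregated observations.

```sas
/* Collapse the data to one row per (pNPS, rNPS) pair with its count */
proc freq data=xwrk.nps_ebenen_vollständig noprint;
   tables pNPS*rNPS / out=NPS_Counts(keep=pNPS rNPS COUNT);
run;

/* Bubble size reflects how many observations share each position; */
/* FREQ= repeats each row COUNT times when fitting the line.        */
proc sgplot data=NPS_Counts;
   bubble x=pNPS y=rNPS size=COUNT / transparency=0.3;
   reg x=pNPS y=rNPS / freq=COUNT nomarkers;
run;
```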