Hi there
I want to illustrate the effects of different local regressions. To do this, I generated some random data like this:
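data ex;
   call streaminit(12345);                          /* fix the seed so results are reproducible */
   do i = 1 to 250;
      x = rand('lognormal', 0.6, 0.5);              /* right-skewed x: many small values, few large ones */
      y = 3*sin(x/2) + 1 + rand('normal', 0, 0.5);  /* deterministic part plus normal noise (sd 0.5) */
      output;
   end;
run;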
So the simulated data points are (x, y) pairs, where y is generated by the underlying data-generating process (DGP). I draw x from a lognormal distribution only because I want few observations in one part of the x-axis and many in another (this matters for the context in which I am using the plot).
Now, my problem is:
I would like to make a scatter plot of the generated observations and, at the same time, plot the deterministic part of the underlying DGP [call it z = 3*sin(x/2)+1]. Is it possible to do this in, for example, PROC SGPLOT without having to simulate a lot of (x, z) pairs?
Create two y variables, such as:
data ex;
   call streaminit(12345);
   do i = 1 to 250;
      x  = rand('lognormal', 0.6, 0.5);
      y1 = 3*sin(x/2) + 1 + rand('normal', 0, 0.5);  /* observed (noisy) values */
      y2 = 3*sin(x/2) + 1;                            /* deterministic part only */
      output;
   end;
run;
In SGPLOT, use two SCATTER statements, one for each y variable. Use plot options to give the two sets of markers a different color, shape, or size. Assigning an appropriate label to each y variable is a good idea, because the labels show up in the legend.
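A minimal sketch of that suggestion, assuming the data set EX with Y1 and Y2 as described above (the data step is repeated so the example runs on its own). The marker symbols, colors, sizes, and label text are arbitrary choices for illustration; by default SGPLOT uses each Y variable's label as its legend entry.

```sas
data ex;
   call streaminit(12345);
   do i = 1 to 250;
      x  = rand('lognormal', 0.6, 0.5);
      y1 = 3*sin(x/2) + 1 + rand('normal', 0, 0.5);  /* observed (noisy) y */
      y2 = 3*sin(x/2) + 1;                            /* deterministic part z */
      output;
   end;
   label y1 = 'Observed y'
         y2 = 'z = 3*sin(x/2) + 1';   /* these labels appear in the legend */
run;

proc sgplot data=ex;
   /* hollow gray circles for the simulated observations */
   scatter x=x y=y1 / markerattrs=(symbol=circle color=gray);
   /* small filled red dots for the noise-free values at the same x */
   scatter x=x y=y2 / markerattrs=(symbol=circlefilled color=red size=4);
   yaxis label='y';
run;
```

Because Y2 is evaluated only at the observed x values, the plotted "curve" is sparse wherever the lognormal x values are sparse.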
I don't think you can plot a curve without simulating the data.
@Rick_SAS has a great blog post out today on the LINEPARM statement, which lets you draw a line without simulating data (you specify a point on the line, such as the intercept, and the slope). But I think that to draw y = 3*sin(x/2)+1 without simulating the data, you would need some sort of CURVEPARM statement (FUNCTIONPARM?) that lets you specify an arbitrary function, as you would on a graphing calculator. I don't think that exists in SGPLOT, SGANNO, or GTL.
I have also had times when I wanted to overlay a function without simulating the data myself, so I'd be happy to be proven wrong.
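For comparison, a sketch of the LINEPARM statement mentioned above: it needs only a point and a slope, no extra observations. The specific line is an illustrative choice not taken from the thread: the tangent to z = 3*sin(x/2)+1 at x = 0, where z(0) = 1 and z'(x) = 1.5*cos(x/2), so z'(0) = 1.5. As noted, LINEPARM can only draw straight lines, not the sine curve itself.

```sas
data ex;
   call streaminit(12345);
   do i = 1 to 250;
      x = rand('lognormal', 0.6, 0.5);
      y = 3*sin(x/2) + 1 + rand('normal', 0, 0.5);
      output;
   end;
run;

proc sgplot data=ex;
   scatter x=x y=y;
   /* straight line through (0, 1) with slope 1.5: the tangent to z at x = 0 */
   lineparm x=0 y=1 slope=1.5 / lineattrs=(color=red);
run;
```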
To build on ballardw's response, you can sort the data by X and then use a SERIES statement to overlay the underlying deterministic model (SERIES connects the points in data order, which is why the sort is needed). Alternatively, generate an evenly spaced set of points in a separate data set and concatenate the two data sets, as shown in the article "How to overlay custom curves with PROC SGPLOT."
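A sketch of both approaches, assuming the simulated data from the question. The names SORTED, RANGE, CURVE, BOTH, Z, T, and ZGRID are hypothetical choices for this example, and the grid size of 201 points is arbitrary; this follows the idea described in the article rather than reproducing its code.

```sas
data ex;
   call streaminit(12345);
   do i = 1 to 250;
      x = rand('lognormal', 0.6, 0.5);
      y = 3*sin(x/2) + 1 + rand('normal', 0, 0.5);
      z = 3*sin(x/2) + 1;              /* noise-free value at each observed x */
      output;
   end;
run;

/* Approach 1: sort by x so SERIES connects the z values left to right */
proc sort data=ex out=sorted;
   by x;
run;

proc sgplot data=sorted;
   scatter x=x y=y;
   series  x=x y=z / lineattrs=(color=red thickness=2);
run;

/* Approach 2: evaluate z on an evenly spaced grid, then concatenate */
proc means data=ex noprint;
   var x;
   output out=range min=xmin max=xmax;
run;

data curve;
   set range;
   do k = 0 to 200;                    /* 201 evenly spaced grid points */
      t = xmin + k*(xmax - xmin)/200;
      zgrid = 3*sin(t/2) + 1;
      output;
   end;
   keep t zgrid;
run;

data both;
   set ex curve;   /* T/ZGRID are missing on data rows; X/Y are missing on grid rows */
run;

proc sgplot data=both;
   scatter x=x y=y;
   series  x=t y=zgrid / lineattrs=(color=red thickness=2);
run;
```

The first approach evaluates z only at the observed x values, so the line is a coarse piecewise-linear approximation where the lognormal draws are sparse (large x). The evenly spaced grid avoids that at the cost of building a second data set.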