Hi there
I want to illustrate the effects of different local regressions. To do so, I generated some random data like this:
data ex;
call streaminit(12345);
do i = 1 to 250;
x = rand('lognormal',0.6,0.5);
y = 3*sin(x/2)+1 + rand('normal',0,0.5);
output;
end;
run;
So, the simulated data points are pairs of (x,y), where the underlying data-generating process (DGP) is given by y. I draw x from a lognormal distribution only because I want few observations in one part of the x-axis and many in another (this is relevant for the context I am using the plot in).
Now, my problem is:
I would like to make a scatterplot of the generated observations and, at the same time, plot the deterministic part of the underlying DGP [call it z = 3*sin(x/2)+1]. Is it possible to do this in, for example, PROC SGPLOT without having to simulate a lot of (x,z) pairs?
Create two y values in the same data step, such as:
data ex;
call streaminit(12345);
do i = 1 to 250;
x = rand('lognormal',0.6,0.5);
y1 = 3*sin(x/2)+1 + rand('normal',0,0.5);
y2 = 3*sin(x/2)+1;
output;
end;
run;
In SGPLOT, use two SCATTER statements, one for each y value. Use a plot option to give the markers a different color/shape/size. An appropriate label for each Y variable would be a good idea for the legend.
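As a rough sketch of that suggestion, assuming the data set ex with variables x, y1, and y2 from the data step above (the marker symbols, colors, sizes, and legend labels are just illustrative choices):

```sas
/* Overlay the noisy observations (y1) and the deterministic part (y2). */
/* Marker attributes and labels below are illustrative, not required.   */
proc sgplot data=ex;
  scatter x=x y=y1 / markerattrs=(symbol=CircleFilled color=gray size=6)
                     legendlabel="Simulated y";
  scatter x=x y=y2 / markerattrs=(symbol=Plus color=red size=5)
                     legendlabel="3*sin(x/2)+1";
  yaxis label="y";
run;
```

Because y2 is evaluated only at the simulated x values, the second set of markers will be sparse wherever the lognormal draws are sparse.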
I don't think you can plot a curve without simulating the data.
@Rick_SAS has a great blog post out today on the LINEPARM statement, which allows you to draw a line without simulating data (you specify slope and intercept). But I think that to draw y = 3*sin(x/2)+1 without simulating the data, you would need some sort of CURVEPARM statement (FUNCTIONPARM?) that allows you to specify an arbitrary function like you would on a graphing calculator. I don't think that exists in SGPLOT, SGANNO, or GTL.
I also have had times when I wanted to just overlay a function without simulating the data myself. So I'd be happy if I'm wrong.
To build on Ballardw's response, you can sort the data by X and then overlay the SERIES statement to plot the underlying deterministic model. Alternatively, generate an evenly spaced set of points in a separate data set and concatenate the two data sets, as shown in the article "How to overlay custom curves with PROC SGPLOT."
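Both options might look something like the following. This is only a sketch: it assumes the data set ex with x, y1, and y2 from Ballardw's data step, and the names ex_sorted, grid, both, gx, z, xmin, and xmax, as well as the choice of 200 grid intervals, are made up for illustration.

```sas
/* Option 1: sort by x so the SERIES statement connects points left to right */
proc sort data=ex out=ex_sorted;
  by x;
run;

proc sgplot data=ex_sorted;
  scatter x=x y=y1 / legendlabel="Simulated y";
  series  x=x y=y2 / lineattrs=(thickness=2) legendlabel="3*sin(x/2)+1";
run;

/* Option 2: evaluate the function on an evenly spaced grid over the   */
/* observed range of x, then concatenate the grid with the data        */
proc sql noprint;
  select min(x), max(x) into :xmin trimmed, :xmax trimmed from ex;
quit;

data grid;
  do k = 0 to 200;                          /* 200 intervals: arbitrary choice */
    gx = &xmin + k*(&xmax - &xmin)/200;     /* evenly spaced x values          */
    z  = 3*sin(gx/2) + 1;                   /* deterministic part of the DGP   */
    output;
  end;
  drop k;
run;

data both;
  set ex grid;    /* gx, z are missing on the ex rows; x, y1 on the grid rows */
run;

proc sgplot data=both;
  scatter x=x  y=y1 / legendlabel="Simulated y";
  series  x=gx y=z  / lineattrs=(thickness=2) legendlabel="3*sin(x/2)+1";
run;
```

The first option reuses the simulated x values, so the curve is drawn from few points where the data are sparse. The second gives a smooth curve across the whole range at the cost of one extra data set.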