## Plotting the underlying DGP in a scatterplot of simulated data

Hi there

I want to illustrate the effects of different local regressions. To do so, I generated some random data like this:

```sas
data ex;
  call streaminit(12345);
  do i = 1 to 250;
    x = rand('lognormal',0.6,0.5);
    y = 3*sin(x/2)+1 + rand('normal',0,0.5);
    output;
  end;
run;
```

So the simulated data points are (x,y) pairs, where y follows the underlying DGP. I draw x from a lognormal distribution only because I want few observations in one part of the x-axis and many in another (this matters for the context in which I am using the plot).

Now, my problem is:

I would like to make a scatterplot of the generated observations and, in the same plot, draw the deterministic part of the underlying DGP [call it z = 3*sin(x/2)+1]. Is it possible to do this in, for example, PROC SGPLOT without having to simulate a lot of (x,z) pairs?


## Re: Plotting the underlying DGP in a scatterplot of simulated data

Create two y variables, one with noise and one without, such as:

```sas
data ex;
  call streaminit(12345);
  do i = 1 to 250;
    x = rand('lognormal',0.6,0.5);
    y1 = 3*sin(x/2)+1 + rand('normal',0,0.5);
    y2 = 3*sin(x/2)+1;
    output;
  end;
run;
```

In SGPLOT, use two SCATTER statements, one for each y variable. Use plot options to give the markers a different color, shape, or size. An appropriate label for each Y variable would be a good idea for the legend.
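A minimal sketch of that idea, assuming the data set `ex` with `y1` and `y2` from the step above. The particular marker symbols, colors, sizes, and legend labels are illustrative choices, not something the reply specified:

```sas
/* Assumes data set EX from the DATA step above:          */
/*   y1 = noisy observations, y2 = deterministic part.    */
proc sgplot data=ex;
  /* Simulated observations as open gray circles */
  scatter x=x y=y1 / markerattrs=(symbol=circle color=gray)
                     legendlabel="Simulated y";
  /* Deterministic part as small filled red circles */
  scatter x=x y=y2 / markerattrs=(symbol=circlefilled color=red size=4)
                     legendlabel="3*sin(x/2)+1";
  yaxis label="y";
run;
```

Because x is lognormal, the `y2` markers will be dense where observations are dense and sparse elsewhere, so the curve may look broken in the thin region of the x-axis.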

## Re: Plotting the underlying DGP in a scatterplot of simulated data

I don't think you can plot a curve without simulating the data.

@Rick_SAS has a great blog post out today on the LINEPARM statement, which allows you to draw a line without simulating data (you specify a slope and an intercept). But I think that to draw y = 3*sin(x/2)+1 without simulating the data, you would need some sort of CURVEPARM statement (FUNCTIONPARM?) that lets you specify an arbitrary function, as you would on a graphing calculator. I don't think that exists in SGPLOT, SGANNO, or GTL.

I also have had times when I wanted to just overlay a function without simulating the data myself. So I'd be happy if I'm wrong.


## Re: Plotting the underlying DGP in a scatterplot of simulated data

To build on Ballardw's response, you can sort the data by X and then overlay the SERIES statement to plot the underlying deterministic model. Alternatively, generate an evenly spaced set of points in a separate data set and concatenate the two data sets, as shown in the article "How to overlay custom curves with PROC SGPLOT."
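Both suggestions could look roughly like the sketch below. It assumes the data set `ex` with `y1` and `y2` from ballardw's reply. The data set names `grid` and `both`, the variables `gx` and `gz`, and the grid range 0 to 10 by 0.05 are hypothetical choices made for illustration; adjust the range to the observed range of x.

```sas
/* Option 1: sort by x, then overlay the deterministic part as a line */
proc sort data=ex;
  by x;
run;

proc sgplot data=ex;
  scatter x=x y=y1 / legendlabel="Simulated y";
  series  x=x y=y2 / lineattrs=(color=red thickness=2)
                     legendlabel="3*sin(x/2)+1";
run;

/* Option 2: evaluate the function on an evenly spaced grid.      */
/* The range 0 to 10 is an assumption, not taken from the thread. */
data grid;
  do gx = 0 to 10 by 0.05;
    gz = 3*sin(gx/2)+1;
    output;
  end;
run;

/* Stack the two data sets: x, y1 are missing on the grid rows */
/* and gx, gz are missing on the simulated rows.               */
data both;
  set ex grid;
run;

proc sgplot data=both;
  scatter x=x  y=y1 / legendlabel="Simulated y";
  series  x=gx y=gz / lineattrs=(color=red thickness=2)
                      legendlabel="3*sin(x/2)+1";
run;
```

Option 1 needs no extra data, but the line is only evaluated at the observed x values, so it can look angular where observations are sparse. Option 2 gives a smooth curve over the whole axis at the cost of one small extra data set.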
