Hello SAS community!
Could you please help me with a problem on random sampling?
How do I sample a quadruplet of values from the distributions shown in the figure? Note that the two samples (in blue) show covariance: if x1 is high in its distribution (u1, s1), then x2, x3 and x4 are also high in their respective distributions.
Thank you all!
ll87
Most of us, including me, will not download Microsoft Office documents because they can be a security threat. Please include screen captures of what you want to show us by clicking the "Insert Photos" icon. Do NOT attach files.
Thank you, PaigeMiller, I appreciate it!
But look at this picture: how does the sampling you suggest give me the cyan line, or the lower blue line? I want these quadruplets of points from the sampling. The sampling you suggest gives me the distribution, but not a vector of 4 values at x1, x2, x3 and x4.
I hope this is clear, and thank you for your help!
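A draw from a multivariate normal distribution already is a quadruplet: each draw returns values for y1 to y4 together, with the covariance built in. A minimal sketch, assuming SAS/IML is licensed and borrowing the mean vector and covariance matrix from the PROC SIMNORM example later in this thread as illustrative values; the output dataset name MVN_DRAWS is hypothetical:

```sas
/* Requires SAS/IML. Mean vector and covariance matrix are illustrative,
   copied from the PROC SIMNORM example in this thread. */
proc iml;
   call randseed(2718);                          /* reproducible random stream */
   varNames = {"y1" "y2" "y3" "y4"};
   mu    = {1 1.72 2.42 3.52};                   /* means of y1-y4 (row vector) */
   Sigma = {0.14  0.158 0.167 0.183,
            0.158 0.22  0.235 0.263,
            0.167 0.235 0.31  0.351,
            0.183 0.263 0.351 0.49 };
   Y = RandNormal(5, mu, Sigma);                 /* 5 x 4 matrix: each row is one quadruplet */
   print Y[colname=varNames];
   create mvn_draws from Y[colname=varNames];    /* save the draws as a SAS dataset */
   append from Y;
   close mvn_draws;
quit;
```

Each row of Y (and of MVN_DRAWS) is one simulated subject, so plotting a row against x1 to x4 gives one line like the cyan or blue lines in the figure.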
Please explain in detail what the cyan line and the blue line represent.
The line with 4 points represents data from a single person. Each subject has an outcome (y) measured at fixed concentrations (x1, x2, x3, x4), so for one subject you have the pairs (x1,y1), ..., (x4,y4). I basically want to randomly sample one subject's data [(x1,y1), ..., (x4,y4)]. If a subject has a high response y1, the response tends to stay high at all concentrations (as the picture shows).
So for a subject, the four Y values are measured and the four X values are fixed. It seems as if the Y values are correlated with the X values; am I right about that? Are the Ys correlated with each other beyond their correlation with X?
Yes, the Xi are correlated with the Yi, and the Yi are also correlated with one another.
@linlin87 wrote:
Yes the Xi are correlated with Yi. And the Yi's are correlated with one another.
This really doesn't answer the question. Are the Yi's correlated with one another because of their relationship to X? Or is there additional correlation among the Yi's that has nothing to do with X?
The Yi are correlated with one another because each person has a unique response profile over the Xs. Does that answer the question? So it has nothing to do with X, really; it is just variation in the strength of the response from person to person. But the current sampling approach does not account for this.
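One way to write down "a person who responds strongly at x1 also responds strongly at x2 to x4" is a shared subject-level score. The sketch below assumes each Yj is normal with mean mu_j and standard deviation sdv_j, and that all four share one standard-normal score u, which gives every pair of Ys the same correlation rho. The means, standard deviations, rho = 0.8, and the dataset name QUADS are hypothetical illustrative values, not estimates from the thread:

```sas
/* Hypothetical shared-factor model: every Y of a subject shares the score u */
data quads;
   call streaminit(2718);                           /* reproducible random stream */
   array mu[4]  _temporary_ (1 1.72 2.42 3.52);     /* illustrative means of y1-y4 */
   array sdv[4] _temporary_ (0.37 0.47 0.56 0.70);  /* illustrative SDs of y1-y4 */
   array y[4] y1-y4;                                /* one subject's quadruplet */
   rho = 0.8;                                       /* assumed common correlation between Ys */
   do subject = 1 to 1000;
      u = rand('Normal');                           /* subject-level score shared by all four Ys */
      do j = 1 to 4;
         e = rand('Normal');                        /* measurement-specific noise */
         /* sqrt(rho)*u + sqrt(1-rho)*e is standard normal, and two such terms
            for the same subject have correlation rho */
         y[j] = mu[j] + sdv[j]*(sqrt(rho)*u + sqrt(1 - rho)*e);
      end;
      output;                                       /* one row per subject */
   end;
   keep subject y1-y4;
run;
```

With rho = 1 every subject would sit at the same quantile at all four concentrations; smaller values let a subject's relative position drift between concentrations. Equal correlation for every pair is a simplifying assumption; a full multivariate normal covariance matrix allows a different correlation for each pair.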
@linlin87 wrote:
But currently sampling approach does not account for this.
Are you referring to PaigeMiller's suggestion of sampling from the multivariate normal distribution? I think that this model could have the characteristics shown in your sample plot. Below is an example using only Base SAS and SAS/STAT (as I don't have a SAS/IML license). But, really, you should follow Rick_SAS's advice and precisely define the statistical model you are starting with.
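The covariance matrix in the code below is what produces the "high stays high" behavior. Converting its entries to correlations, and writing the conditional mean of yk given y1 for a multivariate normal distribution:

```latex
\begin{aligned}
\rho_{jk} &= \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}\,\sigma_{kk}}},
\qquad
\rho_{12} = \frac{0.158}{\sqrt{0.14 \times 0.22}} \approx 0.90,
\qquad
\rho_{14} = \frac{0.183}{\sqrt{0.14 \times 0.49}} \approx 0.70, \\
E[\,y_k \mid y_1\,] &= \mu_k + \frac{\sigma_{k1}}{\sigma_{11}}\,(y_1 - \mu_1).
\end{aligned}
```

Because every covariance is positive, a draw with y1 above its mean is expected to have y2 to y4 above their means as well, and the correlations set how strongly that tendency holds.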
/* Define mean vector and covariance matrix */
data cov(type=COV);
input _type_ $ _name_ $ y1-y4;
datalines;
COV y1 0.14 0.158 0.167 0.183
COV y2 0.158 0.22 0.235 0.263
COV y3 0.167 0.235 0.31 0.351
COV y4 0.183 0.263 0.351 0.49
MEAN . 1 1.72 2.42 3.52
;
/* Simulate data from the corresponding 4-dimensional normal distribution */
proc simnorm data=cov outsim=want numreal=100000 seed=2718;
var y1-y4;
run;
/* Prepare density plot */
proc transpose data=want out=trans;
by rnum;
run;
/* Prepare annotation to display the data of two simulated vectors:
rnum=46651 (representative of the 5% quantile of y1),
rnum=16395 (representative of the 95% quantile of y1). */
%sganno
data sgannodata;
%sgoval(x1=24.0, y1=76.1, height=2, width=2.6, linecolor="CX3F48CC", fillcolor="CX3F48CC", display="fill");
%sgoval(x1=31.3, y1=52.2);
%sgoval(x1=38.4, y1=28.3);
%sgoval(x1=50.7, y1=4.4);
%sgline(x1=24.0, x2=31.3, y1=76.1, y2=52.2, linethickness=1);
%sgline(x1=31.3, x2=38.4, y1=52.2, y2=28.3);
%sgline(x1=38.4, x2=50.7, y1=28.3, y2=4.4);
%sgoval(x1=38.1, y1=76.1, height=2, width=2.6, linecolor="CX00A2E8", fillcolor="CX00A2E8", display="fill");
%sgoval(x1=47.0, y1=52.2);
%sgoval(x1=56.0, y1=28.3);
%sgoval(x1=69.0, y1=4.4);
%sgline(x1=38.1, x2=47.0, y1=76.1, y2=52.2, linethickness=1);
%sgline(x1=47.0, x2=56.0, y1=52.2, y2=28.3);
%sgline(x1=56.0, x2=69.0, y1=28.3, y2=4.4);
run;
/* Create the annotated density plot */
proc sgpanel data=trans sganno=sgannodata noautolegend;
panelby _name_ / rows=4 columns=1 onepanel novarname;
colaxis display=(nolabel);
density col1 / type=kernel lineattrs=(color=black);
run;
Result:
The first step for any data simulation is to write down the model from which you want to generate the data. You must do this first.
It sounds like you might be modeling these data as a repeated-measures model where a random effect dictates the intercept and slope for each individual?
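Written out, a random-intercept-and-slope model of this kind might look like the following, where i indexes subjects, j = 1, ..., 4 indexes the fixed concentrations, the errors are assumed independent of the random effects, and the linear dependence on x is an assumption made for illustration:

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,x_j + \varepsilon_{ij},
\qquad i = 1,\dots,n,\quad j = 1,\dots,4, \\
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} &\sim
N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}\right),
\qquad \varepsilon_{ij} \sim N(0, \sigma_e^2), \\
\operatorname{Cov}(y_{ij}, y_{ik}) &= \sigma_0^2 + \sigma_{01}(x_j + x_k) + \sigma_1^2\,x_j x_k
\qquad (j \neq k).
\end{aligned}
```

The last line is the correlation among the Yi that is not explained by X: it comes from the subject effects b0i and b1i that all four measurements share.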
Several sources show how to simulate from mixed models in SAS. I suggest you start by reading Gibbs and Kiernan (2020), "Simulating Data for Complex Linear Models." That paper discusses several mixed models and how to simulate them.
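Once the model is written down, the simulation itself is a short DATA step. A minimal sketch of a random-intercept-and-slope model, assuming Y is linear in x within each subject; the concentrations (1, 2, 4, 8), all parameter values, and the dataset name SIM are hypothetical. Each subject draws a correlated pair of random effects (b0, b1) once, and all four measurements share that pair, which is what keeps a high responder high across concentrations:

```sas
/* Hypothetical random-intercept-and-slope simulation, long format */
data sim;
   call streaminit(2718);                       /* reproducible random stream */
   array xs[4] _temporary_ (1 2 4 8);           /* hypothetical fixed concentrations x1-x4 */
   beta0 = 1;     beta1 = 0.3;                  /* hypothetical population intercept and slope */
   sd_b0 = 0.35;  sd_b1 = 0.08;                 /* SDs of the subject-level random effects */
   rho_b = 0.5;                                 /* correlation between random intercept and slope */
   sd_e  = 0.10;                                /* within-subject residual SD */
   do subject = 1 to 200;
      /* draw a correlated (b0, b1) pair once per subject */
      z1 = rand('Normal');
      z2 = rand('Normal');
      b0 = sd_b0*z1;
      b1 = sd_b1*(rho_b*z1 + sqrt(1 - rho_b**2)*z2);
      do j = 1 to 4;
         x = xs[j];
         y = (beta0 + b0) + (beta1 + b1)*x + rand('Normal', 0, sd_e);
         output;                                /* one row per subject and concentration */
      end;
   end;
   keep subject x y;
run;
```

The output is in long format (one row per subject and concentration), which is the layout PROC MIXED expects; PROC TRANSPOSE with BY SUBJECT can turn it into one row of four Ys per subject if that is more convenient.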
If you have real data, you might already have an appropriate model that you have fit by using a SAS procedure. If so, the output of the SAS procedure provides estimates of the parameters and variance terms that are necessary to build a simulation that reflects your real data.
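For example, if the real data are fit with PROC MIXED, the ODS tables SolutionF and CovParms hold the quantities a simulation needs. A sketch, assuming a long-format dataset named HAVE (hypothetical) with variables SUBJECT, X and Y, and a random-intercept-and-slope structure; the output dataset names FIXED_EST and COV_EST are also hypothetical:

```sas
/* HAVE is a hypothetical long-format dataset: one row per subject and
   concentration, with variables SUBJECT, X and Y. */
ods output SolutionF=fixed_est CovParms=cov_est;
proc mixed data=have method=reml;
   class subject;
   model y = x / solution;                          /* fixed intercept and slope */
   random intercept x / subject=subject type=un;    /* correlated random intercept and slope */
run;

proc print data=fixed_est; run;                     /* estimates of beta0 and beta1 */
proc print data=cov_est;   run;                     /* UN(1,1), UN(2,1), UN(2,2), Residual */
```

The fixed-effect estimates play the role of beta0 and beta1, the UN(1,1), UN(2,1) and UN(2,2) rows give the variances and covariance of the random intercept and slope, and the Residual row gives the within-subject error variance.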
After you know the model from which you wish to simulate, write back if you need help with the actual simulation step.