Re: Simulation of a variable (continuous or dichotomous) correlated to...

slegleye · Posted 06-15-2022 02:49 AM

Hello users.

I have a dataset with three variables : one continuous, P, and two dichotomous, E and Y. I wonder how to simulate a variable (continuous or dichotomous) with given correlations with P, E and Y.

Does anyone know how to do it ?

Best,

sbxkoenk · Posted 06-15-2022 04:36 AM

Hi,

( Moved this post from 'Programming' to 'Statistical Procedures' board )

Start here :

Simulate multivariate correlated data by using PROC COPULA in SAS
By Rick Wicklin on The DO Loop July 7, 2021
https://blogs.sas.com/content/iml/2021/07/07/proc-copula-sas.html

Simulate correlated variables by using the Iman-Conover transformation
By Rick Wicklin on The DO Loop June 14, 2021
https://blogs.sas.com/content/iml/2021/06/14/simulate-iman-conover-transformation.html

SAS MACROS: CORR2DATA
https://stats.oarc.ucla.edu/sas/sas/macros/sas-macros-corr2data/

Maybe @Rick_SAS wants to add something?

Thanks,

Koen

slegleye · Posted 06-15-2022 05:53 AM

Thank you for your quick answer.

Unfortunately, all these programs simulate all of the variables. My problem is that I would like to simulate one variable with given correlations to the actual variables in a dataset.

Do you know a way to do it ?

Best,

sbxkoenk · Posted 06-15-2022 06:17 AM

Hello,

In that case it becomes a kind of optimization.

Do you have SAS Optimization in SAS VIYA 3.x or 4 or SAS/OR (Operations Research) in SAS 9.4?

It's possible with PROC OPTMODEL, like in this blog :

Creating Synthetic Data with SAS/OR
By Jared Erickson on Operations Research with SAS May 17, 2017
https://blogs.sas.com/content/operations/2017/05/17/creating-synthetic-data-sasor/

Ciao,

Koen

slegleye · Posted 06-20-2022 08:11 AM

Dear Koen.

Thank you for your answer. Unfortunately, I do not have SAS Viya 3.x or 4, or SAS/OR (Operations Research) in SAS 9.4.

In addition, I am not sure that I correctly understand the code you mention.

Sorry,

Rick_SAS · Posted 06-15-2022 06:45 AM

For one variable, you can do it. The geometry and SAS code for creating a variable that has a specified correlation with another variable are shown in the article "Find a vector that has a specified correlation with another vector" on The DO Loop.

For multiple variables, you cannot always find a vector that has the specified correlations. The set of possible correlations with a set {x1, x2, x3} is determined by the geometry of those vectors and the correlations between the variables. The fact that two of your variables are dichotomous (0/1) further restricts the possible correlations that an arbitrary vector can make with the set {x1, x2, x3}.

For example, if x1 and x2 are highly correlated, you cannot find a vector that is highly correlated with x1 but is uncorrelated with x2. Similarly, if x1 and x2 are uncorrelated, you cannot find a vector that is highly correlated with x1 and with x2. So to even get started on this problem we would need to know the correlation matrix for the x_i.
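A minimal SAS/IML sketch of this feasibility check, under stated assumptions: the matrix Rxx and the targets rho below are hypothetical values chosen to mimic the first example (x1 and x2 highly correlated, new vector highly correlated with x1 only). If Rxx is positive definite, the augmented 4x4 matrix is a valid correlation matrix only when the quadratic form q = rho` * inv(Rxx) * rho is at most 1. This is a necessary condition; a dichotomous new variable is restricted further.

```sas
proc iml;
/* Hypothetical correlation matrix of the existing variables x1, x2, x3 */
Rxx = {1.0 0.8 0.1,
       0.8 1.0 0.2,
       0.1 0.2 1.0};

/* Hypothetical targets: highly correlated with x1, uncorrelated with x2 and x3 */
rho = {0.9, 0.0, 0.0};

/* Quadratic form rho` * inv(Rxx) * rho, computed without forming the inverse */
q = t(rho) * solve(Rxx, rho);

/* 1 if the targets are compatible with Rxx, 0 otherwise */
feasible = (q <= 1);
print q feasible;
quit;
```

With these particular numbers q should come out well above 1, so no such vector exists. Replacing Rxx with the empirical correlation matrix of the real variables gives the same check for actual data.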

So, I ask you to explain more about the source of these variables and the correlations. How do you know that there is a solution for the correlations that you are using? Are you starting from real data? Do you have an empirical correlation from some data that contains an actual Y variable? Are you trying to simulate new Y_i that are related to the x_i in a way that is similar to Y's relationship with the x_i?

slegleye · Posted 06-15-2022 07:16 AM

Dear Rick.

Thank you for your interest.

In fact, I intend to test the robustness of a causal estimate: does E, the binary exposure, really cause Y, the binary outcome? The data come from a real dataset, a survey on youth (n=21000). I have E, Y, and Ps, a propensity score modeling E (conditionally on many covariates X). E and Y are binary, but Ps is continuous. The correlations between E, Y and Ps are given by the data.

The covariates X are also observed in the survey, but what about an unobserved covariate U? It is still possible that my results based on Ps and E are biased because Ps does not include U.

I want to simulate a U with given correlations with E and Y, but with a null correlation with Ps.

Best,

MichaelL_SAS · Posted 06-15-2022 02:13 PM

I think this approach, where you simulate only the values of an unmeasured confounder based on the observed data, is likely to run into issues. Namely, if you simulate the U values based on the observed exposure E and outcome Y, you might be able to create the desired correlations, but the causal relationships that produce them are unlikely to correspond to U being an unmeasured confounder. For U to be an unmeasured confounder, it would have to be a common cause of E and Y. However, in your simulation E and Y are already known, so U cannot have a truly causal effect on them. The correlation would therefore come from E or Y affecting U, in which case the causal relationships are the reverse of what you want, and U would not be an unmeasured confounder.

I think it is fair to say that how best to perform sensitivity analyses for the effect of unmeasured confounding in observational studies is not a settled question. There are a wide variety of methods discussed in the literature. I think in the case of a binary outcome with the effect measured on the relative risk scale, the E-value as described by VanderWeele and Ding might be the most commonly suggested approach. I believe the appendix to their original paper had example SAS code for the computation of E-values, and they have since made a web app for computing E-values. There are also approaches that are specific to methods like propensity score matching; there is an example of this in the PROC PSMATCH documentation. Finally, there are approaches where the measured confounders are used to provide some basis for judging what the effect of an unmeasured confounder might be, by seeing the effect of omitting each of the measured confounders from the adjustment set.
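For reference, a minimal sketch of the E-value point-estimate formula for a risk ratio, E = RR + sqrt(RR * (RR - 1)), with the risk ratio inverted first when it is below 1. The value 1.8 is a hypothetical observed risk ratio, not a result from this thread, and this is not the code from the paper's appendix.

```sas
data _null_;
   rr = 1.8;                              /* hypothetical observed risk ratio */
   if rr < 1 then rr = 1 / rr;            /* protective effect: invert before applying the formula */
   e_value = rr + sqrt(rr * (rr - 1));    /* E-value for the point estimate */
   put e_value=;
run;
```

The same formula applied to the confidence limit closest to 1 gives the E-value for the interval.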

slegleye · Posted 06-20-2022 08:29 AM

Dear Michael (I hope that is the correct spelling).

Thanks for your answer. Your remark about the very nature of U in my simulation task is interesting and points to the difficulty of the problem. My intention is really to simulate an unobserved confounder of E and Y. My understanding of the situation is this: if I simulate a U with chosen correlations with E and Y but no correlation with the propensity score Ps, it would have exactly the observed properties that I would have in the case where U is a true confounder that generates E and Y (but not Ps). With such a U, I would be able to compute a causal effect of E on Y (conditionally on U and on the covariates X that compose Ps). By varying the correlations of U with E and with Y, I would be able to determine the correlations that are sufficient to explain away the effect of E on Y (estimated without U, that is, only on observables). That U is not correlated with Ps ensures that U is the extra unmeasured variable that is sufficient to do it.

I do not see what properties a genuine confounder of E and Y would have that I miss with this method.

I agree that the literature on sensitivity is abundant and proposes various approaches. I know the meaning and the computation of the E-value by VanderWeele and Ding. But the E-value is a simplification in the sense that it relies on a U with equal correlations (risk-ratios) with E and Y. I would like to make the correlations between U and E and U and Y independent.
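As a point of comparison, the joint bounding factor of Ding and VanderWeele, from which the E-value is derived, already takes the two strengths as separate parameters, although on the risk-ratio scale rather than as correlations. A minimal sketch, where all three numbers are hypothetical:

```sas
data _null_;
   rr_obs = 1.8;   /* hypothetical observed risk ratio of E on Y */
   rr_eu  = 2.0;   /* hypothetical strength of the E-U association (risk ratio) */
   rr_uy  = 4.0;   /* hypothetical strength of the U-Y association (risk ratio) */

   /* Largest factor by which such a U could inflate the observed risk ratio */
   bound  = (rr_eu * rr_uy) / (rr_eu + rr_uy - 1);

   /* Lower bound on the risk ratio after accounting for U */
   rr_adj = rr_obs / bound;
   put bound= rr_adj=;
run;
```

Looping over a grid of rr_eu and rr_uy values would show which combinations bring rr_adj down to 1.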

Best,

another propensity score with all the information: the covariates X that compose Ps and U. The derived estimate of the effect of E on Y would be the "true" causal effect of E on Y.

MichaelL_SAS · Posted 06-22-2022 05:07 PM

Sorry for the delay in responding.

I think the issue I see with the simulation approach is maybe best described with some of the notation from causal diagrams. Given that E and Y values are set, if you are simulating a value of U with the desired correlations given the observed data, the causal structure would likely be one where E->U<-Y, which would make U a collider on a pathway between E and Y. For U to be a common cause of E and Y (and therefore a confounder) you would need the direction of those arrows to reverse and have E<-U->Y, something that I don't think is really possible given the fixed values of E and Y. Note that the documentation for the CAUSALGRAPH procedure provides some more details on graphical causal models, and there is this 2019 SGF paper that also discusses the collider issue in example 2.

In the case where U is a collider, comparing effect estimates that do and do not incorporate it in the adjustment set is studying the effect of inappropriately adjusting for a collider (as doing so opens up a non-causal pathway between E and Y) instead of studying the effect of not adjusting for an unmeasured confounder (as that leaves a non-causal pathway unblocked). In a sense, the different assumptions about U result in analyses that are mirror images of one another: one assumes your current adjustment set is correct and would be made incorrect by incorporating U, while the other assumes your current adjustment set is incorrect and would be made correct by incorporating U.
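A small simulation can illustrate the collider case. This is only a sketch under assumed settings: the sample size, the probabilities, and the variable name u_collider are all hypothetical. E and Y are generated independently, so E has no effect on Y, and U is then built from both.

```sas
data sim;
   call streaminit(2022);
   do i = 1 to 20000;
      e = rand("Bernoulli", 0.5);
      y = rand("Bernoulli", 0.3);                  /* independent of e: no causal effect */
      u_collider = e + y + rand("Normal", 0, 1);   /* u is caused by both e and y */
      output;
   end;
run;

/* Without the collider: the coefficient of e should be close to zero */
proc logistic data=sim;
   model y(event='1') = e;
run;

/* Adjusting for the collider opens a non-causal path between e and y */
proc logistic data=sim;
   model y(event='1') = e u_collider;
run;
```

In the second model the coefficient of e should tend to move away from zero (negative under this setup), even though E has no effect on Y by construction.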

slegleye · Posted 06-23-2022 04:18 AM

Thank you for your answer.

You get the point: it is all about the assumption regarding the causal role of U on E and Y.

I explicitly want to simulate a U that causes E and Y (as given in the dataset) but independently of the observed covariates X, and not a collider. If U is a collider, then my current estimation that ignores U is correct; but if U is a confounder, it is not.

In the simulation I cannot impose a direction (a causality), only a correlation structure between U, E and Y (and X). If there is causality (U-->E and U-->Y), then I would observe the correlation structure that I simulate. More precisely, I intend to simulate all that is needed to get my current estimation right without X but false if U is a confounder, and to estimate the amount of correlation (causal role) of U with E and Y that would produce a null causal effect of E on Y when U is taken into account.

To be honest, I studied causal diagrams and Pearl's theory quite in detail but what I want to do is not in the textbooks I know. But I think it is relevant.

Best,

Ksharp · Posted 06-15-2022 08:12 AM

Maybe the OP wants to simulate some data that conform to the correlation coefficients in a published paper.

Ksharp · Posted 06-20-2022 08:45 AM

Here is a way to do it with a genetic algorithm.

As Rick said, it is not guaranteed to find a solution.

And if you have a lot of data, it could take a long time to get a result.

/*
x1 is a binary variable,
x2 is a binary variable,
x3 is a continuous variable.

Create a new variable x4 whose correlation with x1 is 0.04, with x2 is -0.5, and with x3 is 0.2.
*/

%let corr_x4 = 0.04 -0.5 0.2;

/* Example data: two binary variables and one continuous variable */
data have(keep=x1 x2 x3);
set sashelp.heart(keep=status sex height obs=100);
x1=ifn(status='Dead',1,0);
x2=ifn(sex='Male',1,0);
rename height=x3;
run;

proc iml;
use have nobs nobs;
read all var {x1 x2 x3};
close;

/* Objective: sum of squared differences between achieved and target correlations */
start function(x) global(x1 ,x2 ,x3,corr_x4);
 all=x1||x2||x3||t(x);
 corr=corr(all);
 sse=ssq(corr[4,1:3]-corr_x4) ;
 return (sse);
finish;

corr_x4={&corr_x4.};

/* Row 1 holds the lower bounds and row 2 the upper bounds for each element of x4 */
bounds=j(2,nobs,-1000);
bounds[2,]=1000 ;

id=gasetup(1,nobs,123456789);      /* real-valued vector of length nobs, fixed seed */
call gasetobj(id,0,"function");    /* 0 = minimize the objective */
call gasetsel(id,10,1,.95);
call gainit(id,10000,bounds);      /* initial population of 10000 members */

niter = 100;
do i = 1 to niter;
 call garegen(id);
 call gagetval(value, id);
end;
call gagetmem(mem, value, id, 1);  /* best member and its objective value */

x4=t(mem);

create want var {x1 x2 x3 x4};
append;
close;

print value[l = "Min value (the closer to zero, the better)"] ;
call gaend(id);
quit;


proc corr data=want pearson;
var x1 x2 x3 x4;
run;
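When the targets are feasible and a continuous x4 is acceptable, a non-iterative alternative to the search above is to build x4 directly by projection. This is a minimal sketch, not the method from any of the linked articles. The three existing variables are simulated stand-ins (hypothetical, replace them with the real columns, which must have no missing values), and the targets are the ones used above. The idea: standardize the columns into Z so that Z`*Z is the correlation matrix, take the part Z*b that reproduces the targets, and add a random component orthogonal to Z that is scaled to give x4 unit length.

```sas
proc iml;
call randseed(2022);
n = 200;

/* Stand-ins for the observed data (hypothetical): two binary, one continuous */
x1 = j(n, 1);  call randgen(x1, "Bernoulli", 0.4);
x2 = j(n, 1);  call randgen(x2, "Bernoulli", 0.5);
x3 = j(n, 1);  call randgen(x3, "Normal", 0, 1);
X = x1 || x2 || x3;

rho = {0.04, -0.5, 0.2};          /* target correlations with x1, x2, x3 */

/* Center each column and scale it to unit length, so that Z`*Z = corr(X) */
Z = X - mean(X);
Z = Z / sqrt(Z[##, ]);
Rxx = t(Z) * Z;

b = solve(Rxx, rho);              /* weights that reproduce the targets */
q = t(rho) * b;                   /* must not exceed 1 for a solution to exist */

if q > 1 then print "The target correlations are infeasible for these data.";
else do;
   /* Random direction, centered and made orthogonal to the columns of Z */
   e = j(n, 1);  call randgen(e, "Normal");
   e = e - mean(e);
   e = e - Z * solve(Rxx, t(Z) * e);
   e = e / sqrt(ssq(e));

   x4 = Z * b + sqrt(1 - q) * e;  /* unit length, mean zero, Z`*x4 = rho */

   C = corr(X || x4);
   achieved = C[4, 1:3];
   print q, achieved;
end;
quit;
```

The achieved correlations should match the targets up to rounding error. Changing the random seed gives a different x4 with the same correlations. Setting the third target to 0 corresponds to a variable that is uncorrelated with the continuous variable. A dichotomous x4 cannot be produced this way.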

Simulation of a variable (continuous or dichotomous) correlated to three existant variables?