10-03-2016 04:50 AM
I am currently developing a Monte Carlo study using SAS. For one of my experimental designs, I need to simulate a random variable that respects an a priori defined correlation with an already existing variable. More concretely, I am simulating the difference (error) between list and transaction prices of residential properties, which I assume is strongly correlated with the size (area) of the sold properties (i.e., the bigger the house, the bigger the difference between what the seller asks and what he/she actually gets from the market...).
Unfortunately, I was unable to find advice on how to simulate a random variable with a known, pre-defined correlation to an already existing variable. I have found examples that simulate all the variables but, for my study, I need to simulate one variable (the error between list and market price) from an existing one (area). The simulation needs to take into account a pre-defined correlation between the existing and simulated variables (say, 0.9).
Would it be possible to get some advice on how to do this? Suggest some reading?
10-03-2016 05:48 AM
Is the known variable assumed to be fixed?
What is the distribution of the variable that you want to simulate?
10-03-2016 06:27 AM
Hi, thanks for your prompt interest (I recently requested one of your books; it is still in transit... :-) ).
The variable (area) is to remain fixed.
The error (i.e., List Price - Transaction Price) is assumed to be log-normally distributed (i.e., basically the error is always positive; sellers are always asking for more than they actually receive from the market).
I just want to derive a vector of errors, which will be correlated with the area of a property. With this vector I will derive list prices (which I do not have in my dataset): List Price = Transaction Price + Error.
After this, I will generate data (in a Monte Carlo study) that will mirror my sample characteristics (which will then include the new simulated List Price variable). To do this, I will follow the procedures described in "SAS for Monte Carlo Studies: A Guide for Quantitative Researchers". My sample has around 60% of all market transactions carried out in Portugal (2009-2013). I am applying hedonic regression models to describe transaction prices:
Ln(transaction prices) = f(characteristics) and would like to know what happens if I use Ln(List price) instead of transaction prices.
Thanks (do not know if this helps; perhaps too much info...).
10-03-2016 07:26 AM
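You mean the correlation coefficient?

/* Existing variable: 10,000 standard normal draws */
data have;
   call streaminit(4321);
   do id=1 to 10000;
      x=rand('normal');
      output;
   end;
run;

%let rho=0.9;

/* New variable y with population correlation &rho to x */
data want;
   set have;
   call streaminit(1234);
   y=&rho*x+sqrt(1-&rho**2)*rand('normal');
run;

/* Check the sample correlation */
proc corr data=want;
   var x;
   with y;
run;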
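A short derivation may help show why the construction in the code above gives the target correlation. It is a sketch under two assumptions that the code satisfies but real data such as area may not: x and the new noise term (called z here, a name introduced only for this illustration) are independent, and both have mean 0 and variance 1.

```latex
\begin{align*}
y &= \rho x + \sqrt{1-\rho^2}\, z, \qquad
   \operatorname{Var}(x)=\operatorname{Var}(z)=1,\quad \operatorname{Cov}(x,z)=0 \\
\operatorname{Var}(y) &= \rho^2 \operatorname{Var}(x) + (1-\rho^2)\operatorname{Var}(z) = 1 \\
\operatorname{Cov}(x,y) &= \rho \operatorname{Var}(x) + \sqrt{1-\rho^2}\,\operatorname{Cov}(x,z) = \rho \\
\operatorname{Corr}(x,y) &= \frac{\operatorname{Cov}(x,y)}{\sqrt{\operatorname{Var}(x)\operatorname{Var}(y)}} = \rho
\end{align*}
```

If x is not on a standard scale, it would need to be standardized first for the same argument to apply. In any finite sample the correlation reported by PROC CORR will be close to, but not exactly, the chosen value.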
10-03-2016 10:14 AM
Hi, yes, it is the correlation coefficient.
Thank you very much for your input. But I was looking at the problem in an incorrect way. I want to derive a variable from a fixed variable, and the correlation between a random variable and a fixed one is zero, as one of the users has rightly pointed out.
10-03-2016 09:17 AM
Covariance is defined for random vectors. The covariance between a random vector and a constant vector is zero, since the constant vector does not vary. So your formulation is not quite correct.
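For reference, the zero-covariance statement follows directly from the definition. In this sketch, X stands for a random variable and c for a constant (fixed) quantity; both symbols are introduced here only for illustration.

```latex
\[
\operatorname{Cov}(X, c)
  = \mathrm{E}\big[(X - \mathrm{E}[X])\,(c - \mathrm{E}[c])\big]
  = \mathrm{E}\big[(X - \mathrm{E}[X]) \cdot 0\big]
  = 0,
\qquad \text{since } \mathrm{E}[c] = c .
\]
```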
Still, I think I understand your intention. You want to simulate the ListPrice as a function of the transaction price and area. Since you don't have list prices, you have free parameters to play with, but you should assume a model of some form that includes the area. One choice might be
log(ListPrice) = B0 + B1*log(TransactionPrice) + B2*sqrt(Area) + epsilon
but I don't know how housing prices are usually modeled.
Notice that the errors (epsilon) are normally distributed, but since you are modeling log(ListPrice), the errors for ListPrice are log-normally distributed. See the article "Error distributions and exponential regression models."
So I suggest that you form some model for log(ListPrice) that incorporates the transaction price and the area. If you don't know how to simulate a response for a regression model, see Section 11.3 (especially p. 204) of my book Simulating Data with SAS.
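As a rough illustration of simulating a response from a model of this form, here is a minimal SAS sketch. It is not taken from the book. The data set name Sales, the variable names, the toy observations, and the values of B0, B1, B2, and sigma are all hypothetical placeholders to be replaced with your own data and assumptions.

```sas
/* Toy stand-in for the real sales data; values are made up for illustration */
data Sales;
   input TransactionPrice Area;
   datalines;
120000 80
185000 110
250000 150
410000 240
;

/* Hypothetical parameter values (assumptions, not estimates) */
%let B0    = 0.02;
%let B1    = 1;
%let B2    = 0.004;
%let sigma = 0.03;   /* standard deviation of epsilon on the log scale */

data SimList;
   call streaminit(12345);
   set Sales;
   /* Normal error on the log scale implies a log-normal error for ListPrice */
   epsilon   = rand('Normal', 0, &sigma);
   logList   = &B0 + &B1*log(TransactionPrice) + &B2*sqrt(Area) + epsilon;
   ListPrice = exp(logList);
   /* Implied difference between list and transaction price */
   Error     = ListPrice - TransactionPrice;
run;

proc print data=SimList;
run;
```

Nothing in this model form forces ListPrice to exceed TransactionPrice, so if the difference must always be positive it is worth checking the simulated Error values and adjusting the parameters or the model accordingly.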
10-03-2016 10:06 AM
Thanks. You are right. My formulation is not correct.
And you have understood what I want to do correctly.
I will now try to model list prices as a function of transaction prices (market prices) and area. And I will look into your book, since I do not know how to simulate a response variable.