Evange
Fluorite | Level 6

Hi.

 

I am currently developing a Monte Carlo study in SAS. For one of my experimental designs, I need to simulate a random variable that respects an a priori defined correlation with an existing variable. More concretely, I am simulating the difference (error) between the list and transaction prices of residential properties, which I assume is strongly correlated with the size (area) of the properties sold (i.e., the bigger the house, the bigger the difference between what the seller asks and what he/she actually gets from the market).

 

Unfortunately, I was unable to find advice on how to simulate a random variable with a predefined correlation to an existing variable. The examples I find simulate all of the variables, but for my study I need to simulate one variable (the error between list and market price) from an existing one (area). The simulation needs to take into account a predefined correlation between the existing and simulated variables (say, 0.9).

 

Would it be possible to get some advice on how to do this? Suggest some reading?

 

Thanks.

Rick_SAS
SAS Super FREQ

Is the known variable assumed to be fixed?

What is the distribution of the variable that you want to simulate? 

 

Evange
Fluorite | Level 6

Hi, thanks for your prompt interest (I recently ordered one of your books; it is still in transit... 🙂 ).

 

The variable (area) is to remain fixed.

The error (i.e., List Price - Transaction Price) is assumed to be log-normally distributed (i.e., basically the error is always positive; sellers are always asking for more than they actually receive from the market).

 

I just want to derive a vector of errors, which will be correlated with the area of a property. With this vector I will derive list prices (which I do not have in my dataset): List Price = Transaction Price + Error.

 

After this, I will generate data (in a Monte Carlo study) that mirror my sample characteristics (and that will then include the new simulated list price variable). To do this, I will follow the procedures described in "SAS for Monte Carlo Studies: A Guide for Quantitative Researchers". My sample covers around 60% of all market transactions carried out in Portugal (2009-2013). I am applying hedonic regression models to describe transaction prices:

 

Ln(transaction price) = f(characteristics), and I would like to know what happens if I use Ln(list price) instead of Ln(transaction price).

 

Thanks (do not know if this helps; perhaps too much info...).

Ksharp
Super User
Do you mean the correlation coefficient?


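/* Step 1: an existing variable x (standard normal here) */
data have;
 call streaminit(4321);
 do id=1 to 10000;
  x=rand('normal'); output;
 end;
run;

%let rho=0.9;   /* target correlation */

/* Step 2: y = rho*x + sqrt(1-rho**2)*z, with z an independent standard normal */
data want;
 set have;
 call streaminit(1234);
 y=&rho*x+sqrt(1-&rho**2)*rand('normal');
run;

/* Step 3: check the sample correlation between x and y */
proc corr data=want;
 var x;
 with y;
run;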
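
Why the assignment to y gives the target correlation, as a short derivation. It assumes that x and the new draw (written z here, my notation for the rand('normal') term) are independent, each with mean 0 and variance 1:

```latex
\begin{aligned}
y &= \rho x + \sqrt{1-\rho^2}\, z, \\
\operatorname{Var}(y) &= \rho^2 \operatorname{Var}(x) + (1-\rho^2)\operatorname{Var}(z) = \rho^2 + (1-\rho^2) = 1, \\
\operatorname{Cov}(x, y) &= \rho \operatorname{Var}(x) + \sqrt{1-\rho^2}\,\operatorname{Cov}(x, z) = \rho, \\
\operatorname{Corr}(x, y) &= \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}(x)\operatorname{Var}(y)}} = \rho .
\end{aligned}
```

If x does not have mean 0 and variance 1 (a floor area, for example), standardize it first; correlation is unchanged by shifting and positive rescaling. The sample correlation reported by PROC CORR will be close to, but not exactly, the target value.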


Evange
Fluorite | Level 6

Hi, yes, it is the correlation coefficient.

 

Thank you very much for your input. But I was looking at the problem in an incorrect way. I want to derive a variable from a fixed variable, and the correlation between a random variable and a fixed one is zero, as one of the users has rightly pointed out.

Rick_SAS
SAS Super FREQ

Covariance is defined for random vectors. The covariance between a random vector and a constant vector is zero, since the constant vector does not vary. So your formulation is not quite correct.
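
In symbols, as a minimal sketch (the notation Y_i for the simulated value and a_i for the fixed area of property i is an assumption introduced here):

```latex
\operatorname{Cov}(Y_i, a_i)
  = \mathbb{E}\big[(Y_i - \mathbb{E}[Y_i])\,(a_i - \mathbb{E}[a_i])\big]
  = \mathbb{E}\big[(Y_i - \mathbb{E}[Y_i]) \cdot 0\big]
  = 0,
\qquad \text{since } \mathbb{E}[a_i] = a_i .
```

The sample correlation that PROC CORR reports between a fixed column and a simulated column can still be nonzero; it is just not a population correlation between two random variables.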

 

Still, I think I understand your intention. You want to simulate the ListingPrice as a function of the asking price and area. Since you don't have listing prices, you have free parameters to play with, but you should assume a model of some form that includes the area. One choice might be 

 

log(ListPrice) = B0 + B1*log(AskingPrice) + B2*sqrt(Area) + epsilon

but I don't know how housing prices are usually modeled.

 

 

Notice that the errors are normally distributed, but since you are modeling log(ListPrice), the errors for ListPrice are log-normally distributed. See the article "Error distributions and exponential regression models."

 

So I suggest that you form some model for log(ListPrice) that incorporates the asking price and the area.  If you don't know how to simulate a response for a regression model, see Section 11.3 (especially p. 204) of my book Simulating Data with SAS.
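
A minimal sketch of simulating a response from such a model in a DATA step follows. It is not the code from the book. It reads the AskingPrice term as the observed transaction price (as the original poster describes it), and the data set Have, the variable names TransPrice and Area, the toy data, and every parameter value are hypothetical placeholders:

```sas
/* Toy input data so the sketch runs on its own; replace with real data */
data Have;
   call streaminit(4321);
   do id = 1 to 1000;
      Area = 40 + 200*rand('Uniform');                     /* made-up areas  */
      TransPrice = 1000*Area*exp(rand('Normal', 0, 0.2));  /* made-up prices */
      output;
   end;
run;

/* Hypothetical model parameters: free choices, not estimates */
%let B0    = 0.05;
%let B1    = 1;
%let B2    = 0.002;
%let sigma = 0.03;   /* standard deviation of epsilon on the log scale */

data Sim;
   set Have;
   if _n_ = 1 then call streaminit(1234);
   eps       = rand('Normal', 0, &sigma);   /* normal error on the log scale */
   logList   = &B0 + &B1*log(TransPrice) + &B2*sqrt(Area) + eps;
   ListPrice = exp(logList);                /* lognormal error on the price scale */
   Error     = ListPrice - TransPrice;      /* not forced to be positive */
run;

/* Inspect how the simulated error relates to the fixed area */
proc corr data=Sim;
   var Area;
   with Error;
run;
```

Nothing in this model forces ListPrice to exceed TransPrice, so if the error must always be positive the parameters (or the model form) need to be chosen with that in mind. Changing B2 and sigma changes how strongly the simulated error tracks the area.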

 

Evange
Fluorite | Level 6

Hi.

 

Thanks. You are right. My formulation is not correct.

 

And you have correctly understood what I want to do.

I will now try to model list prices as a function of transaction prices (market prices) and area. And I will look into your book, since I do not know how to simulate a response variable.

 

Regards.
