maureen
Calcite | Level 5
Hello,

I need to simulate data for age, income, and gender.
Which distributions do these variables follow, and how are they simulated?

Thanks in advance for your time and help.
6 REPLIES
ChrisHemedinger
Community Manager

I'm guessing that we don't have enough information to help with those specific variables.  "Age" of who?  "Income" where?  "Gender" across what population?  

 

Rick Wicklin has a ton of practical simulation tips and explanation on his blog.  I'd start there and post back here when you have specific questions and/or code that you've tried.

maureen
Calcite | Level 5
I am working on attributed social network data (each person in the network has several attributes, such as age, income, and a centrality measure). Most studies use examples rather than simulation.
So I need to generalize my results using simulation instead of examples (that is, generalize them to any age/income/gender structure).
Is that possible?
ChrisHemedinger
Community Manager

I'm sure it is possible 🙂 

 

Perhaps you can post examples of the data that you have, in the form that you think you need, along with some more detail about what you want as a result.

 

Your question sounds similar to Rick's topic here, about simulating data when you don't have any real data to go by and instead have only some boundaries that you guess at.

Reeza
Super User

@maureen wrote:
I am working on attributed social network data (each person in the network has several attributes, such as age, income, and a centrality measure). Most studies use examples rather than simulation.
So I need to generalize my results using simulation instead of examples (that is, generalize them to any age/income/gender structure).
Is that possible?

Yes, it is possible, but it takes work.

Get the distributions of age, income, and gender from your country's census bureau. Depending on your data, you may be able to use other information. For example, we commonly take a person's location and use that area's median income in the analysis. If I had to simulate income, I would use the 5th, 25th, 50th, 95th and 99th income percentiles to do it.
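
One way to turn a handful of published percentiles into simulated incomes is inverse-CDF sampling with linear interpolation between the known points. The sketch below is Python rather than SAS, and it is only one possible reading of the percentile suggestion above. The income values and the helper name simulate_income are made-up placeholders; substitute the figures from your own census source.

```python
import bisect
import random

# Placeholder (percentile, income) pairs: NOT real census figures.
PCTL = [0.05, 0.25, 0.50, 0.95, 0.99]
INCOME = [8_000, 22_000, 40_000, 150_000, 300_000]

def simulate_income(rng):
    """Draw one income by inverting a piecewise-linear CDF through the known percentiles."""
    u = rng.random()
    # Nothing is known below the 5th or above the 99th percentile, so clamp.
    # This puts small point masses at the two end values.
    u = min(max(u, PCTL[0]), PCTL[-1])
    # Find the segment [PCTL[i], PCTL[i+1]] that contains u.
    i = min(bisect.bisect_right(PCTL, u) - 1, len(PCTL) - 2)
    frac = (u - PCTL[i]) / (PCTL[i + 1] - PCTL[i])
    return INCOME[i] + frac * (INCOME[i + 1] - INCOME[i])

rng = random.Random(12345)  # fixed seed so the run is reproducible
incomes = [simulate_income(rng) for _ in range(10_000)]
```

Linear interpolation is a crude choice for the upper tail; with more percentiles, or a fitted parametric distribution, the tail would be better behaved.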

Rick_SAS
SAS Super FREQ

The challenge in simulating demographic data is getting the correlations correct. If you have access to a large set of real data, you might want to use bootstrap resamples.
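
As a rough illustration of why bootstrap resampling keeps the correlations intact, here is a minimal Python sketch (not SAS). The records in observed are hypothetical placeholders standing in for a real data set, and bootstrap_sample is a name invented for this example. Each draw copies a whole row, so a person's gender, age, and income always travel together.

```python
import random

# Hypothetical observed records (gender, age, income): placeholder values only.
observed = [
    ("F", 23, 31_000),
    ("M", 35, 52_000),
    ("F", 47, 61_000),
    ("M", 58, 75_000),
    ("F", 66, 38_000),
    ("M", 29, 40_000),
]

def bootstrap_sample(rows, n, rng):
    """Draw n rows with replacement; every draw keeps one person's attributes together."""
    return rng.choices(rows, k=n)

rng = random.Random(2024)  # fixed seed so the run is reproducible
sample = bootstrap_sample(observed, 1_000, rng)
```

A resample can only reproduce combinations that already occur in the observed data, which is why a large real data set matters here.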

 

It is easy to simulate each variable separately: fit a univariate distribution to each one (use PROC UNIVARIATE to fit the continuous variables) and then simulate gender, age, and income from those fits. However, if you do this you assume that gender, age, and income are independent. In practice, older people tend to earn more than younger people, and men tend to earn more than women for doing the same job.

 

For your problem, it is reasonable to assume that the gender and age variables are essentially independent (except for the very old population; women live longer than men). Therefore you can simulate gender and age from their univariate distributions. 

I suggest that you model income as a function of age and gender, then simulate income from the regression model. That lets you randomly simulate an income for each (gender, age) pair.
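
The two-step idea can be sketched as follows, in Python rather than SAS. Every number here is an illustrative placeholder, not an estimate from real data: the gender proportion, the age-group weights, the regression coefficients B0, B_AGE, B_MALE, and the residual standard deviation SIGMA are all assumptions that you would replace with values fitted to your own data.

```python
import random

# All values below are illustrative placeholders, not estimates from real data.
P_FEMALE = 0.5
AGE_GROUPS = [(18, 29), (30, 44), (45, 64), (65, 85)]
AGE_WEIGHTS = [0.22, 0.27, 0.33, 0.18]
B0, B_AGE, B_MALE = 15_000.0, 600.0, 5_000.0  # hypothetical regression coefficients
SIGMA = 8_000.0                               # hypothetical residual standard deviation

def simulate_person(rng):
    """Simulate gender and age independently, then income conditional on both."""
    gender = "F" if rng.random() < P_FEMALE else "M"
    # Pick an age group by its weight, then an age uniformly within the group.
    lo, hi = rng.choices(AGE_GROUPS, weights=AGE_WEIGHTS)[0]
    age = rng.randint(lo, hi)
    # Linear model for the mean income, plus a normal error term.
    mean_income = B0 + B_AGE * age + B_MALE * (gender == "M")
    income = rng.gauss(mean_income, SIGMA)
    return gender, age, income

rng = random.Random(1)  # fixed seed so the run is reproducible
people = [simulate_person(rng) for _ in range(5_000)]
```

A normal error term can occasionally produce implausibly low or even negative incomes. Modeling the logarithm of income instead is a common alternative, but that is a modeling choice beyond what is described here.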

 

Some of this is covered in Chapters 2 and 11 of Simulating Data with SAS. An example of simulating from a linear regression model is available at "Simulate many samples from a linear regression model."

 

 

Reeza
Super User

I know the Canadian census provides age/sex breakdowns at very fine levels (both for geographic areas and for age groups), and I believe the US census does too. It's the income that's usually harder to simulate.

 

 
