02-23-2018 02:33 PM
I'm guessing that we don't have enough information to help with those specific variables. "Age" of who? "Income" where? "Gender" across what population?
Rick Wicklin has a ton of practical simulation tips and explanation on his blog. I'd start there and post back here when you have specific questions and/or code that you've tried.
02-23-2018 02:53 PM
02-23-2018 03:01 PM
I'm sure it is possible.
Perhaps you can post examples of the data that you have, in the form that you think you need, along with some more detail about what you want as a result.
Your question sounds similar to Rick's topic here - about simulating data when you don't really have any data to go by but instead just have some boundaries that you guess at.
02-23-2018 03:12 PM
I am working on attributed social network data (each person in the network has many attributes, such as age, income, and centrality measures). Most studies use examples, not simulation.
So I need to generalize my results using simulation instead of examples (that is, generalize to any age/income/gender structure).
Is that possible?
Yes it is possible, but it's work.
Get the distributions of age/income/gender from your country's census bureau. Depending on your data, you can possibly use other information. For example, we commonly take a person's location and use that area's median income in the analysis. If I had to simulate it, I would use the 5th, 25th, 50th, 95th and 99th income percentiles to do this.
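One way to read the percentile suggestion above is as inverse-CDF sampling: draw a uniform number and interpolate between the published percentile points. The sketch below is in Python rather than SAS, and every dollar figure in it is a made-up placeholder, as are the assumed bounds at the 0th and 100th percentiles; real values would come from the census tables.

```python
import bisect
import random

# Hypothetical (cumulative probability, income) knots. Replace with census figures.
# The 0.00 and 1.00 rows are assumed bounds, not published percentiles.
KNOTS = [
    (0.00, 0.0),
    (0.05, 8_000.0),
    (0.25, 25_000.0),
    (0.50, 48_000.0),
    (0.95, 180_000.0),
    (0.99, 400_000.0),
    (1.00, 1_000_000.0),
]
PROBS = [p for p, _ in KNOTS]


def simulate_income(rng):
    """Draw one income by inverting the piecewise-linear CDF through KNOTS."""
    u = rng.random()                      # uniform on [0, 1)
    i = bisect.bisect_right(PROBS, u)     # first knot whose probability exceeds u
    (p0, x0), (p1, x1) = KNOTS[i - 1], KNOTS[i]
    return x0 + (u - p0) / (p1 - p0) * (x1 - x0)


rng = random.Random(2018)                 # fixed seed so the run is repeatable
incomes = sorted(simulate_income(rng) for _ in range(10_000))
print("simulated median:", round(incomes[len(incomes) // 2]))
```

Linear interpolation between knots is the simplest choice; it puts too little mass in the far tail, so a heavier-tailed fit above the 99th percentile may be worth considering.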
02-24-2018 03:20 PM
The challenge in simulating demographic data is getting the correlations correct. If you have access to a large set of real data, you might want to use bootstrap resamples.
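A bootstrap resample keeps the correlations because it draws whole records, not individual values. Here is a minimal Python sketch (not SAS); the `observed` rows are invented purely for illustration and stand in for a real data set.

```python
import random

# Hypothetical observed records: (gender, age, income). Real data would go here.
observed = [
    ("F", 23, 28_000), ("M", 31, 52_000), ("F", 45, 61_000),
    ("M", 58, 83_000), ("F", 67, 39_000), ("M", 26, 34_000),
]


def bootstrap_sample(rows, n, rng):
    """Resample n whole rows with replacement, so each simulated person
    keeps the gender/age/income combination of a real person."""
    return [rng.choice(rows) for _ in range(n)]


rng = random.Random(1)                    # fixed seed so the run is repeatable
sample = bootstrap_sample(observed, 1000, rng)
print(sample[:3])
```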
It is easy to simulate each variable separately: fit the continuous variables with PROC UNIVARIATE, then simulate gender, age, and income from their fitted univariate distributions. However, if you do this you assume independence of gender, age, and income. In practice, older people tend to make more money than younger people, and men tend to make more money than women for doing the same job.
For your problem, it is reasonable to assume that the gender and age variables are essentially independent (except for the very old population; women live longer than men). Therefore you can simulate gender and age from their univariate distributions.
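As a rough illustration of drawing gender and age independently, here is a Python sketch (not SAS). The shares and age bands are placeholders, not real census numbers, and ages are drawn uniformly within each band as a simplifying assumption.

```python
import random

# Hypothetical population shares. Substitute figures from your census source.
GENDER_SHARES = {"F": 0.51, "M": 0.49}
AGE_BANDS = [((18, 29), 0.22), ((30, 44), 0.27), ((45, 64), 0.33), ((65, 90), 0.18)]


def simulate_person(rng):
    """Draw gender and age independently from their univariate distributions."""
    gender = rng.choices(list(GENDER_SHARES), weights=list(GENDER_SHARES.values()))[0]
    lo, hi = rng.choices([band for band, _ in AGE_BANDS],
                         weights=[w for _, w in AGE_BANDS])[0]
    age = rng.randint(lo, hi)             # uniform within the band (an assumption)
    return gender, age


rng = random.Random(42)                   # fixed seed so the run is repeatable
people = [simulate_person(rng) for _ in range(10_000)]
share_f = sum(g == "F" for g, _ in people) / len(people)
print("simulated share of women:", round(share_f, 3))
```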
I suggest that you model income as a function of age and gender, then simulate income from the regression model. You can then randomly simulate an income for each (gender, age) pair.
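To make the regression step concrete, here is a Python sketch (not SAS) that simulates an income for a given (gender, age) pair. The coefficients are invented; in practice they would be estimated from real data. Modeling the logarithm of income instead of income itself is an assumption made here only to keep the simulated values positive.

```python
import math
import random

# Hypothetical coefficients for log(income) = b0 + b_age*age + b_male*male + error.
# In practice these would come from a regression fitted to real data.
INTERCEPT = 9.8
B_AGE = 0.015
B_MALE = 0.15
SIGMA = 0.5          # residual standard deviation on the log scale


def simulate_income(gender, age, rng):
    """Simulate one income: predicted mean from the model plus normal error."""
    male = 1.0 if gender == "M" else 0.0
    mean_log = INTERCEPT + B_AGE * age + B_MALE * male
    return math.exp(rng.gauss(mean_log, SIGMA))


rng = random.Random(7)                    # fixed seed so the run is repeatable
for gender, age in [("F", 25), ("M", 25), ("F", 55), ("M", 55)]:
    print(gender, age, round(simulate_income(gender, age, rng)))
```

Because income depends on gender and age only through the model mean, the simulated data carry whatever age and gender effects the fitted coefficients encode.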
Some of this is covered in Chapters 2 and 11 of Simulating Data with SAS. An example of simulating from a linear regression model is available at "Simulate many samples from a linear regression model."
02-24-2018 04:25 PM
I know the Canadian census provides age/sex breakdowns at very fine levels (for both geographic areas and age groups), and I believe the US census does too. It's the income that's usually harder to simulate.