Fluorite | Level 6

Simulating data with known distributions

Hello,

I am looking for a way to simulate a data set.

I want to create a table of 5000 observations with 3 variables (formula, sex, salary). In the formula variable, 50% of the observations should be F1, 20% F2 and 30% F3. In the sex variable, 70% should be men and 30% women. Finally, for salary, 50% earn 3000 euros and 50% earn 2000 euros. This is basically what I'm looking for.

Super User

I modified your subject line to be more descriptive and moved the text of your question out of a code block; code blocks should be used for code or data, not text.

You really should also know the relationships between the variables, but assuming what you stated is all you have, something like the following will get you started.

```
data randomData;
   /* without this, formula and Sex inherit the default length of 8 from the arrays */
   length formula $2 Sex $1;

   *random seed to ensure reproducible results for testing;
   call streaminit(55);

   /* lookup tables: RAND('TABLE', p1, ..., pk) returns an index from 1 to k */
   array probFormula(3) $ _temporary_ ("F1", "F2", "F3");
   array probSex(2) $ _temporary_ ("M", "F");
   array probSalary(2) _temporary_ (3000, 2000);

   *number of observations = 5000;
   do i=1 to 5000;
      formula = probFormula(rand('table', 0.5, 0.2, 0.3));
      Sex     = probSex(rand('table', 0.7, 0.3));
      Salary  = probSalary(rand('table', 0.5, 0.5));
      output;
   end;

   drop i;
run;

*check distribution;
proc freq data=randomData;
   table formula sex salary;
run;
```
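Because RAND('TABLE') draws every row independently, the percentages PROC FREQ reports will be close to, but usually not exactly, 50/20/30, 70/30 and 50/50. If the proportions must be exact, one possible sketch (an assumption about what is wanted, not the only approach) is to fill each column with the exact counts and shuffle it with its own random sort key. The data set names `keys` and `randomExact` and the key variables `u1`-`u3` are hypothetical. Like the code above, this treats the three variables as independent.

```sas
/* Sketch: exact counts (2500/1000/1500, 3500/1500, 2500/2500) for n = 5000.
   Hypothetical names: keys, randomExact, u1-u3. */
data keys;
   call streaminit(55);
   do id = 1 to 5000;
      u1 = rand('uniform');   /* shuffle key for formula */
      u2 = rand('uniform');   /* shuffle key for sex     */
      u3 = rand('uniform');   /* shuffle key for salary  */
      output;
   end;
run;

/* shuffle on u1, then assign formula by row position */
proc sort data=keys; by u1; run;
data keys;
   set keys;
   length formula $2;
   if _n_ <= 2500 then formula = "F1";        /* 50% */
   else if _n_ <= 3500 then formula = "F2";   /* 20% */
   else formula = "F3";                       /* 30% */
run;

/* independent shuffle on u2 for sex */
proc sort data=keys; by u2; run;
data keys;
   set keys;
   length sex $1;
   if _n_ <= 3500 then sex = "M"; else sex = "F";   /* 70% / 30% */
run;

/* independent shuffle on u3 for salary */
proc sort data=keys; by u3; run;
data randomExact(keep=formula sex salary);
   set keys;
   if _n_ <= 2500 then salary = 3000; else salary = 2000;   /* 50% / 50% */
run;

proc freq data=randomExact;
   table formula sex salary;
run;
```

Sorting on a uniform random key is a common way to get a random permutation in SAS; giving each column its own key keeps the three shuffles independent of one another.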

In reply to @Jaji's question above.

Fluorite | Level 6

Thank you very much. Your code will help me.

Thank you

Super User

Calling @Rick_SAS
