Hi, I used simulation to generate mixed distribution returns from state 1 & 2, where N =5000, P(state 1) = 70%, P(state 2) = 30%. Then I separate return 1 and return 2, and filling the missing value as 0. Now the question is how to create the average value scalar for each return? I mean create this number for future use. I tried transpose since the average number does not have the same dimension with my variables. However, after I count for the number of each return, R1 has 3400 whil R2 has 1600. Should these be 3500 for R1 and 1500 for R2? I would appreciate any comments and suggestion. Following is my sample and code:
/* sample from a mixture distribution */
%let N = 5000;
data States(drop=i);
call streaminit(12345);
array prob [2] _temporary_ (0.7 0.3);
do i = 1 to &N;
type = rand("Table", of prob[*]);
if type=1 then Return = rand("Normal", 0, 3); /* state 1 */
else Return = rand("Normal", 5, 10); /* steta 2 */
output;
end;
run;
data work.States;
set work.States;
if type = 1 then R1 = Return + 0;
else R2 = Return + 0;
if R1 =. then R1 =0;
if R2 =. then R2 =0;
Run;
proc transpose
data=work.States;
run;
data want;
set data1;
array cnt col1-col5000;
array cnts cols1-cols5000;
do over cnt;
if cnt in (0,.) then cnts=1;
else cnts=.;
allcnts=nmiss(of cols1-cols5000);
end;
drop col: cols:;
run;
Here is a short example for the result:
N type return R1 R2
1 1 0.58 0.58 0
2 2 0.36 0 0.36
3 1 -0.74 -0.74 0
I am definitely NOT an expert regarding the fairly new RAND function. However, that said, based on the UNIFORM distribution and the seed you used, 3416 of the observations you created had a random number generated that was less than .7.
That is apparently what the table option is selecting.
Your multiple calls to the function also likely interferred with a standard expectation. Try the following and look at the output. I use proc means to get the group counts but, at the same time, it (I think) does the calculations you want:
%let N = 5000; data States(drop=i); call streaminit(12345); array prob [2] _temporary_ (0.7 0.3); do i = 1 to &N; type = rand("Table", of prob[*]); output; end; run; data States; set States; call streaminit(12345); if type=1 then Return = rand("Normal", 0, 3); /* state 1 */ else Return = rand("Normal", 5, 10); /* steta 2 */ run; data states; set states; call streaminit(12345); rnum = rand("Uniform"); run; proc sort data=states; by rnum; run; proc means data=states; var return; class type; run;
Art, CEO, AnalystFinder.com
If you want exact proportional assignment you might be better off using proc surveyselect. There is probably a more direct route to get what you want, but the following will work:
%let N = 5000; data States; do i = 1 to &N; output; end; run; proc surveyselect data=states groups=10 seed=12345 out=states noprint; run; data States(drop=i); set States; if groupid lt 8 then type=1; else type=2; if type=1 then Return = rand("Normal", 0, 3);/* state 1 */ else Return = rand("Normal", 5, 10);/* state 2 */ output; run;
Art, CEO, AnalystFinder.com
Hi art297, thank you for your contribution. I got exactly 3500 returns from state 1 and rest from state 2.
I still have two quesitons with your code.
1. Is there any possible reason why my code generate 3400 R1 and 1600 R2?
2. After grouping and creating the returns, how to get the mean, variance, standard deviation, densitiy, etc for return from state1 and from state 2 separately?
I am definitely NOT an expert regarding the fairly new RAND function. However, that said, based on the UNIFORM distribution and the seed you used, 3416 of the observations you created had a random number generated that was less than .7.
That is apparently what the table option is selecting.
Your multiple calls to the function also likely interferred with a standard expectation. Try the following and look at the output. I use proc means to get the group counts but, at the same time, it (I think) does the calculations you want:
%let N = 5000; data States(drop=i); call streaminit(12345); array prob [2] _temporary_ (0.7 0.3); do i = 1 to &N; type = rand("Table", of prob[*]); output; end; run; data States; set States; call streaminit(12345); if type=1 then Return = rand("Normal", 0, 3); /* state 1 */ else Return = rand("Normal", 5, 10); /* steta 2 */ run; data states; set states; call streaminit(12345); rnum = rand("Uniform"); run; proc sort data=states; by rnum; run; proc means data=states; var return; class type; run;
Art, CEO, AnalystFinder.com
Thank you art297, I've tried your code and it works.
Since I would like to use the exact 70% and 30%, I will apply your first method and then use excel to get the parameters I need.
I will consider the second method when I come back to SAS and continue with following steps. Thank you.
Hi Xusheng,
I don't know if this approach is too crude, but why don't you just generate the 5000 i's, and if i is less than or equal to 3500 then type =1 and else type=2, and then you do all the subsequent simulation steps like art297 showed?
Thank you ilikesas, I did what you suggest -- create two loop one is from 1 to 3500 and another is from 3501 to 5000. And it works. Thank you so much.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.