DATA Step, Macro, Functions and more

Variable average with missing value

Accepted Solution Solved
Reply
Contributor
Posts: 48
Accepted Solution

Variable average with missing value

Hi, I used simulation to generate mixed distribution returns from state 1 & 2, where N =5000, P(state 1) = 70%, P(state 2) = 30%. Then I separate return 1 and return 2, and filling the missing value as 0. Now the question is how to create the average value scalar for each return? I mean create this number for future use. I tried transpose since the average number does not have the same dimension with my variables. However, after I count for the number of each return, R1 has 3400 whil R2 has 1600. Should these be 3500 for R1 and 1500 for R2? I would appreciate any comments and suggestion. Following is my sample and code:

/* sample from a mixture distribution */
%let N = 5000;
data States(drop=i);
call streaminit(12345);
array prob [2] _temporary_ (0.7 0.3);
do i = 1 to &N;
   type = rand("Table", of prob[*]);
   if type=1 then      Return = rand("Normal", 0, 3); /* state 1 */
   else                Return = rand("Normal", 5, 10); /* steta 2 */             
   output;
end;
run;
data work.States;
	set work.States;
if type = 1 then R1 = Return + 0;
else R2 = Return + 0;
if R1 =. then R1 =0;
if R2 =. then R2 =0;
Run;
proc transpose 
data=work.States;
run;
data want;
set data1;
array cnt col1-col5000;
array cnts cols1-cols5000;
do over cnt;
if cnt in (0,.) then cnts=1;
else cnts=.;
allcnts=nmiss(of cols1-cols5000);
end;
drop col: cols:;
run;

Here is a short example for the result:

N     type     return      R1       R2

1        1        0.58       0.58      0

2        2        0.36         0       0.36

3        1       -0.74      -0.74      0


Accepted Solutions
Solution
‎04-01-2017 04:03 PM
PROC Star
Posts: 7,364

Re: Variable average with missing value

I am definitely NOT an expert regarding the fairly new RAND function. However, that said, based on the UNIFORM distribution and the seed you used, 3416 of the observations you created had a random number generated that was less than .7.

 

That is apparently what the table option is selecting.

 

Your multiple calls to the function also likely interferred with a standard expectation. Try the following and look at the output. I use proc means to get the group counts but, at the same time, it (I think) does the calculations you want:

 

%let N = 5000;
data States(drop=i);
  call streaminit(12345);
  array prob [2] _temporary_ (0.7 0.3);
  do i = 1 to &N;
    type = rand("Table", of prob[*]);
    output;
  end;
run;
data States;
  set States;
  call streaminit(12345);
  if type=1 then      Return = rand("Normal", 0, 3); /* state 1 */
  else                Return = rand("Normal", 5, 10); /* steta 2 */             
run;
data states;
  set states;
  call streaminit(12345);
  rnum = rand("Uniform");
run;
proc sort data=states;
  by rnum;
run;

proc means data=states;
  var return;
  class type;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
PROC Star
Posts: 7,364

Re: Variable average with missing value

If you want exact proportional assignment you might be better off using proc surveyselect. There is probably a more direct route to get what you want, but the following will work:

 

%let N = 5000;
data States;
  do i = 1 to &N;
   output;
  end;
run;

proc surveyselect data=states groups=10 seed=12345 out=states noprint;
run;

data States(drop=i);
  set States;
   if groupid lt 8 then type=1;
   else type=2;
   if type=1 then Return = rand("Normal", 0,  3);/* state 1 */
   else           Return = rand("Normal", 5, 10);/* state 2 */             
   output;
run;

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 48

Re: Variable average with missing value

Hi art297, thank you for your contribution. I got exactly 3500 returns from state 1 and rest from state 2.

I still have two quesitons with your code.

1. Is there any possible reason why my code generate 3400 R1 and 1600 R2?

2. After grouping and creating the returns, how to get the mean, variance, standard deviation, densitiy, etc for return from state1 and from state 2 separately?

 

Solution
‎04-01-2017 04:03 PM
PROC Star
Posts: 7,364

Re: Variable average with missing value

I am definitely NOT an expert regarding the fairly new RAND function. However, that said, based on the UNIFORM distribution and the seed you used, 3416 of the observations you created had a random number generated that was less than .7.

 

That is apparently what the table option is selecting.

 

Your multiple calls to the function also likely interferred with a standard expectation. Try the following and look at the output. I use proc means to get the group counts but, at the same time, it (I think) does the calculations you want:

 

%let N = 5000;
data States(drop=i);
  call streaminit(12345);
  array prob [2] _temporary_ (0.7 0.3);
  do i = 1 to &N;
    type = rand("Table", of prob[*]);
    output;
  end;
run;
data States;
  set States;
  call streaminit(12345);
  if type=1 then      Return = rand("Normal", 0, 3); /* state 1 */
  else                Return = rand("Normal", 5, 10); /* steta 2 */             
run;
data states;
  set states;
  call streaminit(12345);
  rnum = rand("Uniform");
run;
proc sort data=states;
  by rnum;
run;

proc means data=states;
  var return;
  class type;
run;

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 48

Re: Variable average with missing value

Thank you art297, I've tried your code and it works. 

Since I would like to use the exact 70% and 30%, I will apply your first method and then use excel to get the parameters I need.

I will consider the second method when I come back to SAS and continue with following steps. Thank you.

Super Contributor
Posts: 413

Re: Variable average with missing value

Hi Xusheng,

 

I don't know if this approach is too crude, but why don't you just generate the 5000 i's, and if i is less than or equal to 3500 then type =1 and else type=2, and then you do all the subsequent simulation steps like art297 showed?

Contributor
Posts: 48

Re: Variable average with missing value

Thank you ilikesas, I did what you suggest -- create two loop one is from 1 to 3500 and another is from 3501 to 5000. And it works. Thank you so much.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 242 views
  • 1 like
  • 3 in conversation