Programming the statistical procedures from SAS

Change statistical moments/parameters of a sample

Super Contributor
Posts: 334

Hello,

 

Is it possible (and if so, how) to draw a sample, similar to PROC SURVEYSELECT, while changing certain statistical parameters of the sample? For example, start from normally distributed data (a uniform distribution would be a bad example) and obtain a triangular-distributed sample with a specified mode?

 

Thanks and kind regards

 


Accepted Solutions
Solution
‎01-18-2016 08:03 AM
Grand Advisor
Posts: 9,452

Re: Change statistical moments/parameters of a sample

Yes, as Rick said. It looks like you want to draw data from a normal distribution using sampling probabilities chosen so that the result conforms to a triangular distribution.

 

data tri;
call streaminit(1234);
do i=1 to 10000;
 x=rand('triangle',0.7);output;
end;
run;
title 'Triangle Distribution peak=0.7';
proc sgplot data=tri;
histogram x;
density x/type=kernel;
run;






%let peak=0.7;
title 'Simulation from Normal';
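proc iml;
/* draw 10,000 standard normal values */
x=j(10000,1);
call randseed(1234);
call randgen(x,'normal');

/* sampling weights on the CDF scale: rise linearly up to the peak, then fall */
prob = cdf("Normal", x);
prob = choose((prob<=&peak),prob,2#&peak-prob);

/* weighted resample with replacement */
z=sample(x,10000,"Replace", prob);
call histogram(z) density='kernel';

quit;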



All Replies
SAS Super FREQ
Posts: 3,309

Re: Change statistical moments/parameters of a sample

It's possible in some cases, but the resulting "sample" is no longer random. 

 

In general, every time you resample you will obtain new sample statistics (mean, median, mode,  etc).  But it sounds like you want to predetermine a statistic, such as "I want the new mode to be 1.2." 

 

What is the application for this process? What are you hoping to accomplish?

Super Contributor
Posts: 334

Re: Change statistical moments/parameters of a sample


This is very crude code, but it should show the basic idea. The actual data are "recipes" (a large number of bills of materials consisting of different components with 'interesting areas'). That is, the complete description would be sampling in the multivariate case, for general (not symmetric) distributions and with correlation (because the percentages of the components add up to 100%). A SAS function for this is probably too much to hope for, but there might be articles about this problem.

 

* Defines allowed values;
Data A;
  Do i=1 To 1000;
    X=Round(Rannor(1)*3+10,0.01);
    Output;
  End;
Run;

* is not skewed ..;
Proc Means Data=A Mean StdDev Skewness;
  Var X;
Run;

* 'nasty' way to get a kind of a triangular distribution;
Data B;
  Set A;
  Select ;
    When (X > 6 & X <= 7) Group=1;
    When (X > 7 & X <= 8) Group=2;
    When (X > 8 & X <= 9) Group=3;
    When (X > 9 & X <= 10) Group=4;
    Otherwise Group=0;
  End;
Run;

Proc Sort Data=B;
  By Group X;
Run;

* sample sizes give left skewed distribution ..;
Proc SurveySelect Data=B Out=C Method=srs N=(0 5 5 10 30);
  Strata Group;
Run;

Proc Means Data=C Mean StdDev Skewness;
  Var X;
Run;

 

SAS Super FREQ
Posts: 3,309

Re: Change statistical moments/parameters of a sample


It sounds like you want to do "probability sampling," where each observation is assigned a known probability of being selected.  In your case, you are assigning higher probability to observations greater than the mean and lower probability to observations that are less than the mean.  How you assign the probabilities affects the moments of the resulting distribution, so the key to your problem will be deciding on a transformation that generates a sampling probability from the data.
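
To make the link between sampling probabilities and moments concrete, here is a short worked formula. The notation is mine and only a sketch: let w_i > 0 be the weight assigned to observation x_i, and let Y be one draw when sampling with replacement.

```latex
p_i = \frac{w_i}{\sum_{j=1}^{n} w_j}, \qquad
\mathbb{E}[Y] = \sum_{i=1}^{n} p_i \, x_i, \qquad
\mathbb{E}[Y^k] = \sum_{i=1}^{n} p_i \, x_i^{k}
```

Weights that increase with x_i therefore pull the expected mean of the resample above the unweighted sample mean, and every other moment is reweighted in the same way.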

 

If you have SAS/IML experience, you can request probability samples by using the SAMPLE function. In the following, I generate sampling weights by using the normal CDF of the standardized data. 

 

proc iml;
mu = 10; sigma = 3;
x = j(1000, 1);
call randgen(x, "Normal", mu, sigma);  /* x ~ N(10, 3) */

/* create probability scale based on z-score */
prob = cdf("Normal", (x - mu)/sigma);
y = sample(x, 50, "Replace", prob);  /* prob is standardized so sum(prob)=1 */

call histogram(y);  /* show skewed distribution */
mean = mean(y`);
skew = skewness(y`);
print mean skew;
quit;

 

 

Super Contributor
Posts: 334

Re: Change statistical moments/parameters of a sample

Unfortunately, I only have STAT/ETS and OR, but I think I can build something similar. If there is something similar for those modules, please let me know.

Grand Advisor
Posts: 9,452

Re: Change statistical moments/parameters of a sample

Yes, the DATA step can do it, but it needs a bit more code.

 

data tri;
call streaminit(1234);
do i=1 to 10000;
 x=rand('triangle',0.7);output;
end;
run;
title 'Triangle Distribution peak=0.7';
proc sgplot data=tri;
histogram x;
density x/type=kernel;
run;


%let peak=0.7;
title 'Simulation from Normal';
data normal;
call streaminit(1234);
do i=1 to 10000;
 x=rand('normal');
 prob = cdf("Normal", x);
 p = ifn((prob<=&peak),prob,2*&peak-prob);
 output;
end;
drop prob i;
run;
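proc sql noprint;
 /* number of observations, used to size the temporary arrays */
 select count(*) into :n trimmed from normal;

 /* rescale the weights so that they sum to 1 */
create table temp as
 select x,p/sum(p) as p from normal;
quit;
data want;
 set temp end=last;
 array xx{&n} _temporary_;
 array pp{&n} _temporary_;
 call streaminit(1234);
 /* load values and probabilities into memory */
 xx{_n_}=x;
 pp{_n_}=p;
 if last then do;
  /* draw 10,000 indices with probabilities pp and output the matching x */
  do i=1 to 10000;
   idx=rand('table',of pp{*});
   x=xx{idx};
   output;
  end;
 end;
run;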
proc sgplot data=want;
histogram x;
density x/type=kernel;
run;
SAS Super FREQ
Posts: 3,309

Re: Change statistical moments/parameters of a sample

Do you want to SIMULATE data from a triangular distribution? That's easy by using the DATA step and the "TRIANGLE" distribution. You can also simulate data from the PERT distribution, which is a generalization of the triangular distribution.

 

Please clarify: do you want to simulate from a probability distribution, or do you want to resample from existing data?

 

 

Super Contributor
Posts: 334

Re: Change statistical moments/parameters of a sample

I would like to resample existing data. (PERT would be nice too, but for my purpose a triangular dist. is sufficient)
SAS Super FREQ
Posts: 3,309

Re: Change statistical moments/parameters of a sample

Here's what you want to do:

1. Create a variable that contains the sampling probability for each observation

2. Use the METHOD=PPS_WR option in PROC SURVEYSELECT to specify that you want a probability sample that is proportional to size (with replacement)

 

For example, the following program assigns the first observation a 50% probability of being selected and the other eight observations a 6.25% probability.

data A;
do x = 1 to 9;
   if x=1 then prob = 0.5; /* 50% probability of selection */
   else prob = 0.5/8;      /*  6.25% probability of selection */
   output;
end;
run;

/* resample with probability proportional to size */
proc surveyselect data=A out=out method=PPS_WR 
     seed=123 N=100;
size prob;    /* specify the probability variable */
run;

/* examine the distribution of the observations */
proc freq data=out;
weight numberHits;
tables x;
run;
Grand Advisor
Posts: 9,452

Re: Change statistical moments/parameters of a sample

Here is the code based on Rick's suggestion.

 


%let peak=0.7;
title 'Simulation from Normal';
data normal;
call streaminit(1234);
do i=1 to 100000;
 x=rand('normal');
 prob = cdf("Normal", x);
 _RATE_= ifn((prob<=&peak),prob,2*&peak-prob);
 output;
end;
drop prob ;
run;

/* OUTHITS writes one row per selection, so repeated hits are counted in the histogram */
proc surveyselect data=normal out=want method=PPS_WR N=10000 outhits;
size _RATE_;    /* specify the probability variable */
run;
proc sgplot data=want;
histogram x;
density x/type=kernel;
run;
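
A note on the shape of these weights, offered as a sketch rather than a correction. Write u for the normal CDF value (prob in the code) and c for &peak. Before resampling, u is uniform on [0,1], so after resampling its density is proportional to the weight. The symbols u, c, w, and f are my notation, not part of the code.

```latex
w(u) = \begin{cases} u, & u \le c \\ 2c - u, & u > c \end{cases}
\qquad
f(u) = \begin{cases} \dfrac{2u}{c}, & 0 \le u \le c \\[1ex] \dfrac{2(1-u)}{1-c}, & c < u \le 1 \end{cases}
```

Here w is the weight used in the code and f is the density of a triangular distribution on [0,1] with mode c. With c = 0.7, w(1) = 2c - 1 = 0.4 rather than 0, so the resampled u values follow a tent that is cut off on the right, not an exact triangle. If an exact triangle on the CDF scale is wanted, one option (an assumption on my part, untested here) is to use f(u) itself as the size variable.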
☑ This topic is SOLVED.
