user24feb
Barite | Level 11

Hello,

 

Is it possible (and if so, how) to draw a sample, similar to PROC SURVEYSELECT, but change certain statistical parameters of the sample? For example, start from a normal distribution (a uniform distribution would be a bad example) and obtain a triangular-distributed sample with a specified mode?

 

Thanks and kind regards

 

Rick_SAS
SAS Super FREQ

It's possible in some cases, but the resulting "sample" is no longer random. 

 

In general, every time you resample you will obtain new sample statistics (mean, median, mode,  etc).  But it sounds like you want to predetermine a statistic, such as "I want the new mode to be 1.2." 

 

What is the application for this process? What are you hoping to accomplish?

user24feb
Barite | Level 11

This is very crude code, but it should show the basic idea. The actual data are "recipes" (a large number of bills of materials consisting of different components with 'interesting areas'). That is, a complete description would require sampling the multivariate case for general (not symmetric) distributions with correlation (because the percentages of the components add up to 100%). A SAS function for this is probably too much to hope for, but there might be articles about this problem.

 

* Defines allowed values;
Data A;
  Do i=1 To 1000;
    X=Round(Rannor(1)*3+10,0.01);
    Output;
  End;
Run;

* is not skewed ..;
Proc Means Data=A Mean StdDev Skewness;
  Var X;
Run;

* 'nasty' way to get a kind of a triangular distribution;
Data B;
  Set A;
  Select ;
    When (X > 6 & X <= 7) Group=1;
    When (X > 7 & X <= 8) Group=2;
    When (X > 8 & X <= 9) Group=3;
    When (X > 9 & X <= 10) Group=4;
    Otherwise Group=0;
  End;
Run;

Proc Sort Data=B;
  By Group X;
Run;

* sample sizes give left skewed distribution ..;
Proc SurveySelect Data=B Out=C Method=srs N=(0 5 5 10 30);
  Strata Group;
Run;

Proc Means Data=C Mean StdDev Skewness;
  Var X;
Run;

 

Rick_SAS
SAS Super FREQ

It sounds like you want to do "probability sampling," where each observation is assigned a known probability of being selected.  In your case, you are assigning higher probability to observations greater than the mean and lower probability to observations that are less than the mean.  How you assign the probabilities affects the moments of the resulting distribution, so the key to your problem will be deciding on a transformation that generates a sampling probability from the data.

 

If you have SAS/IML experience, you can request probability samples by using the SAMPLE function. In the following, I generate sampling weights by using the normal CDF of the standardized data. 

 

proc iml;
mu = 10; sigma = 3;
x = j(1000, 1);
call randgen(x, "Normal", mu, sigma);  /* x ~ N(10, 3) */

/* create probability scale based on z-score */
prob = cdf("Normal", (x - mu)/sigma);
y = sample(x, 50, "Replace", prob);  /* prob is used as relative weights, standardized so sum(prob)=1 */

call histogram(y);  /* show skewed distribution */
mean = mean(y`);
skew = skewness(y`);
print mean skew;
quit;
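Why these weights shift and skew the result can be sketched with a short derivation. This is an approximation that treats the 1,000 generated values as if they were the continuous N(10, 3) distribution; the symbols g, z, and Y (a resampled value) are introduced here only for illustration.

```latex
% z = (x - \mu)/\sigma; \varphi and \Phi are the standard normal pdf and cdf
g(x) \;\propto\;
\underbrace{\tfrac{1}{\sigma}\,\varphi(z)}_{\text{density of } x}\;
\underbrace{\Phi(z)}_{\text{sampling weight}}
\quad\Longrightarrow\quad
g(x) = \tfrac{2}{\sigma}\,\varphi(z)\,\Phi(z)
\qquad \Bigl(\text{since } \textstyle\int \varphi(z)\,\Phi(z)\,dz = \tfrac12\Bigr)

% mean of the resampled values for \mu = 10, \sigma = 3
E[Y] = \mu + \frac{\sigma}{\sqrt{\pi}} \approx 10 + \frac{3}{1.772} \approx 11.69
```

Under that approximation the resampled values follow a skew-normal distribution with shape parameter 1, whose theoretical skewness is only about 0.14. With just 50 draws, the printed sample skewness will therefore vary noticeably from run to run.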

 

 

Ksharp
Super User

Yes, as Rick said. It looks like you want to draw data from a normal distribution with selection probabilities that follow a triangular distribution.

 

data tri;
call streaminit(1234);
do i=1 to 10000;
 x=rand('triangle',0.7);output;
end;
run;
title 'Triangle Distribution peak=0.7';
proc sgplot data=tri;
histogram x;
density x/type=kernel;
run;






%let peak=0.7;
title 'Simulation from Normal';
proc iml;
x=j(10000,1);
prob=j(10000,1);
z=j(10000,1);

call randseed(1234);
call randgen(x,'normal');
prob = cdf("Normal", x);
prob = choose((prob<=&peak),prob,2#&peak-prob);

z=sample(x,10000,"Replace", prob);
call histogram(z) density='kernel';

quit;
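A note on the weights, offered as a sketch rather than a statement of what the code above was meant to do. On the probability scale u = Φ(x), with c standing for &peak (both symbols are introduced here for illustration), the CHOOSE expression builds a tent-shaped weight that peaks at c but does not fall to zero at u = 1. A triangular density on [0,1] with mode c has the second form below.

```latex
% weight used in the code: u = \Phi(x), c = peak (0.7 above)
w(u) =
\begin{cases}
u,      & u \le c \\
2c - u, & u > c
\end{cases}
\qquad
w(1) = 2c - 1 = 0.4 \ \text{ for } c = 0.7

% triangular density on [0,1] with mode c, for comparison
f(u) =
\begin{cases}
\dfrac{2u}{c},       & 0 \le u \le c \\[6pt]
\dfrac{2(1-u)}{1-c}, & c < u \le 1
\end{cases}
\qquad
f(0) = f(1) = 0
```

The two shapes are proportional only when c = 0.5. For c < 0.5 the expression 2c - u becomes negative for u > 2c, which is not a valid sampling weight, so the tent formula should only be used with a peak of at least 0.5.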
user24feb
Barite | Level 11

Unfortunately, I only have STAT/ETS and OR, but I think I can build something similar. If there is something similar for those modules, please let me know.

Ksharp
Super User

Yes, a DATA step can do it, but it needs a bit more code.

 

data tri;
call streaminit(1234);
do i=1 to 10000;
 x=rand('triangle',0.7);output;
end;
run;
title 'Triangle Distribution peak=0.7';
proc sgplot data=tri;
histogram x;
density x/type=kernel;
run;


%let peak=0.7;
title 'Simulation from Normal';
data normal;
call streaminit(1234);
do i=1 to 10000;
 x=rand('normal');
 prob = cdf("Normal", x);
 p = ifn((prob<=&peak),prob,2*&peak-prob);
 output;
end;
drop prob i;
run;
proc sql noprint;
 select count(*) into : n from normal;

create table temp as
 select x,p/sum(p) as p from normal;
quit;
data want;
 set temp end=last;
 array xx{&n} _temporary_;
 array pp{&n} _temporary_;
 call streaminit(1234);
 xx{_n_}=x;
 pp{_n_}=p;
 if last then do;
  do i=1 to 10000;
   idx=rand('table',of pp{*});
   x=xx{idx};
   output;
  end;
 end;
run;
proc sgplot data=want;
histogram x;
density x/type=kernel;
run;
Rick_SAS
SAS Super FREQ

Do you want to SIMULATE data from a triangular distribution? That's easy by using the DATA step and the "TRIANGLE" distribution. You can also simulate data from the PERT distribution, which is a generalization of the triangular distribution.

 

Please clarify: do you want to simulate from a probability distribution, or do you want to resample from existing data?

 

 

user24feb
Barite | Level 11
I would like to resample existing data. (PERT would be nice too, but for my purpose a triangular dist. is sufficient)
Rick_SAS
SAS Super FREQ

Here's what you want to do:

1. Create a variable that contains the sampling probability for each observation

2. Use the METHOD=PPS_WR option in PROC SURVEYSELECT to specify that you want a probability sample that is proportional to size (with replacement)

 

For example, the following program assigns the first observation a 50% probability of being selected and the other eight observations a 6.25% probability.

data A;
do x = 1 to 9;
   if x=1 then prob = 0.5; /* 50% probability of selection */
   else prob = 0.5/8;      /*  6.25% probability of selection */
   output;
end;
run;

/* resample with probability proportional to size */
proc surveyselect data=A out=out method=PPS_WR 
     seed=123 N=100;
size prob;    /* specify the probability variable */
run;

/* examine the distribution of the observations */
proc freq data=out;
weight numberHits;
tables x;
run;
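
One way to check the resample against the intended probabilities is a chi-square goodness-of-fit test. The following is a minimal sketch that assumes the data set OUT from the PROC SURVEYSELECT step above and that every value of x was selected at least once (otherwise the number of TESTP= values no longer matches the number of levels).

```sas
/* compare observed hit counts with the specified selection probabilities */
proc freq data=out;
   weight numberHits;          /* each row counts as many times as it was hit */
   tables x / chisq            /* one-way chi-square goodness-of-fit test */
      testp=(0.5 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625);
run;
```

With N=100 the expected counts are 50 for x=1 and 6.25 for each of the other eight values, so a large p-value is consistent with the sample following the requested probabilities.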
Ksharp
Super User

Here is the code based on Rick's suggestion.

 


%let peak=0.7;
title 'Simulation from Normal';
data normal;
call streaminit(1234);
do i=1 to 100000;
 x=rand('normal');
 prob = cdf("Normal", x);
 _RATE_= ifn((prob<=&peak),prob,2*&peak-prob);
 output;
end;
drop prob ;
run;

/* OUTHITS writes one row per selection, so repeated hits are counted in the histogram */
proc surveyselect data=normal out=want method=PPS_WR N=10000 outhits;
size _RATE_;    /* specify the probability variable */
run;
proc sgplot data=want;
histogram x;
density x/type=kernel;
run;
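
If the goal is a resample that is exactly triangular on the probability scale u = Φ(x), the weight can be set to the triangular density on [0,1] instead of the tent formula. This is a sketch under that assumption, not the method used above; the names normal_tri, want_tri, u, and w are made up for the illustration.

```sas
%let peak=0.7;   /* mode of the triangle on the probability scale, 0 < peak < 1 */

data normal_tri;
   call streaminit(1234);
   do i = 1 to 100000;
      x = rand('normal');
      u = cdf('normal', x);                    /* probability scale, 0 < u < 1 */
      /* triangular density on [0,1] with mode &peak: zero at both ends */
      if u <= &peak then w = 2*u/&peak;
      else               w = 2*(1 - u)/(1 - &peak);
      output;
   end;
   drop i;
run;

/* probability-proportional-to-size resampling with replacement; */
/* OUTHITS keeps one row per selection                            */
proc surveyselect data=normal_tri out=want_tri method=PPS_WR
     n=10000 seed=1234 outhits;
   size w;
run;

title 'Resample: triangular on the probability scale';
proc sgplot data=want_tri;
   histogram u;
   density u / type=kernel;
run;
```

The histogram of u should look triangular with its mode near &peak. The histogram of x itself will not be a triangle, because the triangle is imposed on Φ(x) rather than on x.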
