Hello,
Is it possible (and if so, how) to take a sample, similar to PROC SURVEYSELECT, but change certain statistical parameters of the sample? For example, take a normal distribution (a uniform distribution would be a bad example) and then get a triangular-distributed sample that has a certain mode?
Thanks & kind regards
It's possible in some cases, but the resulting "sample" is no longer random.
In general, every time you resample you will obtain new sample statistics (mean, median, mode, etc). But it sounds like you want to predetermine a statistic, such as "I want the new mode to be 1.2."
What is the application for this process? What are you hoping to accomplish?
This is very crude code, but it should show the basic idea. The actual data are "recipes" (a large number of bills of materials consisting of different components with 'interesting areas'). That is, a complete description would mean sampling the multivariate case for general (not symmetric) distributions with correlation (because the percentages of the components add up to 100 %). A SAS function for this is probably too much to hope for, but there might be articles about this problem.
* Defines allowed values;
Data A;
Do i=1 To 1000;
X=Round(Rannor(1)*3+10,0.01);
Output;
End;
Run;
* is not skewed ..;
Proc Means Data=A Mean StdDev Skewness;
Var X;
Run;
* 'nasty' way to get a kind of a triangular distribution;
Data B;
Set A;
Select ;
When (X > 6 & X <= 7) Group=1;
When (X > 7 & X <= 8) Group=2;
When (X > 8 & X <= 9) Group=3;
When (X > 9 & X <= 10) Group=4;
Otherwise Group=0;
End;
Run;
Proc Sort Data=B;
By Group X;
Run;
* sample sizes give left skewed distribution ..;
Proc SurveySelect Data=B Out=C Method=srs N=(0 5 5 10 30);
Strata Group;
Run;
Proc Means Data=C Mean StdDev Skewness;
Var X;
Run;
Sounds like you want to do "probability sampling," where each observation is assigned a known probability of being selected. In your case, you are assigning higher probability to observations greater than the mean and lower probability to observations that are less than the mean. How you assign the probabilities affects the moments of the resulting distribution, so the key to your problem will be deciding on a transformation that generates a sampling probability from the data.
If you have SAS/IML experience, you can request probability samples by using the SAMPLE function. In the following, I generate sampling weights by using the normal CDF of the standardized data.
proc iml;
mu = 10; sigma = 3;
x = j(1000, 1);
call randgen(x, "Normal", mu, sigma); /* x ~ N(10, 3) */
/* create probability scale based on z-score */
prob = cdf("Normal", (x - mu)/sigma);
y = sample(x, 50, "Replace", prob); /* SAMPLE rescales prob so that sum(prob)=1 */
call histogram(y); /* show skewed distribution */
mean = mean(y`);
skew = skewness(y`);
print mean skew;
quit;
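To make the effect of these weights concrete, here is a rough worked calculation. It assumes the pool of x values is large enough that resampling with probability proportional to a weight w(x) gives draws whose density is approximately proportional to w(x) times the original density. The symbols g, \varphi, \Phi and z are notation introduced here, not part of the program above.

```latex
% phi = standard normal pdf, Phi = standard normal cdf, z = (x - mu)/sigma
\[
g(x) \;\propto\; \Phi(z)\,\frac{1}{\sigma}\,\varphi(z)
\quad\Longrightarrow\quad
g(x) \;=\; \frac{2}{\sigma}\,\varphi(z)\,\Phi(z),
\qquad z = \frac{x-\mu}{\sigma},
\]
% the factor 2 comes from \int \varphi(z)\Phi(z)\,dz = 1/2
\[
\mathbb{E}[Y] \;=\; \mu + \sigma \int 2z\,\varphi(z)\,\Phi(z)\,dz
\;=\; \mu + \frac{\sigma}{\sqrt{\pi}}
\;\approx\; 10 + 3\,(0.564) \;\approx\; 11.69 .
\]
```

So with mu = 10 and sigma = 3 the resample mean should land near 11.7 rather than 10, give or take the noise of a 50-observation sample. A different choice of weight function shifts the moments differently.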
Yes, as Rick said. It looks like you want to draw data from a normal distribution, using selection probabilities that follow a triangular distribution.
data tri;
call streaminit(1234);
do i=1 to 10000;
x=rand('triangle',0.7);output;
end;
run;
title 'Triangle Distribution peak=0.7';
proc sgplot data=tri;
histogram x;
density x/type=kernel;
run;
%let peak=0.7;
title 'Simulation from Normal';
proc iml;
x=j(10000,1);
prob=j(10000,1);
z=j(10000,1);
call randseed(1234);
call randgen(x,'normal');
prob = cdf("Normal", x);
/* tent-shaped weights with the maximum at peak; they stay nonnegative only if peak >= 0.5 */
prob = choose((prob<=&peak),prob,2#&peak-prob);
z=sample(x,10000,"Replace", prob);
call histogram(z) density='kernel';
quit;
Unfortunately, I only have STAT/ETS and OR, but I think I can build something similar. If there is something similar for those modules, please let me know.
Yes. A DATA step can do it, but it needs some more code.
data tri;
call streaminit(1234);
do i=1 to 10000;
x=rand('triangle',0.7);output;
end;
run;
title 'Triangle Distribution peak=0.7';
proc sgplot data=tri;
histogram x;
density x/type=kernel;
run;
%let peak=0.7;
title 'Simulation from Normal';
data normal;
call streaminit(1234);
do i=1 to 10000;
x=rand('normal');
prob = cdf("Normal", x);
p = ifn((prob<=&peak),prob,2*&peak-prob);
output;
end;
drop prob i;
run;
proc sql noprint;
select count(*) into : n from normal;
create table temp as
select x,p/sum(p) as p from normal;
quit;
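data want;
set temp end=last;
array xx{&n} _temporary_; /* holds the x values */
array pp{&n} _temporary_; /* holds the normalized probabilities */
if _n_=1 then call streaminit(1234); /* seed the stream once */
xx{_n_}=x;
pp{_n_}=p;
if last then do;
do i=1 to 10000;
/* RAND('TABLE') can return n+1 if the probabilities sum to slightly less than 1 */
idx=min(rand('table',of pp{*}),&n);
x=xx{idx};
output;
end;
end;
run;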
proc sgplot data=want;
histogram x;
density x/type=kernel;
run;
Do you want to SIMULATE data from a triangular distribution? That's easy by using the DATA step and the "TRIANGLE" distribution. You can also simulate data from the PERT distribution, which is a generalization of the triangular distribution.
Please clarify: do you want to simulate from a probability distribution, or do you want to resample from existing data?
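If plain simulation is what is needed, here is a minimal sketch of the DATA step approach for a triangle on an arbitrary interval. RAND('Triangle', h) returns values on [0,1] with mode h, so the draw is rescaled to [a, b]. The macro variables a, b, c and the data set name tri_ab are hypothetical values chosen for illustration.

```sas
/* Hypothetical endpoints and mode: triangle on [a, b] with mode c */
%let a = 2;
%let b = 10;
%let c = 8;

data tri_ab;
   call streaminit(1234);
   h = (&c - &a) / (&b - &a);            /* mode rescaled to the unit interval */
   do i = 1 to 10000;
      x = &a + (&b - &a) * rand('Triangle', h);   /* map [0,1] back to [a,b] */
      output;
   end;
   drop i h;
run;

/* Sanity check: the theoretical mean of a triangular distribution is (a+b+c)/3 */
proc means data=tri_ab min max mean;
   var x;
run;
```

With these values the sample mean should come out close to 20/3, and the minimum and maximum should stay inside [2, 10].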
Here's what you want to do:
1. Create a variable that contains the sampling probability for each observation
2. Use the METHOD=PPS_WR option in PROC SURVEYSELECT to specify that you want a probability sample that is proportional to size (with replacement)
For example, the following program assigns the first observation a 50% probability of being selected and the other eight observations a 6.25% probability.
data A;
do x = 1 to 9;
if x=1 then prob = 0.5; /* 50% probability of selection */
else prob = 0.5/8; /* 6.25% probability of selection */
output;
end;
run;
/* resample with probability proportional to size */
proc surveyselect data=A out=out method=PPS_WR
seed=123 N=100;
size prob; /* specify the probability variable */
run;
/* examine the distribution of the observations */
proc freq data=out;
weight numberHits;
tables x;
run;
Here is the code based on Rick's approach.
%let peak=0.7;
title 'Simulation from Normal';
data normal;
call streaminit(1234);
do i=1 to 100000;
x=rand('normal');
prob = cdf("Normal", x);
_RATE_= ifn((prob<=&peak),prob,2*&peak-prob);
output;
end;
drop prob ;
run;
/* OUTHITS writes one row per hit, so the histogram counts repeated selections */
proc surveyselect data=normal out=want method=PPS_WR N=10000 outhits;
size _RATE_; /* specify the probability variable */
run;
proc sgplot data=want;
histogram x;
density x/type=kernel;
run;