Change statistical moments/parameters of a sample

01-15-2016 07:19 AM - edited 01-15-2016 07:24 AM

Hello,

Is it possible (and if so, how) to take a sample, similar to PROC SURVEYSELECT, but change certain statistical parameters of the sample? For example, start from normally distributed data (a uniform distribution would be a bad example) and obtain a triangular-distributed sample that has a specified mode?

Thanks and kind regards

Accepted Solutions

Solution

01-18-2016 08:03 AM

01-18-2016 03:45 AM

Yes, as Rick said. It looks like you want to draw data from a normal distribution using sampling probabilities that follow a triangular distribution.

```
/* reference: simulate directly from a triangular distribution on [0,1] */
data tri;
  call streaminit(1234);
  do i=1 to 10000;
    x=rand('triangle',0.7);
    output;
  end;
run;
title 'Triangle Distribution peak=0.7';
proc sgplot data=tri;
  histogram x;
  density x/type=kernel;
run;

%let peak=0.7;
title 'Simulation from Normal';
proc iml;
  call randseed(1234);
  x=j(10000,1);
  call randgen(x,'normal');            /* x ~ N(0,1) */
  prob = cdf("Normal", x);             /* map each x to (0,1) */
  /* weight rises up to the peak, then falls */
  prob = choose((prob<=&peak),prob,2#&peak-prob);
  z=sample(x,10000,"Replace", prob);   /* weighted resample with replacement */
  call histogram(z) density='kernel';
quit;
```
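
One way to read the weighting step above, as a large-sample sketch: because x is standard normal, u = Φ(x) is uniform on (0,1), so resampling with weight w(u) gives values whose u has a density proportional to w(u). The symbols u, c and w are my own notation, not the poster's. With c = 0.7 the weight is a tent that peaks at c but does not fall to zero at u = 1; the exact triangular density on [0,1] with mode c (the one RAND('TRIANGLE', c) draws from) is shown for comparison. Note also that the histogram of z is drawn on the x scale, not the u scale.

```latex
% u = \Phi(x), c = peak = 0.7
w(u) = \begin{cases} u, & u \le c \\ 2c - u, & u > c \end{cases}
\qquad w(0) = 0,\quad w(c) = 0.7,\quad w(1) = 2c - 1 = 0.4

% triangular density on [0,1] with mode c, for comparison
f_{\mathrm{tri}}(u) = \begin{cases}
  \dfrac{2u}{c}, & 0 \le u \le c \\[4pt]
  \dfrac{2(1-u)}{1-c}, & c < u \le 1
\end{cases}
```

If weights that reach zero at both ends are wanted, f_tri(Φ(x)) could be used as the weight instead; that is a suggestion on my part, not something tested in this thread.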

All Replies

01-15-2016 07:59 AM

It's possible in some cases, but the resulting "sample" is no longer random.

In general, every time you resample you will obtain new sample statistics (mean, median, mode, etc). But it sounds like you want to predetermine a statistic, such as "I want the new mode to be 1.2."

What is the application for this process? What are you hoping to accomplish?

01-15-2016 08:53 AM - edited 01-15-2016 09:17 AM

This code is very crude, but it should show the basic idea. The actual data are "recipes" (a large number of bills of materials consisting of different components with 'interesting areas'). That is, a complete description would involve sampling in the multivariate case, for general (not symmetric) distributions __and__ with correlation (because the percentages of the components add up to 100%). A SAS function for this is probably too much to hope for, but there might be articles about this problem.

```
* Defines allowed values;
Data A;
Do i=1 To 1000;
X=Round(Rannor(1)*3+10,0.01);
Output;
End;
Run;
* is not skewed ..;
Proc Means Data=A Mean StdDev Skewness;
Var X;
Run;
* 'nasty' way to get a kind of a triangular distribution;
Data B;
Set A;
Select ;
When (X > 6 & X <= 7) Group=1;
When (X > 7 & X <= 8) Group=2;
When (X > 8 & X <= 9) Group=3;
When (X > 9 & X <= 10) Group=4;
Otherwise Group=0;
End;
Run;
Proc Sort Data=B;
By Group X;
Run;
* sample sizes give left skewed distribution ..;
Proc SurveySelect Data=B Out=C Method=srs N=(0 5 5 10 30);
Strata Group;
Run;
Proc Means Data=C Mean StdDev Skewness;
Var X;
Run;
```

01-15-2016 10:35 AM - edited 01-15-2016 10:35 AM

It sounds like you want to do "probability sampling," where each observation is assigned a known probability of being selected. In your case, you are assigning higher probability to observations greater than the mean and lower probability to observations that are less than the mean. How you assign the probabilities affects the moments of the resulting distribution, so the key to your problem will be deciding on a transformation that generates a sampling probability from the data.

If you have SAS/IML experience, you can request probability samples by using the SAMPLE function. In the following, I generate sampling weights by using the normal CDF of the standardized data.

```
proc iml;
mu = 10; sigma = 3;
x = j(1000, 1);
call randgen(x, "Normal", mu, sigma); /* x ~ N(10, 3) */
/* create probability scale based on z-score */
prob = cdf("Normal", (x - mu)/sigma);
y = sample(x, 50, "Replace", prob); /* prob is standardized so sum(prob)=1 */
call histogram(y); /* show skewed distribution */
mean = mean(y`);
skew = skewness(y`);
print mean skew;
```
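
The effect of these weights on the moments can be written down explicitly, at least as a large-sample approximation (the 1000 generated values only approximate the normal density, and a resample of 50 will fluctuate a lot). With z = (x − μ)/σ, weighting a normal density by Φ(z) gives a skew-normal density with shape parameter 1. The symbols g, φ, Φ and γ₁ are my notation, not part of the program above.

```latex
g(x) = \frac{\Phi(z)\,\phi(z)/\sigma}{\int \Phi(t)\,\phi(t)\,dt}
     = \frac{2}{\sigma}\,\phi(z)\,\Phi(z),
\qquad z = \frac{x-\mu}{\sigma}

\mathbb{E}[X] = \mu + \frac{\sigma}{\sqrt{\pi}}
              \approx 10 + 3(0.564) \approx 11.69

\gamma_1 = \frac{4-\pi}{2}\,
           \frac{(1/\sqrt{\pi})^{3}}{(1 - 1/\pi)^{3/2}} \approx 0.14
```

So for μ = 10 and σ = 3 the printed mean should land somewhere near 11.7, with a small positive skewness on average.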

01-18-2016 08:02 AM

Unfortunately, I only have STAT/ETS and OR, but I think I can build something similar. If there is something similar for those modules, please let me know.

01-18-2016 10:33 PM

Yes, the DATA step can do it, but it needs some more code.

```
/* reference: simulate directly from a triangular distribution on [0,1] */
data tri;
  call streaminit(1234);
  do i=1 to 10000;
    x=rand('triangle',0.7);
    output;
  end;
run;
title 'Triangle Distribution peak=0.7';
proc sgplot data=tri;
  histogram x;
  density x/type=kernel;
run;

%let peak=0.7;
title 'Simulation from Normal';
/* generate the data and an unnormalized sampling weight p */
data normal;
  call streaminit(1234);
  do i=1 to 10000;
    x=rand('normal');
    prob = cdf("Normal", x);
    p = ifn((prob<=&peak),prob,2*&peak-prob);
    output;
  end;
  drop prob i;
run;
/* count the rows and normalize p so that it sums to 1 */
proc sql noprint;
  select count(*) into : n from normal;
  create table temp as
    select x,p/sum(p) as p from normal;
quit;
/* load values and probabilities into arrays, then draw indices with RAND('TABLE') */
data want;
  set temp end=last;
  array xx{&n} _temporary_;
  array pp{&n} _temporary_;
  call streaminit(1234);
  xx{_n_}=x;
  pp{_n_}=p;
  if last then do;
    do i=1 to 10000;
      idx=rand('table',of pp{*});  /* index drawn with probability pp{idx} */
      x=xx{idx};
      output;
    end;
  end;
run;
proc sgplot data=want;
  histogram x;
  density x/type=kernel;
run;
```

01-18-2016 09:32 AM

Do you want to SIMULATE data from a triangular distribution? That's easy by using the DATA step and the "TRIANGLE" distribution. You can also simulate data from the PERT distribution, which is a generalization of the triangular distribution.

Please clarify: do you want to simulate from a probability distribution, or do you want to resample from existing data?

01-18-2016 09:37 AM

I would like to resample existing data. (PERT would be nice too, but for my purpose a triangular distribution is sufficient.)

01-18-2016 12:59 PM

Here's what you want to do:

1. Create a variable that contains the sampling probability for each observation

2. Use the METHOD=PPS_WR option in PROC SURVEYSELECT to specify that you want a probability sample that is proportional to size (with replacement)

For example, the following program assigns the first observation a 50% probability of being selected and the other eight observations a 6.25% probability.

```
data A;
do x = 1 to 9;
if x=1 then prob = 0.5; /* 50% probability of selection */
else prob = 0.5/8; /* 6.25% probability of selection */
output;
end;
run;
/* resample with probability proportional to size */
proc surveyselect data=A out=out method=PPS_WR
seed=123 N=100;
size prob; /* specify the probability variable */
run;
/* examine the distribution of the observations */
proc freq data=out;
weight numberHits;
tables x;
run;
```
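
As a quick check on what PROC FREQ should show: under PPS sampling with replacement, each of the N draws picks observation i with probability equal to its size divided by the total size, so the expected hit counts follow directly. The symbols h_i and p_i are my notation; observed counts from a single run will vary around these values.

```latex
\mathbb{E}[h_i] = N \cdot \frac{p_i}{\sum_j p_j},
\qquad \sum_j p_j = 0.5 + 8(0.0625) = 1

\mathbb{E}[h_1] = 100 \times 0.5 = 50,
\qquad
\mathbb{E}[h_i] = 100 \times 0.0625 = 6.25 \quad (i = 2,\dots,9)
```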

01-19-2016 12:07 AM

Here is the code based on Rick's suggestion.

```
%let peak=0.7;
title 'Simulation from Normal';
data normal;
call streaminit(1234);
do i=1 to 100000;
x=rand('normal');
prob = cdf("Normal", x);
_RATE_= ifn((prob<=&peak),prob,2*&peak-prob);
output;
end;
drop prob ;
run;
/* OUTHITS writes one row per hit, so repeated selections are counted in the plot */
proc surveyselect data=normal out=want method=PPS_WR N=10000 outhits;
size _RATE_; /* specify the probability variable */
run;
proc sgplot data=want;
histogram x;
density x/type=kernel;
run;
```