EdvinasS
Calcite | Level 5

Hi, I have some interesting data to analyse. There are two datasets (showing only an excerpt of each):

1) -1 -1 -1 0 1 1 0 -1 1 1 0 -1 -1 0 -1 1 1 -1 0 0 1 -1 1 1............................

2) ... 0.99 0.8 1.11 -1 -1 0.8 0 -1 0 0 1 -1 0.78 0 0.76 0 ...

Basically, the datasets have three different types of values:

-1, 0, 1 in the first dataset

and -1, 0, or a positive value in the interval [0.5; 1.2] in the second dataset.

Each dataset has ~500 observations.

What I need to find is the mean and its confidence limits.

Does anybody have an idea how to evaluate the distribution and its parameters for such data? The main task is to get a lower confidence limit for the mean, and the minimum size of dataset that should be analysed to get a reliable result.

Please help, anyone. Thanks a lot!

6 REPLIES 6
SteveDenham
Jade | Level 19

What do the values represent?  With some idea of the process that generated these values, it might be easier to come up with an answer.  I think of the first as ordinal categories, while the second looks like a mixture of some sort.  With a better understanding of how these values were generated, we might be able to give a better answer.

Steve Denham

EdvinasS
Calcite | Level 5

Thank you for the interest. The data represent the outcomes of a gambling game. Both datasets come from the same observed games.

1) Simplified dataset: if you lose, the result is -1; if it's a draw, the result is 0; if you win, the result is 1. So we take 500 games, we win more often than we lose, and we have some draws. I am trying to calculate the return on investment (average result).

2) The second dataset is based on the same games, but the difference is that when you enter a game you pay a price of 1 unit. If you lose the game, you lose 1 unit, so the profit is -1. If it's a draw, you get 1 unit refunded, so the profit is 0. If you win, you win 1 multiplied by some coefficient from the interval [0.5; 1.2], and the profit is equal to the coefficient.

For example, we know that in the first 500 observations we won 280, lost 170, and drew 50, and the final result is that the average profit per game is 0.05 (5 percent). Because the profit from a win is mostly less than 1 while a loss always costs 1, the profit stays small even though we win much more often than we lose.

Does my explanation help?

EdvinasS
Calcite | Level 5

any ideas?

SteveDenham
Jade | Level 19

The -1, 0, 1 data can be modeled with two processes.  The first gives the probability of not tying, the second the probability of a win given that there is no tie.  The expected value, based on your data, would be (number of non-zero trials/number of trials) * (2 * (number of wins/number of non-zero trials) - 1), which reduces to (wins - losses)/number of trials.  The variance could be calculated using the delta method, applying it to the product of the two binomial proportions.
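
As a rough illustration of the two-step idea for the -1, 0, 1 data, here is a minimal Python sketch (not SAS) using the counts quoted earlier in the thread (280 wins, 170 losses, 50 draws). It treats the games as independent and uses a plain normal approximation for a one-sided 95% lower limit instead of the delta method; both of those are assumptions of this sketch, not something stated in the thread.

```python
import math

# Counts quoted in the thread for the first 500 games.
wins, losses, draws = 280, 170, 50
n = wins + losses + draws

# Two-process view: P(no tie), then P(win | no tie).
p_nontie = (wins + losses) / n
p_win_given_nontie = wins / (wins + losses)

# A non-tied game pays +1 with prob p and -1 with prob 1 - p, i.e. 2p - 1 on average.
mean_two_step = p_nontie * (2 * p_win_given_nontie - 1)

# Direct sample mean of the -1/0/1 outcomes, for comparison.
mean_direct = (wins - losses) / n

# Per-game variance: E[X^2] - (E[X])^2, where X^2 is 1 unless the game is a draw.
var_x = p_nontie - mean_direct ** 2
se = math.sqrt(var_x / n)

# One-sided 95% lower limit under a normal approximation (assumption).
z = 1.645
lower_limit = mean_direct - z * se

print(f"mean (two-step) = {mean_two_step:.4f}")
print(f"mean (direct)   = {mean_direct:.4f}")
print(f"standard error  = {se:.4f}")
print(f"lower limit     = {lower_limit:.4f}")
```

Both ways of computing the mean agree, which is a quick check that the two-process decomposition is written consistently.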

However, possibly the best way to estimate the mean and variance for these kinds of mixtures would be by bootstrapping.  Randomly sample 100 observations from each distribution and calculate the raw mean.  Repeat this about 1000 times, and calculate the overall mean and standard error based on the sample means.

I think the second distribution would be nearly intractable to any other analysis, as the fractional payoff is dependent on the distribution on the interval [0.5, 1.2].  I doubt very much that the distribution is uniform on that interval, probably a non-linear decreasing function like a gamma distribution, but truncated, so that moments would be almost impossible to calculate.  A perfect place to use bootstrapping.
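
A minimal bootstrap sketch in Python (not SAS) might look like the following. The real data are not available here, so `simulate_profit` is a hypothetical stand-in: the win/loss/draw shares and the uniform payoff on [0.5, 1.2] are placeholder assumptions only, and the uniform shape in particular is doubted above. The sketch resamples with replacement at the full sample size, which is the usual bootstrap convention, rather than drawing 100 observations at a time.

```python
import random
import statistics

rng = random.Random(12345)  # fixed seed so the sketch is repeatable

def simulate_profit(rng):
    """Hypothetical stand-in for one game's profit in the second dataset."""
    u = rng.random()
    if u < 0.56:
        return rng.uniform(0.5, 1.2)  # win: payoff coefficient (uniform is an assumption)
    if u < 0.90:
        return -1.0                   # loss: the 1-unit stake is gone
    return 0.0                        # draw: stake refunded

# Replace this list with the observed profits when they are available.
data = [simulate_profit(rng) for _ in range(500)]

def bootstrap_means(values, n_boot, rng):
    """Resample with replacement at the original size and return each resample's mean."""
    n = len(values)
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)]

boot = sorted(bootstrap_means(data, 1000, rng))
boot_mean = statistics.fmean(boot)
boot_se = statistics.stdev(boot)       # standard error of the mean
lower_5pct = boot[int(0.05 * len(boot))]  # percentile-style one-sided lower limit

print(f"bootstrap mean = {boot_mean:.4f}")
print(f"bootstrap SE   = {boot_se:.4f}")
print(f"5th percentile = {lower_5pct:.4f}")
```

Note that the spread of the bootstrap means estimates the standard error of the mean at the resample size, not the standard deviation of a single game.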

Steve Denham

EdvinasS
Calcite | Level 5

Thank you so much. I found some ready-made macros for bootstrapping and I will test them with my data in a few days. I will get back with results. Thanks again.

EdvinasS
Calcite | Level 5

Hi once more,

I tried using the bootstrapping technique to find the distribution and I get X~N(0.05, 0.02). Everything is clear to me here. If I put the mean, std dev, and lower/upper bounds of the mean into PROC POWER to get N, I get ~300. That looks totally normal, as I was guessing that 300 would be enough before trying to calculate this. But if I choose 300 random observations from a data set containing 1000 observations and calculate the mean, every time it is not even close to 0.05. Even when I run the bootstrap generating 300 sample means with N = 300, I get totally different results. So how do I get an N for which the mean would be similar in both cases, with all the data from the set and with N = 300?

Problem: for example, I want to collect another data set with similar observations (it takes too long to get another 1000 observations), and I want to decide when the number of observations is enough to conclude that the mean is equal to some number and that it will stay the same in the long term, so that I can invest real money based on these observations.
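
One possible reading of the mismatch above is that the 0.02 from the bootstrap is the standard error of the mean at the full sample size, not the standard deviation of a single game. If so, feeding it into a sample-size calculation as the per-game standard deviation would understate N considerably. The Python sketch below (not SAS) shows how the standard error scales with n and what n a one-sided normal-approximation lower limit would need to stay above zero. The per-game standard deviation of 0.85 is a hypothetical value chosen only for illustration, and the helper names are made up for this sketch.

```python
import math

def se_of_mean(sd, n):
    """Standard error of the mean for n independent games."""
    return sd / math.sqrt(n)

def min_n_for_positive_lower_limit(mean, sd, z=1.645):
    """Smallest n with mean - z * sd / sqrt(n) > 0 (normal approximation)."""
    return math.floor((z * sd / mean) ** 2) + 1

assumed_mean = 0.05  # average profit per game quoted in the thread
assumed_sd = 0.85    # hypothetical per-game standard deviation (assumption)

for n in (300, 1000):
    print(f"n = {n}: SE of mean = {se_of_mean(assumed_sd, n):.4f}")

n_needed = min_n_for_positive_lower_limit(assumed_mean, assumed_sd)
print(f"n needed for a positive one-sided 95% lower limit: {n_needed}")
```

Under these assumed numbers the standard error at n = 300 is about as large as the mean itself, so means of 300-game samples scattering far from 0.05 would be expected rather than surprising.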
