ilikesas
Barite | Level 11

Hi,

 

I am recreating Rick Wicklin's blog post "Simulating the Coupon Collector's Problem". The last section contains code that computes a CDF:

 

/** has event occurred by roll j? (j=K..L) **/
cdf = j(L,1,0); /** allocate **/
do j = K to L;
   c = countunique(x[,1:j], "row");
   cdf[j] = (c=K)[:];
end;

  

 

What I would like is the actual count for each j rather than the proportion, but when I tried

cdf[j] = c;

I got an error message saying that the matrices do not conform to the operation. Is there a way to see the actual count for each j?

 

Thank you 

Rick_SAS
SAS Super FREQ

cdf[j] is one cell, whereas c is a vector with 10,000 elements, which is why you are getting an error.
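To make the shape mismatch concrete, here is a minimal sketch. The 3 x 4 matrix x is made up for illustration and stands in for the simulated rolls; it is not the data from the blog post.

```sas
proc iml;
/* hypothetical data: 3 trials (rows) of 4 rolls (columns) */
x = {1 2 2 3,
     4 4 4 4,
     1 2 3 4};

/* one count of distinct values per row, so c is a 3 x 1 column vector */
c = countunique(x, "row");
print c (nrow(c))[label="rows in c"] (ncol(c))[label="cols in c"];

cdf = j(4, 1, 0);
/* cdf[2] = c;   <-- fails: three values cannot be stored in one cell */
quit;
```

Whatever goes into cdf[j] has to be reduced to a single number first.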

 

I don't understand what "count" you want. I suggest you set NSim=5 and L=8 so that you can print x and the other matrices and tell us what you are looking for:

/* generate NSim trials of L rolls */
NSim = 5;
L = 8;
...
print x;

/* count of what? */
c1 = (x=K)[+,];  /* for each column, the number of times that K appeared */
c2 = (x=K)[,+];  /* for each row, the number of times that K appeared */
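As an illustration of what those two reductions return, here is a small sketch with a hand-written matrix (the values of x are invented, and K=6 is assumed as in the dice example):

```sas
proc iml;
K = 6;
/* hypothetical data: 3 trials (rows) of 4 rolls (columns) */
x = {1 6 3 6,
     6 6 2 5,
     4 1 2 3};

c1 = (x=K)[+, ];   /* column sums: 1 x 4 row vector {1 2 0 1} */
c2 = (x=K)[, +];   /* row sums:    3 x 1 column vector {2, 2, 0} */
print c1, c2;
quit;
```

A `+` before the comma sums down the rows and leaves one value per column; a `+` after the comma sums across the columns and leaves one value per row.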

 

ilikesas
Barite | Level 11

Hi Rick (I knew you would reply!)

 

By count I mean simply the total number of trials for which all 6 faces appear in 6 rolls, in 7 rolls, 8 rolls etc.

 

I know that I can multiply each cdf value by 10000 and then take the difference between consecutive values, but I wanted to see how to do it directly in the code.

 

 

Thanks!

Rick_SAS
SAS Super FREQ

The notation 

(c=K)

creates a 0/1 matrix that has the value 1 in the cells for which c[i,j]=K.

The notation 

(c=K)[:]

takes the mean of those numbers by using a subscript reduction operator.

If you want the sum instead of the mean, just use 

(c=K)[+]

or

sum(c=K)

 

cdf = j(L,1,0); /** allocate **/
do j = K to L;
   c = countunique(x[,1:j], "row");
   cdf[j] = (c=K)[+];
end;
call scatter(K:L, cdf);
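The difference between the two reduction operators is easiest to see on a tiny vector. The following is only a sketch; the five values in c are made up and stand in for the result of COUNTUNIQUE:

```sas
proc iml;
K = 6;
/* hypothetical number of distinct faces seen in each of 5 trials */
c = {6, 4, 6, 5, 6};

ind  = (c = K);     /* 0/1 indicator: {1, 0, 1, 0, 1} */
prop = ind[:];      /* mean of the indicator = proportion = 0.6 */
cnt  = ind[+];      /* sum of the indicator  = count      = 3   */
cnt2 = sum(c = K);  /* same count by using the SUM function     */
print ind, prop cnt cnt2;
quit;
```

Because the indicator contains only zeros and ones, its mean is the proportion of trials and its sum is the number of trials.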

 

ilikesas
Barite | Level 11

Thanks! That is what I meant, but I couldn't get it because I didn't know about the syntax

(c=K)[+]

A small side note: this code gets the cumulative count, which is the cumulative distribution multiplied by the sample size. Is it also possible to get the discrete count, that is, the pdf times the sample size?

Rick_SAS
SAS Super FREQ

The pdf is the difference between adjacent values of the cdf. Therefore you can use the DIF function to compute the pdf from the cdf. Something like

 

pdf = dif(cdf);

pdf[1] = 0;
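Here is a small sketch of the same idea applied to counts. The vector cumCount is invented for illustration; in the simulation it would be the cdf vector of cumulative counts:

```sas
proc iml;
/* hypothetical cumulative counts for rolls j = 1..6 */
cumCount = {0, 0, 2, 5, 9, 10};

pdfCount = dif(cumCount);   /* first element is missing: {., 0, 2, 3, 4, 1} */
pdfCount[1] = 0;            /* valid here because cumCount[1] is 0 */
print cumCount pdfCount;

/* sanity check: the cumulative sum recovers the original counts */
check = cusum(pdfCount);
print check;
quit;
```

Setting the first element to 0 relies on the cumulative count being zero at j=1, which holds in this problem because all K faces cannot appear in fewer than K rolls.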

 
