Posted 02-04-2017 04:52 PM
(1093 views)

Hi,

I am recreating Rick Wicklin's blog post "Simulating the Coupon Collector's Problem." In the last section there is code that creates a CDF:

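```
/** has event occurred by roll j? (j=K..L) **/
cdf = j(L,1,0);                     /** allocate **/
do j = K to L;
   c = countunique(x[,1:j], "row"); /** # distinct faces in first j rolls of each trial **/
   cdf[j] = (c=K)[:];               /** proportion of trials that have seen all K faces **/
end;
```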
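The snippet above assumes that the matrix `x` (one row per trial, one column per roll), the number of faces `K`, the maximum number of rolls `L`, and the number of trials `NSim` already exist. Below is a minimal self-contained sketch of that setup. The seed, `L = 50`, and generating the rolls with RANDGEN plus CEIL are assumptions for illustration, not necessarily what the blog post uses.

```sas
proc iml;
/* Assumed setup (illustrative values): NSim trials of L rolls of a K-sided die */
call randseed(12345);              /* assumed seed, for reproducibility */
K    = 6;                          /* number of faces */
L    = 50;                         /* assumed maximum number of rolls per trial */
NSim = 10000;                      /* number of simulated trials */
u = j(NSim, L);                    /* allocate NSim x L matrix */
call randgen(u, "Uniform");        /* fill with U(0,1) variates */
x = ceil(K*u);                     /* integers 1..K: row = trial, column = roll */

/* has event occurred by roll j? (j=K..L) */
cdf = j(L, 1, 0);
do j = K to L;
   c = countunique(x[,1:j], "row");  /* # distinct faces in first j rolls */
   cdf[j] = (c=K)[:];                /* proportion of trials with all K faces */
end;
first5 = cdf[K:K+4];               /* proportions for rolls K..K+4 */
print first5;
```

Because `cdf` is allocated with `L` rows, elements 1 through `K-1` stay 0: no trial can show all `K` faces in fewer than `K` rolls.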

What I would like is the actual count for each j rather than the proportion, but when I tried `cdf[j] = c`, I got an error message saying that the matrices do not conform to the operation. Is there a way to see the actual count for each j?

Thank you

**Accepted Solution**
The notation `(c=K)` creates a 0/1 matrix which has the value 1 in cells for which `c[i,j]=K`.

The notation `(c=K)[:]` takes the mean of those numbers by using a subscript reduction operator.

If you want the sum instead of the mean, just use `(c=K)[+]` or `sum(c=K)`.

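```
cdf = j(L,1,0);                     /** allocate **/
do j = K to L;
   c = countunique(x[,1:j], "row"); /** # distinct faces in first j rolls of each trial **/
   cdf[j] = (c=K)[+];               /** count (not proportion) of trials with all K faces **/
end;
call scatter(K:L, cdf[K:L]);        /** plot rolls K..L so x and y have equal length **/
```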
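A tiny illustration of the difference between the two reduction operators follows. It uses a made-up vector `c` of distinct-face counts for four hypothetical trials; the values are invented for illustration only.

```sas
proc iml;
K = 6;
c = {6, 5, 6, 4};          /* hypothetical # of distinct faces in 4 trials */
b = (c=K);                 /* 0/1 indicator: {1, 0, 1, 0} */
meanVal = b[:];            /* mean of all elements: 0.5 (a proportion) */
sumVal  = b[+];            /* sum of all elements: 2 (a count) */
sumFn   = sum(b);          /* the SUM function gives the same count: 2 */
print meanVal sumVal sumFn;
```

With the real simulation, the count from `[+]` is just the proportion from `[:]` multiplied by the number of trials.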

**Replies**

cdf[j] is a single cell, whereas c is a vector with 10,000 elements (one per simulated trial), which is why you are getting the error.
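The shape mismatch can be checked directly with the NROW and NCOL functions. The sketch below uses a small made-up `x` (three hypothetical trials of eight rolls, values invented) and a hypothetical variable `nRolls` in place of the loop index.

```sas
proc iml;
K = 6;
/* made-up data: 3 hypothetical trials of 8 rolls each */
x = {1 2 3 4 5 6 1 2,
     6 6 6 1 2 3 4 5,
     1 1 2 2 3 3 4 4};
nRolls = 6;
c  = countunique(x[,1:nRolls], "row");  /* one value per trial: 6, 4, 3 */
nr = nrow(c);                           /* 3 rows ...                     */
nc = ncol(c);                           /* ... and 1 column: not one cell */
cnt = (c=K)[+];                         /* a scalar (here 1) */
cdf = j(8, 1, 0);
cdf[nRolls] = cnt;                      /* conforms: a scalar into one cell */
print c nr nc cnt;
```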

I don't understand what "count" you want. I suggest you set NSim=5 and L=8 so you can print out x and other matrices and tell us what you are looking for:

```
/* generate NSim trials of L rolls */
NSim = 5;
L = 8;
...
print x;
/* count of what? */
c1 = (x=K)[+,]; /* for each column (roll number): # of trials in which that roll was K */
c2 = (x=K)[,+]; /* for each row (trial): # of rolls that were K */
```
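To make the difference between the column and row reductions concrete, here is a sketch with a made-up 2 x 3 matrix `x` (two hypothetical trials of three rolls each, values invented):

```sas
proc iml;
K = 6;
x = {1 6 2,
     6 6 3};             /* made-up: 2 trials (rows) of 3 rolls (columns) */
b  = (x=K);              /* 1 where the roll equals K: {0 1 0, 1 1 0} */
c1 = b[+,];              /* column sums: {1 2 0} */
c2 = b[,+];              /* row sums:    {1, 2}  */
print x, c1, c2;
```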


Hi Rick (I knew you would reply!)

By count I simply mean the total number of trials for which all 6 faces appear within 6 rolls, within 7 rolls, within 8 rolls, etc.

I know that I can multiply each cdf value by 10,000 and then take the difference between consecutive values, but I wanted to see how to do it directly in the code.
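The manual approach described here (scale the cumulative proportions by the number of trials, then difference consecutive values) can be sketched as follows. The vector `cdf` is hypothetical, with `NSim = 10` and `L = 10` so the arithmetic is easy to check; ROUND guards against floating-point error in `NSim*cdf`.

```sas
proc iml;
/* hypothetical cumulative proportions: NSim = 10 trials, K = 6, L = 10 */
NSim = 10;
cdf  = {0, 0, 0, 0, 0, 0.1, 0.3, 0.6, 0.8, 1.0};
L    = nrow(cdf);
cumCount = round(NSim*cdf);        /* cumulative counts: 0,...,0,1,3,6,8,10 */
/* # of trials that complete at exactly roll j */
cnt = cumCount[1] // (cumCount[2:L] - cumCount[1:L-1]);
print cumCount cnt;                /* cnt = 0,0,0,0,0,1,2,3,2,2 */
```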

Thanks!


Thanks! That is what I meant, but I couldn't get it because I didn't know about the code

`(c=K)[+]`

A small side note: this code gets the cumulative count, which is the cumulative distribution multiplied by the sample size. Is it also possible to get the discrete count, that is, the pdf times the sample size?


The pdf is the difference between adjacent values of the cdf. Therefore you can use the DIF function to compute the pdf from the cdf. DIF returns a missing value for the first element, so set that element to 0. Something like

`pdf = dif(cdf);`

`pdf[1] = 0;`
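As a cross-check, the per-roll counts can also be computed directly: for each trial, record the first roll at which all `K` faces have appeared, then count how many trials finish at each roll. The sketch below does this alongside the DIF approach. The setup values (seed, `L = 50`, `NSim = 10000`, RANDGEN plus CEIL) are assumptions for illustration, and `T`, `pdf2`, and `maxDiff` are hypothetical names.

```sas
proc iml;
/* Assumed setup (illustrative values) */
call randseed(12345);                 /* assumed seed */
K = 6;  L = 50;  NSim = 10000;
u = j(NSim, L);
call randgen(u, "Uniform");
x = ceil(K*u);                        /* die faces 1..K */

/* allocate everything before J is reused as a loop index */
cdf  = j(L, 1, 0);                    /* cumulative count by roll j */
pdf2 = j(L, 1, 0);                    /* direct count of trials finishing at roll j */
T    = j(NSim, 1, 0);                 /* roll at which each trial finished (0 = not within L) */

do j = K to L;
   c = countunique(x[,1:j], "row");
   cdf[j] = (c=K)[+];
   idx = loc((c=K) & (T=0));          /* trials that first see all K faces at roll j */
   if ncol(idx) > 0 then T[idx] = j;
end;

/* pdf via DIF: the first element of DIF is missing, so set it to 0 */
pdf = dif(cdf);
pdf[1] = 0;

/* direct tabulation of the finishing rolls */
do r = K to L;
   pdf2[r] = (T=r)[+];
end;

maxDiff = max(abs(pdf - pdf2));       /* 0 when the two methods agree */
print maxDiff;
```

The two vectors should agree because the number of distinct faces seen never decreases as more rolls are included, so each trial is counted in exactly one difference `cdf[j] - cdf[j-1]`.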
