turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Number of Distinct Values?

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-07-2009 10:19 AM

Is there a simple way to determine the number of distinct values in an observation?

I'm working with many columns with integer values. I can use descriptive statistics to determine the number of (non-missing) values or the minimum value etc.

numentries = n(of col_start--col_end)

minvalue = min(of col_start--col_end)

Many values are repeated. I would like to know the number of distinct values there are for each observation.

I'm working with many columns with integer values. I can use descriptive statistics to determine the number of (non-missing) values or the minimum value etc.

numentries = n(of col_start--col_end)

minvalue = min(of col_start--col_end)

Many values are repeated. I would like to know the number of distinct values there are for each observation.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-07-2009 11:43 AM

Hi JohnI,

I'm not sure what you mean by simple, but something like this might help:

data have;

input id var1 var2 var3 var4 var5;

cards;

1 1 2 3 4 5

2 1 2 2 3 4

3 3 3 2 1 3

4 4 1 2 3 1

;

run;

data want(keep=id var1--var5 n distinct);

set have;

array v_(*) var1--var5;

array _dist_(10) _temporary_;

do j = 1 to 10;

_dist_(j) = .;

end;

do i = 1 to dim(v_);

_dist_(v_(i)) = 1;

end;

distinct = 0;

do j = 1 to 10;

if _dist_(j) ne . then distinct = distinct + 1;

end;

n = n(of var1--var5);

run;

I've only used five variables and a limit of 10 for the value, but you could extend this for n variables and a wide range of integer values.

Robert

I'm not sure what you mean by simple, but something like this might help:

data have;

input id var1 var2 var3 var4 var5;

cards;

1 1 2 3 4 5

2 1 2 2 3 4

3 3 3 2 1 3

4 4 1 2 3 1

;

run;

data want(keep=id var1--var5 n distinct);

set have;

array v_(*) var1--var5;

array _dist_(10) _temporary_;

do j = 1 to 10;

_dist_(j) = .;

end;

do i = 1 to dim(v_);

_dist_(v_(i)) = 1;

end;

distinct = 0;

do j = 1 to 10;

if _dist_(j) ne . then distinct = distinct + 1;

end;

n = n(of var1--var5);

run;

I've only used five variables and a limit of 10 for the value, but you could extend this for n variables and a wide range of integer values.

Robert

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-07-2009 12:56 PM

Thank you Robert.

By simple I meant a 'one-liner' like N or Max.

But, I was able to use your code for my needs (and I learned something too).

Thanks again, John

By simple I meant a 'one-liner' like N or Max.

But, I was able to use your code for my needs (and I learned something too).

Thanks again, John

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-07-2009 03:38 PM

Hi:

You could also investigate the NLEVELS option of PROC FREQ or just proc freq in general:

[pre]

data have;

input id var1 var2 var3 var4 var5;

cards;

1 1 2 3 4 5

2 1 2 2 3 4

3 3 3 2 1 3

4 4 1 2 3 1

;

run;

proc freq data=have nlevels;

table _numeric_;

run;

[/pre]

cynthia

You could also investigate the NLEVELS option of PROC FREQ or just proc freq in general:

[pre]

data have;

input id var1 var2 var3 var4 var5;

cards;

1 1 2 3 4 5

2 1 2 2 3 4

3 3 3 2 1 3

4 4 1 2 3 1

;

run;

proc freq data=have nlevels;

table _numeric_;

run;

[/pre]

cynthia