DATA Step, Macro, Functions and more

How to count no. of unique values only

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

How to count no. of unique values only

Suppose I have a dataset with 100 char and 100 numeric variables and I want to count the no. of unique observations in char variables only ...


I know we could use the count distinct in Proc SQL but I wanted to know if there is any method using data step approach...

Thank you,

Rockerd.

Accepted Solutions
Solution
‎03-03-2017 08:00 AM
Respected Advisor
Posts: 3,799

Re: How to count no. of unique values only

> The method suggested by Patrick works for me.

More easily obtained with PROC FREQ LEVELS

[pre]
Number of Variable Levels

Variable Levels
--------------------
Name 19
Sex 2
[/pre]

View solution in original post


All Replies
SAS Employee
Posts: 174

Re: How to count no. of unique values only

[pre]data test;
input char1 $ char2 $ num1 num2;
datalines;
A B 1 2
A B 1 2
A C 1 2
A C 1 2
D B 4 2
D B 1 2
;
run;

proc sort data=work.test
out=char nodupkey;
by _char_;
run;

proc sort data=work.test
out=num nodupkey;
by _numeric_;
run;
N/A
Posts: 0

Re: How to count no. of unique values only

You could sort the dataset by the variable you want to count and then use the retain function to count every time you find a new value.

proc sort data=x;
by CharVar1;
run;

data x1(keep=count);
set x end=eof;
by CharVar1;

retain count 0;

if first.CharVar1 then count=count+1;

if eof then output;
run;
Valued Guide
Posts: 634

Re: How to count no. of unique values only

Posted in reply to deleted_user
Ron Fehd (the Macro Maven) has written a couple of macros that do what you are asking. Especially his %showall. See http://www2.sas.com/proceedings/sugi22/POSTERS/PAPER236.PDF
Occasional Contributor
Posts: 12

Re: How to count no. of unique values only

Thank you for the reply.

I had 100 character variables.

Do I have to sort every one individually and then follow this procedure of nodupkey for every variable? I feel that would be time consuming.


Is there any way where I could do it without having to sort the data?

Like in a Proc print you could specify :
sum var_name;
Is there any method similar to this in SAS where in I could say count(distinct (var_name)) without using proc SQL?
Respected Advisor
Posts: 3,799

Re: How to count no. of unique values only

I think PROC FREQ is what you need. TABLES _CHAR_;
Occasional Contributor
Posts: 12

Re: How to count no. of unique values only

Posted in reply to data_null__
actually I was trying for proc freq ...

but when I say out = dataset_name .. it outputs only the last char variable it has encountered ... so counting for distinct values is a little difficult with proc freq.
Respected Advisor
Posts: 3,799

Re: How to count no. of unique values only

[pre]ods output onewayfreqs=freq;[/pre]
Respected Advisor
Posts: 4,173

Re: How to count no. of unique values only

Hi

As I understand your question you're actually not looking for a count of unique observations but rather for a count of unique values per character variable.

If so then you have to scan the data set per variable. Luckily there are PROC's doing this for us - and they do it normally more efficient than we would code it.

Below approach uses Proc Freq to get a "record" per character variable and distinct value of this variable (missings included) and then uses ODS Output to collect the result and write it to a data set.

ods _all_ close;
ods output OneWayFreqs=work.Freqs(keep=table) ;

proc freq data=sashelp.class;
table _character_ /nocum nopercent missing;
run;

ods output close;
ods listing;

title "Unique Values per Character Variable";
proc sql;
select substr(Table,6) as Variable_Name format=$32. ,count(*) as N_UniqueValues format=8.
from work.Freqs
group by Table
;
quit;
title;


HTH
Patrick
Super User
Posts: 10,044

Re: How to count no. of unique values only

Hmmmmm...
I think there will be a way to get the result you want.try it.


[pre]
proc transpose data=sashelp.class out=class;
by _character_;
var _character_;
run;
proc sort data=class(keep= _name_ col1) out=temp;
by _name_ col1;
run;
data freq;
set temp;
by _name_ col1;
retain count;
if first._name_ then count=0;
if last.col1 then count+1;
if last._name_ then output;
keep _name_ count;
run;
proc print data=freq(rename=( _name_ = var_name)) noobs;
run;
[/pre]

Ksharp
Occasional Contributor
Posts: 12

Re: How to count no. of unique values only

Patrick and KSharp., thank you for the reply.

The method suggested by Patrick works for me.

Again thanks

Rockerd.
Solution
‎03-03-2017 08:00 AM
Respected Advisor
Posts: 3,799

Re: How to count no. of unique values only

> The method suggested by Patrick works for me.

More easily obtained with PROC FREQ LEVELS

[pre]
Number of Variable Levels

Variable Levels
--------------------
Name 19
Sex 2
[/pre]
Super User
Posts: 10,044

Re: How to count no. of unique values only

Posted in reply to data_null__
Hi . data null;
You must work hard on documentation. It is pretty easy.
I admire you really. :_)
Valued Guide
Posts: 634

Re: How to count no. of unique values only

Posted in reply to data_null__
Thank you Data _NULL_ for reminding me of these options. I believe that the LEVELS option is an alias for the NLEVELS option. Without a TABLE statement you not only get the summary shown by Data _NULL_, but also a summary for each level of each variable. Adding a TABLE statement yields only the overall summary for the selected variables. Since you want all variables try the following:

[pre]
proc freq data=sashelp.class nlevels;
table _all_ / noprint;
run;
[/pre]
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 38068 views
  • 8 likes
  • 7 in conversation