BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rockerd
Fluorite | Level 6
Suppose I have a dataset with 100 char and 100 numeric variables and I want to count the no. of unique observations in char variables only ...


I know we could use the count distinct in Proc SQL but I wanted to know if there is any method using data step approach...

Thank you,

Rockerd.
1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19
> The method suggested by Patrick works for me.

More easily obtained with PROC FREQ LEVELS

[pre]
Number of Variable Levels

Variable Levels
--------------------
Name 19
Sex 2
[/pre]

View solution in original post

13 REPLIES 13
GertNissen
Barite | Level 11
[pre]data test;
input char1 $ char2 $ num1 num2;
datalines;
A B 1 2
A B 1 2
A C 1 2
A C 1 2
D B 4 2
D B 1 2
;
run;

proc sort data=work.test
out=char nodupkey;
by _char_;
run;

proc sort data=work.test
out=num nodupkey;
by _numeric_;
run;
deleted_user
Not applicable
You could sort the dataset by the variable you want to count and then use the retain function to count every time you find a new value.

proc sort data=x;
by CharVar1;
run;

data x1(keep=count);
set x end=eof;
by CharVar1;

retain count 0;

if first.CharVar1 then count=count+1;

if eof then output;
run;
ArtC
Rhodochrosite | Level 12
Ron Fehd (the Macro Maven) has written a couple of macros that do what you are asking. Especially his %showall. See http://www2.sas.com/proceedings/sugi22/POSTERS/PAPER236.PDF
rockerd
Fluorite | Level 6
Thank you for the reply.

I had 100 character variables.

Do I have to sort every one individually and then follow this procedure of nodupkey for every variable? I feel that would be time consuming.


Is there any way where I could do it without having to sort the data?

Like in a Proc print you could specify :
sum var_name;
Is there any method similar to this in SAS where in I could say count(distinct (var_name)) without using proc SQL?
data_null__
Jade | Level 19
I think PROC FREQ is what you need. TABLES _CHAR_;
rockerd
Fluorite | Level 6
actually I was trying for proc freq ...

but when I say out = dataset_name .. it outputs only the last char variable it has encountered ... so counting for distinct values is a little difficult with proc freq.
data_null__
Jade | Level 19
[pre]ods output onewayfreqs=freq;[/pre]
Patrick
Opal | Level 21
Hi

As I understand your question you're actually not looking for a count of unique observations but rather for a count of unique values per character variable.

If so then you have to scan the data set per variable. Luckily there are PROC's doing this for us - and they do it normally more efficient than we would code it.

Below approach uses Proc Freq to get a "record" per character variable and distinct value of this variable (missings included) and then uses ODS Output to collect the result and write it to a data set.

ods _all_ close;
ods output OneWayFreqs=work.Freqs(keep=table) ;

proc freq data=sashelp.class;
table _character_ /nocum nopercent missing;
run;

ods output close;
ods listing;

title "Unique Values per Character Variable";
proc sql;
select substr(Table,6) as Variable_Name format=$32. ,count(*) as N_UniqueValues format=8.
from work.Freqs
group by Table
;
quit;
title;


HTH
Patrick
Ksharp
Super User
Hmmmmm...
I think there will be a way to get the result you want.try it.


[pre]
proc transpose data=sashelp.class out=class;
by _character_;
var _character_;
run;
proc sort data=class(keep= _name_ col1) out=temp;
by _name_ col1;
run;
data freq;
set temp;
by _name_ col1;
retain count;
if first._name_ then count=0;
if last.col1 then count+1;
if last._name_ then output;
keep _name_ count;
run;
proc print data=freq(rename=( _name_ = var_name)) noobs;
run;
[/pre]

Ksharp
rockerd
Fluorite | Level 6
Patrick and KSharp., thank you for the reply.

The method suggested by Patrick works for me.

Again thanks

Rockerd.
data_null__
Jade | Level 19
> The method suggested by Patrick works for me.

More easily obtained with PROC FREQ LEVELS

[pre]
Number of Variable Levels

Variable Levels
--------------------
Name 19
Sex 2
[/pre]
Ksharp
Super User
Hi . data null;
You must work hard on documentation. It is pretty easy.
I admire you really. :_)
ArtC
Rhodochrosite | Level 12
Thank you Data _NULL_ for reminding me of these options. I believe that the LEVELS option is an alias for the NLEVELS option. Without a TABLE statement you not only get the summary shown by Data _NULL_, but also a summary for each level of each variable. Adding a TABLE statement yields only the overall summary for the selected variables. Since you want all variables try the following:

[pre]
proc freq data=sashelp.class nlevels;
table _all_ / noprint;
run;
[/pre]

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 69304 views
  • 10 likes
  • 7 in conversation