BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rockerd
Fluorite | Level 6
Suppose I have a dataset with 100 char and 100 numeric variables and I want to count the no. of unique observations in char variables only ...


I know we could use the count distinct in Proc SQL but I wanted to know if there is any method using data step approach...

Thank you,

Rockerd.
1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19
> The method suggested by Patrick works for me.

More easily obtained with PROC FREQ LEVELS

[pre]
Number of Variable Levels

Variable Levels
--------------------
Name 19
Sex 2
[/pre]

View solution in original post

13 REPLIES 13
GertNissen
Barite | Level 11
[pre]data test;
input char1 $ char2 $ num1 num2;
datalines;
A B 1 2
A B 1 2
A C 1 2
A C 1 2
D B 4 2
D B 1 2
;
run;

proc sort data=work.test
out=char nodupkey;
by _char_;
run;

proc sort data=work.test
out=num nodupkey;
by _numeric_;
run;
deleted_user
Not applicable
You could sort the dataset by the variable you want to count and then use the retain function to count every time you find a new value.

proc sort data=x;
by CharVar1;
run;

data x1(keep=count);
set x end=eof;
by CharVar1;

retain count 0;

if first.CharVar1 then count=count+1;

if eof then output;
run;
ArtC
Rhodochrosite | Level 12
Ron Fehd (the Macro Maven) has written a couple of macros that do what you are asking. Especially his %showall. See http://www2.sas.com/proceedings/sugi22/POSTERS/PAPER236.PDF
rockerd
Fluorite | Level 6
Thank you for the reply.

I had 100 character variables.

Do I have to sort every one individually and then follow this procedure of nodupkey for every variable? I feel that would be time consuming.


Is there any way where I could do it without having to sort the data?

Like in a Proc print you could specify :
sum var_name;
Is there any method similar to this in SAS where in I could say count(distinct (var_name)) without using proc SQL?
data_null__
Jade | Level 19
I think PROC FREQ is what you need. TABLES _CHAR_;
rockerd
Fluorite | Level 6
actually I was trying for proc freq ...

but when I say out = dataset_name .. it outputs only the last char variable it has encountered ... so counting for distinct values is a little difficult with proc freq.
data_null__
Jade | Level 19
[pre]ods output onewayfreqs=freq;[/pre]
Patrick
Opal | Level 21
Hi

As I understand your question you're actually not looking for a count of unique observations but rather for a count of unique values per character variable.

If so then you have to scan the data set per variable. Luckily there are PROC's doing this for us - and they do it normally more efficient than we would code it.

Below approach uses Proc Freq to get a "record" per character variable and distinct value of this variable (missings included) and then uses ODS Output to collect the result and write it to a data set.

ods _all_ close;
ods output OneWayFreqs=work.Freqs(keep=table) ;

proc freq data=sashelp.class;
table _character_ /nocum nopercent missing;
run;

ods output close;
ods listing;

title "Unique Values per Character Variable";
proc sql;
select substr(Table,6) as Variable_Name format=$32. ,count(*) as N_UniqueValues format=8.
from work.Freqs
group by Table
;
quit;
title;


HTH
Patrick
Ksharp
Super User
Hmmmmm...
I think there will be a way to get the result you want.try it.


[pre]
proc transpose data=sashelp.class out=class;
by _character_;
var _character_;
run;
proc sort data=class(keep= _name_ col1) out=temp;
by _name_ col1;
run;
data freq;
set temp;
by _name_ col1;
retain count;
if first._name_ then count=0;
if last.col1 then count+1;
if last._name_ then output;
keep _name_ count;
run;
proc print data=freq(rename=( _name_ = var_name)) noobs;
run;
[/pre]

Ksharp
rockerd
Fluorite | Level 6
Patrick and KSharp., thank you for the reply.

The method suggested by Patrick works for me.

Again thanks

Rockerd.
data_null__
Jade | Level 19
> The method suggested by Patrick works for me.

More easily obtained with PROC FREQ LEVELS

[pre]
Number of Variable Levels

Variable Levels
--------------------
Name 19
Sex 2
[/pre]
Ksharp
Super User
Hi . data null;
You must work hard on documentation. It is pretty easy.
I admire you really. :_)
ArtC
Rhodochrosite | Level 12
Thank you Data _NULL_ for reminding me of these options. I believe that the LEVELS option is an alias for the NLEVELS option. Without a TABLE statement you not only get the summary shown by Data _NULL_, but also a summary for each level of each variable. Adding a TABLE statement yields only the overall summary for the selected variables. Since you want all variables try the following:

[pre]
proc freq data=sashelp.class nlevels;
table _all_ / noprint;
run;
[/pre]

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 66173 views
  • 9 likes
  • 7 in conversation