04-29-2016 05:33 AM
Could anyone help, on how to count distinct values for a character type field in a SAS dataset? Like, I have a field of email addresses in a dataset, and I want to count the distinct email addresses only. I could get the desired results using PROC SQL, but I wanted to know if there is any other way to do the same.
Thanks in advance!
04-29-2016 06:05 AM
There are, as always many ways to acheive the result.
Proc SQL - as you mention
Data step, count each row.
04-29-2016 06:40 AM
Proc means- does not give any results for Character type variable.
Proc freq- I did not get the desired output. Like, I have a field of email address in the dataset, using proc freq displayed all email addresses with count as 1 in front of each.
Proc summary- I have never used, I am trying and it runs into errors.
04-29-2016 06:49 AM
PROC SORT with NODUPKEY and OUT= options would be another method (in addition to the approaches listed by RW9).
Please note that some approaches differ in how missing values (and special missing values) are handled. For example, a COUNT(DISTINCT ...) in PROC SQL excludes missing values, whereas one missing value (of each kind in case of special missing values), if any, would remain after a NODUPKEY sort.
The optimal method will depend on the number of observations in the dataset, the expected number of distinct values and other considerations, e.g., whether the dataset is already sorted by that character variable.
04-29-2016 10:03 AM
Another technique is using Hash tables and SUMINC
04-30-2016 12:17 AM
1) proc freq data=sashelp.class nlevels; table sex; run;2) data _null_; if 0 then set sashelp.class; declare hash h(dataset:'sashelp.class'); h.definekey('sex'); h.definedone(); nlevels=h.num_items; putlog "NOTE:" nlevels; stop; run;