Hello,
Could anyone help, on how to count distinct values for a character type field in a SAS dataset? Like, I have a field of email addresses in a dataset, and I want to count the distinct email addresses only. I could get the desired results using PROC SQL, but I wanted to know if there is any other way to do the same.
Thanks in advance!
There are, as always many ways to acheive the result.
Proc SQL - as you mention
Data step, count each row.
Proc means:
http://www2.sas.com/proceedings/sugi29/240-29.pdf
Proc freq:
Proc summary:
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473735.htm
Thanks for your help.
Could you please give a small example for this?
Thank you!
With the help of the links provided, try to use them. If you run into obstacles, then get back to us.
Proc means- does not give any results for Character type variable.
Proc freq- I did not get the desired output. Like, I have a field of email address in the dataset, using proc freq displayed all email addresses with count as 1 in front of each.
Proc summary- I have never used, I am trying and it runs into errors.
Hello @PoojaP1,
PROC SORT with NODUPKEY and OUT= options would be another method (in addition to the approaches listed by RW9).
Please note that some approaches differ in how missing values (and special missing values) are handled. For example, a COUNT(DISTINCT ...) in PROC SQL excludes missing values, whereas one missing value (of each kind in case of special missing values), if any, would remain after a NODUPKEY sort.
The optimal method will depend on the number of observations in the dataset, the expected number of distinct values and other considerations, e.g., whether the dataset is already sorted by that character variable.
Hello,
Another technique is using Hash tables and SUMINC
http://www.pharmasug.org/proceedings/2014/CC/PharmaSUG-2014-CC02.pdf
Best regards,
1) proc freq data=sashelp.class nlevels; table sex; run;2) data _null_; if 0 then set sashelp.class; declare hash h(dataset:'sashelp.class'); h.definekey('sex'); h.definedone(); nlevels=h.num_items; putlog "NOTE:" nlevels; stop; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.