I tried the following code. I came to know I want not able to find the distinct count if the variable missing in SQL is there a way to circumvent this issue? I have the data with 2 distinct names ( alpha, " ") but when I do distinct count I am only getting one count. Thanks
data xx;
name= "alpha";
output;
name=" ";
output;
run;
proc sql;
create table xx1 as select count(distinct name) as cnt from xx ;
quit;
Hi @SASuserlot,
You could add 1 for missing values:
select count(distinct name)+max(name=" ") as cnt
Or count distinct hexadecimal representations in your character variable of length 5:
select count(distinct put(name,$hex10.)) as cnt
Is this what you are attempting to do?
Note I changed your xx dataset, as I suspect you want a count of the distinct values ie Alpha count = 5 _blank_ count = 3 in my example
data have;data xx;
do i=1 to 5 ;
name= "alpha";
output;
end ;
do i=1 to 3 ;
name=" ";
output;
end ;
run;
proc sql;
create table xx1 as
select distinct name, count(*)
from xx
group by name ;
quit;
If this isn't want you are looking for can you provide an example with a have and want dataset (ie what you have, and what you want after processing)
Also provide more detailed explanation of what you are attempting to do
Your example is good but not the quite I am looking for . You have two different names (alpha, ' ')out of 10 observations. I was looking for how many different names out of 10 observations not count for 'alpha' or count of missing. @FreelanceReinh response is working for me as of now
Hi @SASuserlot,
You could add 1 for missing values:
select count(distinct name)+max(name=" ") as cnt
Or count distinct hexadecimal representations in your character variable of length 5:
select count(distinct put(name,$hex10.)) as cnt
Thanks for your response. your both solutions working for me as of now. Additional question in second solutions: how $hex10. doing the job? By any chance if my character length for 'name' changes to 200 in future, is $hex10. works?
@SASuserlot wrote:
Additional question in second solutions: how $hex10. doing the job? By any chance if my character length for 'name' changes to 200 in future, is $hex10. works?
The width w of the $HEXw. format must be greater than or equal to twice the defined length of the character variable to be formatted. So use $hex400. for a length-200 variable (Edit: assuming the usual single-byte characters, not Unicode, etc.). The other solution is independent of the length of variable NAME.
Thank you. How about if variable is numeric type?
@SASuserlot wrote:
How about if variable is numeric type?
You might need to know what values you can expect in the numeric variable: Are there special missing values (like .A, .B, ..., ._) and would you like to count these as distinct? Are there floating point values which may contain tiny rounding errors? In this case would you like to count, say, 3 and 3.00000000000001 as different numbers?
If no special missing values are involved (only the usual numeric missing value .), you can use
select count(distinct numvar)+max(numvar=.)
To include distinct special missing values in the count, use
select count(distinct put(numvar,hex16.))
(This time the width of the format is always 16.)
Both of the above solutions would distinguish between 3 and 3.00000000000001, etc. If you don't want this, use the ROUND function or apply a format that suits your precision goal.
Example:
select count(distinct put(numvar,best12.))
This would also include distinct special missing values, if any.
PROC FREQ has no problems
proc freq data=xx nlevels;
ods output nlevels=xx1;
tables name;
run;
Thanks for your response. Yes PROC freq don't have any issues. I was looking to explore with SQL. I think @FreelanceReinh response is working at present.
@SASuserlot wrote:
Thanks for your response. Yes PROC freq don't have any issues. I was looking to explore with SQL. I think @FreelanceReinh response is working at present.
That's fine. I presented the PROC FREQ answer for people reading along who aren't limiting themselves to PROC SQL. Or for people who prefer straightforward and readable and easily understandable code (as you can see PROC SQL requires a little bit more complicated and less intuitive code).
@PaigeMiller thank you, I really appreciate that. Just wanted let you I communicated correctly. I really appreciate your quick responses to my questions all the time.😀
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.