Hi I would be gratefull if anyone could tell me if is there posibility for counting distinct values in summary or means procedure?
I know that I can do that by proc sql, however for cohesion of my code I would prefere use proc summary or means.
Thank you.
SUMMARY or MEANS alone don't do that, but a combination with PROC FREQ can do it:
proc freq data=sashelp.class noprint;
table sex / out=class;
run;
proc summary data=class n;
output out=want (drop=_type_);
run;
Yes and no. SUMMARY/MEANS isn't built for that, so you have to jump through a hoop or two. For example, suppose you want to know how many DISTINCT STATE values appear in your data set:
proc summary data=have nway;
class state;
var some_numeric_variable;
output out=stats (keep=state _freq_) n=;
run;
Now the number of observations in STATS is the number of STATE values that appeared in the original data set HAVE. So it takes a little extra processing to capture that number, such as:
data _null_;
set stats nobs=n_states;
put n_states=;
stop;
run;
@Astounding's answer reveals a solution with two PROC SUMMARY steps:
proc summary data=sashelp.class nway;
class sex;
var age; * just some numeric variable for syntactical completeness;
output out=class (keep=sex _freq_) n=;
run;
proc summary data=class n;
output out=want (drop=_type_);
run;
The solution put together by @Astounding and @Kurt_Bremser ended up working nicely in my situation. I wanted to use proc means to count distinct values because I am utilizing the types statement to get a variety of summary rows, which wouldn't work as easily in other procedures. My implementation looked something like this:
proc means data=input_data noprint;
class category subcategory subject;
types category*subcategory*subject category*subject subject;
output out=data1(rename=_type_=type) n=n;
run;
proc means data=data1 noprint nway missing;
class category subcategory type;
var n;
output out=data2(drop=_:) n=n sum=sum;
run;
In fact, with that method, I was able to to get both a count of distinct subjects (the n column in data2) and a count of all rows (the sum column in data2).
You can use proc means and proc summary, if you have a numeric variable in your data set or create a numeric variable for that purpose.
Data A;
Do ID = 'A', 'B', 'T', 'Z';
Do NumVar = 1 To Ceil(Ranuni(1)*10);
Output;
End;
End;
Run;
Proc Means Data=A NoPrint;
By ID;
Var NumVar;
Output Out=Want (Keep=ID _FREQ_) N=; * remove _FREQ_ if not required;
Run;
PROC FREQ can do that.
ods select none;
ods output nlevels=want;
proc freq data=sashelp.class nlevels;
table _all_;
run;
ods select all;
proc print;run;
I `m sorry I was not very clear, I need something like this
proc summary data=have missing;
class id1 id2 data;
var var1 var2 var3;
types id1*id2*id3;
output out=wan sum(var1)= max(var2) n(DISTINCT var3)=;/
run;
of course this way is not possible
I am able to do that, with two separate procedures and then merging result datasets, however, I was curious if it`s possible with one procedure or dataset just to make my code clearer.
Provide a small example data set and what the result should be.
I am not even sure we are all using "distinct" in the same way.
You can do what I want in proc sql:
proc sql;
create table want as
select id1, id2, date sum(var1), max(var2), count(distinct (var3)) from have;
quit;
I would like to know if there is option in proc summary/means for do the same.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.