- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi I would be gratefull if anyone could tell me if is there posibility for counting distinct values in summary or means procedure?
I know that I can do that by proc sql, however for cohesion of my code I would prefere use proc summary or means.
Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
SUMMARY or MEANS alone don't do that, but a combination with PROC FREQ can do it:
proc freq data=sashelp.class noprint;
table sex / out=class;
run;
proc summary data=class n;
output out=want (drop=_type_);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes and no. SUMMARY/MEANS isn't built for that, so you have to jump through a hoop or two. For example, suppose you want to know how many DISTINCT STATE values appear in your data set:
proc summary data=have nway;
class state;
var some_numeric_variable;
output out=stats (keep=state _freq_) n=;
run;
Now the number of observations in STATS is the number of STATE values that appeared in the original data set HAVE. So it takes a little extra processing to capture that number, such as:
data _null_;
set stats nobs=n_states;
put n_states=;
stop;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Astounding's answer reveals a solution with two PROC SUMMARY steps:
proc summary data=sashelp.class nway;
class sex;
var age; * just some numeric variable for syntactical completeness;
output out=class (keep=sex _freq_) n=;
run;
proc summary data=class n;
output out=want (drop=_type_);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The solution put together by @Astounding and @Kurt_Bremser ended up working nicely in my situation. I wanted to use proc means to count distinct values because I am utilizing the types statement to get a variety of summary rows, which wouldn't work as easily in other procedures. My implementation looked something like this:
proc means data=input_data noprint;
class category subcategory subject;
types category*subcategory*subject category*subject subject;
output out=data1(rename=_type_=type) n=n;
run;
proc means data=data1 noprint nway missing;
class category subcategory type;
var n;
output out=data2(drop=_:) n=n sum=sum;
run;
In fact, with that method, I was able to to get both a count of distinct subjects (the n column in data2) and a count of all rows (the sum column in data2).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use proc means and proc summary, if you have a numeric variable in your data set or create a numeric variable for that purpose.
Data A;
Do ID = 'A', 'B', 'T', 'Z';
Do NumVar = 1 To Ceil(Ranuni(1)*10);
Output;
End;
End;
Run;
Proc Means Data=A NoPrint;
By ID;
Var NumVar;
Output Out=Want (Keep=ID _FREQ_) N=; * remove _FREQ_ if not required;
Run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
PROC FREQ can do that.
ods select none;
ods output nlevels=want;
proc freq data=sashelp.class nlevels;
table _all_;
run;
ods select all;
proc print;run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I `m sorry I was not very clear, I need something like this
proc summary data=have missing;
class id1 id2 data;
var var1 var2 var3;
types id1*id2*id3;
output out=wan sum(var1)= max(var2) n(DISTINCT var3)=;/
run;
of course this way is not possible
I am able to do that, with two separate procedures and then merging result datasets, however, I was curious if it`s possible with one procedure or dataset just to make my code clearer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Provide a small example data set and what the result should be.
I am not even sure we are all using "distinct" in the same way.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can do what I want in proc sql:
proc sql;
create table want as
select id1, id2, date sum(var1), max(var2), count(distinct (var3)) from have;
quit;
I would like to know if there is option in proc summary/means for do the same.