🔒 This topic is solved and locked.
Need further help from the community? Please
sign in and ask a new question.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 05-29-2018 04:53 PM
(13966 views)
Anybody have some good resources on using select distinct vs group by? I feel hesitant when using them and find myself confused as to which one to use or if there are times you need to use both.
Thanks!
Paul
1 ACCEPTED SOLUTION
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Run below and examine the results. May be that explains things to you already a bit more.
In a nutshell: You use DISTINCT to de-duplicate rows, you use GROUP BY to aggregate values by the variables in the group by statement.
data have;
input groupvar nvar cvar $;
datalines;
1 10 A
1 10 A
1 20 B
2 50 B
2 40 B
;
run;
proc sql;
select distinct groupvar, nvar, cvar
from have
;
select groupvar, sum(nvar) as sum_nvar, cvar
from have
group by groupvar
;
select distinct groupvar, sum(nvar) as sum_nvar, cvar
from have
group by groupvar
;
select distinct groupvar, sum(nvar) as sum_nvar, cvar
from have
group by groupvar
having sum(nvar)>80
;
quit;
1 REPLY 1
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Run below and examine the results. May be that explains things to you already a bit more.
In a nutshell: You use DISTINCT to de-duplicate rows, you use GROUP BY to aggregate values by the variables in the group by statement.
data have;
input groupvar nvar cvar $;
datalines;
1 10 A
1 10 A
1 20 B
2 50 B
2 40 B
;
run;
proc sql;
select distinct groupvar, nvar, cvar
from have
;
select groupvar, sum(nvar) as sum_nvar, cvar
from have
group by groupvar
;
select distinct groupvar, sum(nvar) as sum_nvar, cvar
from have
group by groupvar
;
select distinct groupvar, sum(nvar) as sum_nvar, cvar
from have
group by groupvar
having sum(nvar)>80
;
quit;