PROC SQL SELECT DISTINCT vs GROUP BY

pchappus — Tue, 29 May 2018 20:53:14 GMT

Does anybody have some good resources on using SELECT DISTINCT vs GROUP BY? I feel hesitant when using them and get confused about which one to use, or whether there are times you need to use both.

Thanks!

Paul

Re: PROC SQL SELECT DISTINCT vs GROUP BY

Patrick — Tue, 29 May 2018 21:29:25 GMT

@pchappus

Run the code below and examine the results. Maybe that already explains things a bit more.

In a nutshell: you use DISTINCT to de-duplicate rows, and you use GROUP BY to aggregate values by the variables listed in the GROUP BY clause.

data have;
  input groupvar nvar cvar $;
  datalines;
1 10 A
1 10 A
1 20 B
2 50 B
2 40 B
;
run;

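proc sql;
  /* 1. DISTINCT drops the duplicate row (1, 10, A): 4 rows returned */
  select distinct groupvar, nvar, cvar
  from have
  ;
  /* 2. GROUP BY sums nvar per groupvar. cvar is neither grouped nor
        aggregated, so SAS remerges the sums onto all 5 original rows */
  select groupvar, sum(nvar) as sum_nvar, cvar
  from have
  group by groupvar
  ;
  /* 3. Same query with DISTINCT: the remerged duplicates collapse to 3 rows */
  select distinct groupvar, sum(nvar) as sum_nvar, cvar
  from have
  group by groupvar
  ;
  /* 4. HAVING filters on the group total: only groupvar=2 (sum 90) is kept */
  select distinct groupvar, sum(nvar) as sum_nvar, cvar
  from have
  group by groupvar
  having sum(nvar)>80
  ;
quit;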
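
As a follow-up sketch (not part of the queries above, and assuming the same HAVE table), the two queries below show GROUP BY listing every non-aggregated column, which avoids the remerge. The column names n_rows and sum_nvar are just illustrative aliases. Note that PROC SQL turns a GROUP BY with no summary function into an ORDER BY (it writes a warning to the log), so GROUP BY alone will not de-duplicate the way DISTINCT does; that is why the first query includes count(*).

```sas
proc sql;
  /* One row per unique (groupvar, nvar, cvar) combination, like DISTINCT,
     plus how many input rows fell into each combination.
     The combination (1, 10, A) gets n_rows = 2; the others get 1. */
  select groupvar, nvar, cvar, count(*) as n_rows
  from have
  group by groupvar, nvar, cvar
  ;
  /* Every non-aggregated column is in GROUP BY, so nothing is remerged:
     one row per (groupvar, cvar) pair with its own total of nvar. */
  select groupvar, cvar, sum(nvar) as sum_nvar
  from have
  group by groupvar, cvar
  ;
quit;
```

Whether you want the total per groupvar repeated on each cvar row (as in the remerged queries) or a separate total per groupvar/cvar pair (as here) depends on the report you need, so check the log for the remerge note to make sure you got the one you intended.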
