Re: DISTINCT and GROUP BY

TimA · Posted 07-18-2012 07:58 AM

Dear All,

I have a question about SAS behaviour to understand how a query is executed.

Is there anyone who could help me understand what SAS thinks/judges like, in this case? I added the input and output test data if it helps.

I would like to understand what are the steps in SAS judgement in this case e.g. I think he first does the groupping on DISTINCT values on column1:

a group for anna

a group for jim

a group for ben

etc.

For each group, it calculates the maximum value on Column2. e.g. 50

For each row in the group,

e.g. anna 50 4g, if the value on Column2 = maximum, it keeps that row.

e.g. anna 50 2g, if the value on Column2 = maximum, it keeps that row.

e.g. anna 50 4g, if the value on Column2 = maximum, it keeps that row.

e.g. anna 20 1g, if the value on Column2 not = maximum, it turns that row into: anna 50 BLANK.

Through this logic, if I apply the DISTINCT at this stage (at the end), I do get the output that SAS gets.

Is this correct judgment? Especially the logic I am guessing for the usage of DISTINCT. Is DISTINCT meant to be applied generally as a last step in queries that have group by, or can I not make a general statement about that?

On the other hand, I see the DISTINCT is also applied in the beginning to have groups for each distinct value in Column1. Is distinct really being applied twice? Any generally statements that we can issue based on this case? Can I alter the code to have DISTINCT applied only once, in the end, for the sake of the exersize and for a better comprehension of SAS?

Thank you very much in advance!

Anna

PROC SQL;

CREATE TABLE WORK.TABLE1 AS

SELECT DISTINCT t1.'Column1'n,

/* MAX Value */

(MAX(t1.Column2'n)) AS 'MAX Value'n,

/* PL */

(CASE WHEN t1.'Column2'n = MAX(t1.'Column2'n)

THEN t1.'Column3'n

ELSE ""

END) AS PL

FROM WORK.TEST AS t1

GROUP BY t1.'Column1'n;

QUIT;

Input:

jim 1788 5F

jim 2000 9F

jim 500 9F

ben 190 4H

matt 400 46

matt 20 3G

jim 2000 52

jim 2000 9F

ben2 190 4H

ben2 34 2g

jess 190 4h

jess 120 4h

jess 190 4h

jess 13 35

jerr 11 3g

ben 190 4H

a 400 9f

a 300 6g

b 400 9f

b 400 8f

b 300 6g

b 20 3n

Output:

a 400 BLANK

a 400 9f

b 400 BLANK

b 400 8f

b 400 9f

ben 190 4H

ben2 190 BLANK

ben2 190 4H

jerr 11 3g

jess 190 BLANK

jess 190 4h

jim 2000 BLANK

jim 2000 52

jim 2000 9F

matt 400 BLANK

matt 400 46

For me this topic is of particular interest because I am doing conversions to and from another database system environment and I am sometimes "appalled" at the differences ins yntax /logic.

In another environment (can I name it here?) the same query triggers an errors prompting to put all the 3 columns in the group by clause.

Thank you!

Howles · Posted 07-20-2012 01:57 PM

1. Try

proc sql _method ;

2. There are two completely separate usages of DISTINCT. One operates to remove redundant arguments to a summary function. The other eliminates redundant rows from a result set.

3. Automatic "remerging" of summary data with original detail is an extension to standard SQL. It is supported by SAS but not by all vendors.

4. You can name competing vendors.

DISTINCT and GROUP BY

Re: DISTINCT and GROUP BY

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away