Dear All,
I have a question about SAS behaviour to understand how a query is executed.
Is there anyone who could help me understand what SAS thinks/judges like, in this case? I added the input and output test data if it helps.
I would like to understand what are the steps in SAS judgement in this case e.g. I think he first does the groupping on DISTINCT values on column1:
a group for anna
a group for jim
a group for ben
etc.
For each group, it calculates the maximum value on Column2. e.g. 50
For each row in the group,
e.g. anna 50 4g, if the value on Column2 = maximum, it keeps that row.
e.g. anna 50 2g, if the value on Column2 = maximum, it keeps that row.
e.g. anna 50 4g, if the value on Column2 = maximum, it keeps that row.
e.g. anna 20 1g, if the value on Column2 not = maximum, it turns that row into: anna 50 BLANK.
Through this logic, if I apply the DISTINCT at this stage (at the end), I do get the output that SAS gets.
Is this correct judgment? Especially the logic I am guessing for the usage of DISTINCT. Is DISTINCT meant to be applied generally as a last step in queries that have group by, or can I not make a general statement about that?
On the other hand, I see the DISTINCT is also applied in the beginning to have groups for each distinct value in Column1. Is distinct really being applied twice? Any generally statements that we can issue based on this case? Can I alter the code to have DISTINCT applied only once, in the end, for the sake of the exersize and for a better comprehension of SAS?
Thank you very much in advance!
Anna
PROC SQL;
CREATE TABLE WORK.TABLE1 AS
SELECT DISTINCT t1.'Column1'n,
/* MAX Value */
(MAX(t1.Column2'n)) AS 'MAX Value'n,
/* PL */
(CASE WHEN t1.'Column2'n = MAX(t1.'Column2'n)
THEN t1.'Column3'n
ELSE ""
END) AS PL
FROM WORK.TEST AS t1
GROUP BY t1.'Column1'n;
QUIT;
Input:
jim 1788 5F
jim 2000 9F
jim 500 9F
ben 190 4H
matt 400 46
matt 20 3G
jim 2000 52
jim 2000 9F
ben2 190 4H
ben2 190 4H
ben2 34 2g
jess 190 4h
jess 190 4h
jess 120 4h
jess 190 4h
jess 13 35
jerr 11 3g
jerr 11 3g
jerr 11 3g
ben 190 4H
a 400 9f
a 400 9f
a 300 6g
b 400 9f
b 400 8f
b 300 6g
b 20 3n
Output:
a 400 BLANK
a 400 9f
b 400 BLANK
b 400 8f
b 400 9f
ben 190 4H
ben2 190 BLANK
ben2 190 4H
jerr 11 3g
jess 190 BLANK
jess 190 4h
jim 2000 BLANK
jim 2000 52
jim 2000 9F
matt 400 BLANK
matt 400 46
For me this topic is of particular interest because I am doing conversions to and from another database system environment and I am sometimes "appalled" at the differences ins yntax /logic.
In another environment (can I name it here?) the same query triggers an errors prompting to put all the 3 columns in the group by clause.
Thank you!
1. Try
proc sql _method ;
2. There are two completely separate usages of DISTINCT. One operates to remove redundant arguments to a summary function. The other eliminates redundant rows from a result set.
3. Automatic "remerging" of summary data with original detail is an extension to standard SQL. It is supported by SAS but not by all vendors.
4. You can name competing vendors.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.