Hello,
Here is generic code
proc sql; select *, max(datex) FORMAT=DATETIME25.6 as datexx from table1 as a
group by id1, id2
having (calculated datexx) = a.datex
order by id1, id2;
quit;
The purpose is to keep only those with max date value. Question is about the scope of max function. This max action is column wise obviously. with 'group by'n effect, it will run max within the groups formed by id1, id2, right? I think so. Does anybody have other opinion? Thanks.
Jia
The SQL Procedure User's Guide contains reference documentation that explains everything, so you don't need an opinion of the interaction of GROUP BY and HAVING.
Specifies how to group the data for summarizing.
The HAVING clause is used with at least one summary function and an optional GROUP BY clause to summarize groups of data in a table. A HAVING clause is any valid SQL expression that is evaluated as either true or false for each group in a query. Alternatively, if the query involves remerged data, then the HAVING expression is evaluated for each row that participates in each group. The query must include one or more summary functions.Typically, the GROUP BY clause is used with the HAVING expression and defines the group or groups to be evaluated. If you omit the GROUP BY clause, then the summary function and the HAVING clause treat the table as one group.
The SQL Procedure User's Guide contains reference documentation that explains everything, so you don't need an opinion of the interaction of GROUP BY and HAVING.
Specifies how to group the data for summarizing.
The HAVING clause is used with at least one summary function and an optional GROUP BY clause to summarize groups of data in a table. A HAVING clause is any valid SQL expression that is evaluated as either true or false for each group in a query. Alternatively, if the query involves remerged data, then the HAVING expression is evaluated for each row that participates in each group. The query must include one or more summary functions.Typically, the GROUP BY clause is used with the HAVING expression and defines the group or groups to be evaluated. If you omit the GROUP BY clause, then the summary function and the HAVING clause treat the table as one group.
I inherited a lot of code loaded with proc SQL. I am not going to spend expensive paid company time to read documentation like my data steps are not good enough. If a quick confirmation or 'no you are wrong' is not available, I will just rewrite it to data step and move on. Whichever way is faster.
Thank you Richard.
Jia
not going to spend expensive paid company time to read documentation
I consider any company that discourages learning on the job a hopeless case, not worth wasting time on.
What do you think has made the senior helpers here into experts? Exactly what you disparage in your comment. Thinking like that is a serious mistake, to your personal detriment and to the detriment of your employer.
When I first saw DS2, I converted several proc sql programs to it. DS2 runs faster. Then I realized DS2 is a life saver because it finishes many SQL jobs that otherwise will die on proc sql. So I spent a lot, buying Jordan's book, enduring all the declare statement... to learn DS2. That brings a lot of joy and value to myself and my organizations. What are you going to label me then? great learner? What are you going to label me if I tell you I am not going to spend time learning other SQL like FEDSQL? "you refuse to learn, you are not good..."
After DS2, I spent some time learning programing on proc IMSTAT because it is yet another faster technology, better than SAS foundation that underlies proc sql. Now I have had yet good practice with Viya. Throughout this learning process and years, I took jobs that asked me to rewrite SAS to first R and then Python. I therefore do know first hand that not all packages interact "group by" and 'having'-like the same way SAS does.
At my current job I am helping with modernizing things. The starting point is pile of legacy proc SQL code, among others. Now not only i need to recall what I learned from Howard Schreier's "proc sql by example" book way back, I also have to separate signals from non-SAS package behavior. So I thought I came here to ask people who can just spit out tome quick answers.
So please do not judge companies who expect employees to selectively learn, to learn with value goals in mind, yearly, quarterly if not daily. If you learn SAS, try to learn something that runs on better, modern technologies. In other words, modernize yourself while learning. The first time I read SAS BASE documentation was ~1991 when there was no PDF. From UNIX/VAX, to mainframe, now Windows. Sometime all platforms together. I don't remember how many times I went through documentations, I still forget. I know who is good programmer when I see one. Actually very good ones often find problems with documentation and wrote books/papers about them. Schreier, Virgile, Dorfman, Whitlock? to recall a few. You are right practice makes it good (expertise not guaranteed) . On the other hand, specialty is specialty. If you ask me how HPFOREST works, or compare HPFOREST with Python, I am not going to send you to read documentation or go to stackoverflow.com. I can share with you 3-15 bullet points depending on how much detail you like. Just be humble, respect the respectable if you want good ROI of your time.
Jia
thanks for your question
I might simplify the code as follows to still get the same results as your original code.
proc sql; select * from sashelp.mdv group by city having shipdate =max(shipdate); quit; %put &=sqlobs;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.