Hello Folks,
I have a quick question on Data Remerging in PROC SQL.
I was going through the Book SAS Certification Prep Guide: Advanced Programming for SAS 9, Fourth Edition and in Chapter 2 (Performing Advanced Queries with PROC SQL) under the topic Understanding Data Remerging
The book says one of the condition under which Data Remerging will occur in PROC SQL is the following :
SAS(R) 9.3 SQL Procedure User's Guide
I am trying to understand the above condition. I understand the Data remerging when the HAVING Clause has a Column/Expression which is not in the GROUP BY Clause
For eg , the below PROC SQL will show Data Remerging because the HAVING Clause has a column Salary which is NOT in the GROUP BY Clause assuming the dataset sasuser.payrollmaster has columns JobCode and Salary
proc sql;
select jobcode, avg( salary) as AvgSalary format = dollar11.2
from sasuser.payrollmaster
group by JobCode
having salary > avg( salary);
Quit;
But unable to understand how the Data remerging occurs when the Column/Expression is not present in a Subquery.
Can someone help me with a PROC SQL example which shows Data remerging occuring if the HAVING clause has a Column/Expression which is NOT present in a Sub query.
Thanks,
Prashant
Although I think this illustrates the issue, in practice it usually occurs when the variable is in the SELECT statement and not summarized or used in a HAVING/GROUP BY that you'll get the remerge warning.
proc sql;
title 'Remerging example';
select sex, mean(weight) as weight
from sashelp.class
group by sex
having age>12;
title 'Values from original data, no HAVE clause';
select sex, age, mean(weight) as weight
from sashelp.class
group by sex;
title 'Not equivalent to WHERE';
select sex, age, mean(weight) as weight
from sashelp.class
where age>12
group by sex;
quit;
Thanks Reeza.
According to the Advanced SAS Prep Guide Data Remerging occurs in PROC SQL whenever any of the following conditions exist:
The Examples you provided do not involve any Sub queries but includes Scenario 2 above 3.
Is there a scenario where the Subquery in HAVING Clause causes Data Remerging as mentioned by scenario 3 above?
Where does the first example fall under?
Reeza,
Your First Example falls under Scenario 3 above where the HAVING Clause has a Column or Expression not in the GROUP BY Clause.
I was looking for a PROC SQL Example which involves a Sub query and HAVING Clause in Scenario 3 which causes Data Remerging.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
