The query requires remerging summary statistics back with the original data

I know this has been posted a ton already, and I have searched the forums and Google, but I still don't really understand what is happening when I get this note: "The query requires remerging summary statistics back with the original data."

I am running this code:

proc sql;
  create table collapsed_data_final_version as
  select *,
         sum(msk_tx_yes)        as msk_tx_sum,
         sum(msk_cancel_tx_yes) as msk_cancel_tx_sum,
         sum(msk_ca_yes)        as msk_ca_sum,
         sum(msk_cancel_ca_yes) as msk_cancel_ca_sum,
         sum(msk_dc_yes)        as msk_dc_sum,
         sum(conc_psych_tx_yes) as conc_psych_tx_sum,
         sum(conc_psych_ca_yes) as conc_psych_ca_sum,
         sum(conc_psych_dc_yes) as conc_psych_dc_sum,
         sum(conc_yes)          as conc_sum,
         sum(psych_yes)         as psych_sum,
         sum(surg_prog)         as surg_sum
  from test1
  group by MRN;
quit;

Can someone explain in VERY simple terms why I am getting this note and what I can do to fix it? Is this affecting my data?

What I want this code to do is create a table that sums these variables for each MRN. I want to keep all the other variables from the test1 table while grouping by MRN.

Thanks!

Re: The query requires remerging summary statistics back with the original data

Posted in reply to christinagting0

In many SQL implementations, such as Oracle or Microsoft SQL Server, this query would be invalid.

They require every item in the SELECT clause either to be an aggregate function (such as MAX, MIN, or AVG) or to appear in the GROUP BY clause. SAS doesn't enforce this restriction. When your SELECT includes columns that are neither summarized nor grouped (here, everything pulled in by SELECT *), SAS first computes the summary statistics for each MRN and then merges ("remerges") them back onto the original row-level data, so every input row is kept with its group's totals attached. I'm not sure exactly why SAS tells you about this, possibly because the behavior differs from standard SQL, or so you know it is making two passes through the data.
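
To see what the remerge actually does, here is a small sketch with a made-up table; the names demo, visit, flag, and demo_remerged are hypothetical and not from the question. It mirrors the query above: SELECT * plus a SUM, grouped by MRN.

```sas
/* Hypothetical sample data: MRN 1 has two rows, MRN 2 has one */
data demo;
  input MRN visit flag;
  datalines;
1 1 1
1 2 1
2 1 0
;
run;

/* SELECT * pulls in visit and flag, which are neither summarized
   nor listed in GROUP BY, so SAS computes SUM(flag) per MRN,
   merges it back onto every row, and writes the remerge NOTE */
proc sql;
  create table demo_remerged as
  select *, sum(flag) as flag_sum
  from demo
  group by MRN;
quit;

proc print data=demo_remerged;
run;
```

demo_remerged keeps all three rows, with flag_sum = 2 on both MRN 1 rows and 0 on the MRN 2 row. The same thing happens in the query from the question: collapsed_data_final_version has as many rows as test1, not one row per MRN.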

It's a NOTE, not a warning or an error, so you don't need to do anything about it if the results are what you want. To get rid of the note, you would need to calculate your statistics per MRN in a separate summary table yourself and then merge that table back onto your original data.
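
A minimal sketch of that summary-then-merge approach, using a small made-up table; demo, flag, mrn_sums, demo_sorted, and demo_with_sums are hypothetical names. The first step is written the way Oracle or SQL Server would require: every selected column is either grouped or aggregated.

```sas
/* Hypothetical sample data: several rows per MRN plus a 0/1 flag */
data demo;
  input MRN visit flag;
  datalines;
1 1 1
1 2 1
2 1 0
;
run;

/* Step 1: summary table with one row per MRN. Every selected
   column is either the GROUP BY column or an aggregate, so no
   remerge happens and no NOTE is written */
proc sql;
  create table mrn_sums as
  select MRN, sum(flag) as flag_sum
  from demo
  group by MRN
  order by MRN;
quit;

/* Step 2: sort the row-level data so it can be merged BY MRN */
proc sort data=demo out=demo_sorted;
  by MRN;
run;

/* Step 3: one-to-many merge attaches each MRN's totals
   to every original row */
data demo_with_sums;
  merge demo_sorted mrn_sums;
  by MRN;
run;
```

With your data you would use test1 in place of demo and put one sum(...) as ..._sum line per variable in step 1. The result has the same rows and totals as the remerging query, just built in explicit steps. If you ever need one row per MRN instead, mrn_sums from step 1 is already that table.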