Hi,
Could someone please explain the importance of the 'group by' clause in proc sql, with examples of how it should be used? In particular, when should it be 'group by 1,2,3;' and so on? Do the numbers correspond to the variables selected, so that selecting five variables means 'group by 1,2,3,4,5;', and are aggregates such as sum(variable) as [new variable name] excluded when counting how many to group by? My code is below. Could it be reviewed to check that it is correct and whether any changes are needed? Thanks!
Code:
proc sort data=test;
by ID;
run;
proc sql;
create table consumer_balance as
select ID,
sum(balance) as Total_balance,
count(*) as Volume
from test
group by 1
;
quit;
Your code is correct for producing summaries by ID.
Personally, I prefer to use the column names instead of the position numbers, as this is robust against changes in the positional lineup of result columns, and easier to read and maintain.
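To illustrate the point about names versus positions, here is a minimal sketch. The TEST data set was not posted, so the sample rows and the extra REGION column are made up for illustration only. The numbers in 'group by 1,2' refer to positions in the select list, so you list the non-aggregated columns you want to group on; the sum() and count() columns are not counted.

```sas
/* Hypothetical input: the real TEST data set is not shown in the post,
   so these rows and the REGION column are assumptions for illustration. */
data test;
    input ID region $ balance;
    datalines;
1 North 100
1 North 50
1 South 25
2 South 75
;
run;

proc sql;
    /* One grouping column, referenced by name instead of "group by 1" */
    create table consumer_balance as
    select ID,
           sum(balance) as Total_balance,
           count(*)     as Volume
    from test
    group by ID;

    /* Two grouping columns: "group by ID, region" is equivalent to
       "group by 1, 2" here. The aggregates in positions 3 and 4
       are not part of the GROUP BY list. */
    create table consumer_region_balance as
    select ID,
           region,
           sum(balance) as Total_balance,
           count(*)     as Volume
    from test
    group by ID, region;
quit;
```

If a non-aggregated column is selected but left out of the GROUP BY list, PROC SQL does not fail. Instead it remerges the summary values back onto the detail rows and writes a note about remerging to the log, which is usually not what a summary query is meant to produce.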
PS: SQL does not need the extra proc sort; it does any necessary sorting on its own. Only in certain circumstances can an external sort help SQL avoid poor performance or exceeding resource limits.