I'm using SAS Enterprise Guide 7.1...I have a dataset that I am working to aggregate variables from using two methods: 1) sum and 2) weighted by observation total. A sample of the sort of data that I'm working with is below. So I have multiple IDs that are associated with a single Code variable. Ideally, I'm trying to combine the data to look like the output dataset below (there are something like 2,000 codes with anywhere from 2 to 15 IDs associated with each). Some of the variables are combined simply by summing each observation (i.e. delay) and some are combined by summing the length % times the variable (i.e. at_rur, at_sub, and postspd).
Is there a way to use something like proc means with a by statement to produce the new dataset, but still be able to define how the variables are combined? Thank you in advance for your help!
i.e.
input dataset
ID | Code | Length | Length % | at_rur | at_sub | postspd | delay |
100380 | 107-10114 | 1.29 | 0.28 | 1 | 0 | 55 | - |
100494 | 107-10114 | 0.55 | 0.12 | 1 | 0 | 55 | 0.17 |
100382 | 107-10114 | 0.39 | 0.08 | 1 | 0 | 55 | - |
100341 | 107-10114 | 0.59 | 0.13 | 1 | 0 | 55 | - |
100386 | 107-10114 | 0.56 | 0.12 | 0 | 1 | 55 | - |
100539 | 107-10114 | 0.61 | 0.13 | 0 | 1 | 55 | - |
103878 | 107-10114 | 0.37 | 0.08 | 0 | 1 | 50 | - |
100551 | 107-10114 | 0.18 | 0.04 | 0 | 1 | 50 | - |
103029 | 107-10114 | 0.13 | 0.03 | 0 | 1 | 50 | - |
Total | 4.66 | 0.61 | 0.39 | 54.28 | 0.17 |
output dataset
Code | Length | at_rur | at_sub | postspd | delay |
107-10114 | 4.66 | 0.61 | 0.39 | 54.28 | 0.17 |
proc sql;
create table summery_table as
select Code,
sum(Length) as Length,
sum(Length_percentage*at_rur) as at_rur,
sum(Length_percentage*at_sub) as at_sub,
sum(Length_percentage*postspd) as postspd ,
sum(delay) as delay
from input_dataset
group by Code;
quit;
I've updated the post to include what the final dataset should look like using the input dataset. I also updated some of the text, so hopefully it is more clear. The input dataset has anywhere from 2 to 15 IDs associated with each Code and there are 1,992 Codes. I'm basically just trying to compile the ID variables into the Codes, but I want to use multiple methods when compiling them. I know I can use proc means with a BY group to output a new dataset that sums all of the variables, but I want some of the variables (i.e. at_rur in my example) to sum by weights. So in my example, at_rur =
(0.28*1)+(0.12*1)+(0.08*1)+(0.13*1)+(0.12*0)...(0.03*0) = 0.61
Same idea for the at_sub and postspd variables. However, the length and delay variables are just a standard sum.
Hopefully this helps! Thank you.
proc sql;
create table summery_table as
select Code,
sum(Length) as Length,
sum(Length_percentage*at_rur) as at_rur,
sum(Length_percentage*at_sub) as at_sub,
sum(Length_percentage*postspd) as postspd ,
sum(delay) as delay
from input_dataset
group by Code;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.