I'm using SAS Enterprise Guide 7.1...I have a dataset that I am working to aggregate variables from using two methods: 1) sum and 2) weighted by observation total. A sample of the sort of data that I'm working with is below. So I have multiple IDs that are associated with a single Code variable. Ideally, I'm trying to combine the data to look like the output dataset below (there are something like 2,000 codes with anywhere from 2 to 15 IDs associated with each). Some of the variables are combined simply by summing each observation (i.e. delay) and some are combined by summing the length % times the variable (i.e. at_rur, at_sub, and postspd).
Is there a way to use something like proc means with a by statement to produce the new dataset, but still be able to define how the variables are combined? Thank you in advance for your help!
i.e.
input dataset
ID | Code | Length | Length % | at_rur | at_sub | postspd | delay |
100380 | 107-10114 | 1.29 | 0.28 | 1 | 0 | 55 | - |
100494 | 107-10114 | 0.55 | 0.12 | 1 | 0 | 55 | 0.17 |
100382 | 107-10114 | 0.39 | 0.08 | 1 | 0 | 55 | - |
100341 | 107-10114 | 0.59 | 0.13 | 1 | 0 | 55 | - |
100386 | 107-10114 | 0.56 | 0.12 | 0 | 1 | 55 | - |
100539 | 107-10114 | 0.61 | 0.13 | 0 | 1 | 55 | - |
103878 | 107-10114 | 0.37 | 0.08 | 0 | 1 | 50 | - |
100551 | 107-10114 | 0.18 | 0.04 | 0 | 1 | 50 | - |
103029 | 107-10114 | 0.13 | 0.03 | 0 | 1 | 50 | - |
Total | 4.66 | 0.61 | 0.39 | 54.28 | 0.17 |
output dataset
Code | Length | at_rur | at_sub | postspd | delay |
107-10114 | 4.66 | 0.61 | 0.39 | 54.28 | 0.17 |
proc sql;
create table summery_table as
select Code,
sum(Length) as Length,
sum(Length_percentage*at_rur) as at_rur,
sum(Length_percentage*at_sub) as at_sub,
sum(Length_percentage*postspd) as postspd ,
sum(delay) as delay
from input_dataset
group by Code;
quit;
I've updated the post to include what the final dataset should look like using the input dataset. I also updated some of the text, so hopefully it is more clear. The input dataset has anywhere from 2 to 15 IDs associated with each Code and there are 1,992 Codes. I'm basically just trying to compile the ID variables into the Codes, but I want to use multiple methods when compiling them. I know I can use proc means with a BY group to output a new dataset that sums all of the variables, but I want some of the variables (i.e. at_rur in my example) to sum by weights. So in my example, at_rur =
(0.28*1)+(0.12*1)+(0.08*1)+(0.13*1)+(0.12*0)...(0.03*0) = 0.61
Same idea for the at_sub and postspd variables. However, the length and delay variables are just a standard sum.
Hopefully this helps! Thank you.
proc sql;
create table summery_table as
select Code,
sum(Length) as Length,
sum(Length_percentage*at_rur) as at_rur,
sum(Length_percentage*at_sub) as at_sub,
sum(Length_percentage*postspd) as postspd ,
sum(delay) as delay
from input_dataset
group by Code;
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.