DATA Step, Macro, Functions and more

How do I collapse variables using multiple methods within a BY Statement?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

How do I collapse variables using multiple methods within a BY Statement?

[ Edited ]

I'm using SAS Enterprise Guide 7.1...I have a dataset that I am working to aggregate variables from using two methods: 1) sum and 2) weighted by observation total. A sample of the sort of data that I'm working with is below. So I have multiple IDs that are associated with a single Code variable. Ideally, I'm trying to combine the data to look like the output dataset below (there are something like 2,000 codes with anywhere from 2 to 15 IDs associated with each). Some of the variables are combined simply by summing each observation (i.e. delay) and some are combined by summing the length % times the variable (i.e. at_rur, at_sub, and postspd).

 

Is there a way to use something like proc means with a by statement to produce the new dataset, but still be able to define how the variables are combined? Thank you in advance for your help!

 

i.e. 

input dataset

IDCodeLengthLength %at_rurat_subpostspddelay
100380107-10114          1.29          0.281055              -  
100494107-10114          0.55          0.121055          0.17
100382107-10114          0.39          0.081055              -  
100341107-10114          0.59          0.131055              -  
100386107-10114          0.56          0.120155              -  
100539107-10114          0.61          0.130155              -  
103878107-10114          0.37          0.080150              -  
100551107-10114          0.18          0.040150              -  
103029107-10114          0.13          0.030150              -  
 Total          4.66                  0.61                 0.39              54.28          0.17

 

output dataset

CodeLengthat_rurat_subpostspddelay
107-10114           4.66          0.61          0.39              54.28                 0.17

Accepted Solutions
Solution
‎02-12-2018 10:50 AM
Frequent Contributor
Posts: 109

Re: How do I collapse variables using multiple methods within a BY Statement?

[ Edited ]
Posted in reply to trphelps15

 

proc sql;
create table summery_table as 
select Code, 

sum(Length) as Length,

sum(Length_percentage*at_rur) as at_rur, 

sum(Length_percentage*at_sub) as at_sub, 

sum(Length_percentage*postspd) as postspd ,

sum(delay) as delay from input_dataset group by Code; quit;

View solution in original post


All Replies
Frequent Contributor
Posts: 109

Re: How do I collapse variables using multiple methods within a BY Statement?

Posted in reply to trphelps15
It is not clear how you want the output DS to look like, could you please elaborate.
Occasional Contributor
Posts: 6

Re: How do I collapse variables using multiple methods within a BY Statement?

Posted in reply to Satish_Parida

I've updated the post to include what the final dataset should look like using the input dataset. I also updated some of the text, so hopefully it is more clear. The input dataset has anywhere from 2 to 15 IDs associated with each Code and there are 1,992 Codes. I'm basically just trying to compile the ID variables into the Codes, but I want to use multiple methods when compiling them. I know I can use proc means with a BY group to output a new dataset that sums all of the variables, but I want some of the variables (i.e. at_rur in my example) to sum by weights. So in my example, at_rur =

 

(0.28*1)+(0.12*1)+(0.08*1)+(0.13*1)+(0.12*0)...(0.03*0) = 0.61

 

Same idea for the at_sub and postspd variables. However, the length and delay variables are just a standard sum.

 

Hopefully this helps! Thank you.

Solution
‎02-12-2018 10:50 AM
Frequent Contributor
Posts: 109

Re: How do I collapse variables using multiple methods within a BY Statement?

[ Edited ]
Posted in reply to trphelps15

 

proc sql;
create table summery_table as 
select Code, 

sum(Length) as Length,

sum(Length_percentage*at_rur) as at_rur, 

sum(Length_percentage*at_sub) as at_sub, 

sum(Length_percentage*postspd) as postspd ,

sum(delay) as delay from input_dataset group by Code; quit;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 113 views
  • 0 likes
  • 2 in conversation