Hello,
I have a dataset looks like below, I am wondering if it is possible for SAS to average vertically. Anyone have any idea how to achieve that?
Original Dataset (example):
AcctNum Date Balance Other_Variable
1 201301 A
1 201302 B
1 201303 C
1 201304 D
1 201305 E
1 201306 F
1 201307 G
1 201308 H
1 201309 I
1 201310 J
1 201311 K
1 201312 L
1 201401 M XXXXX (want to keep)
2 201301 N
. . .
. . .
. . .
Dataset want to achieve (average balance for past12 month- avg(B-M))
Date Avg_Bal Other Variable
1 201401 XXX (average for the past 12 month) (Keep all other variable in row 201401)
2 201401 XXX (average for the past 12 month)
3 201401 XXX (average for the past 12 month)
Thanks in advance!
Regards,
Frank
Assuming all of the data is to summarized for the same period 201302-201401 (and not varing by last date in the data which is a moderately obnoxious kettle of fish)
Something like this might help
proc summary data=have nway; /* have is the name of the dataset you already have*/
where 201302 le date le 201401;
class id;
var date balance;
output out=want(drop = _type_ _freq_) max(date) = date mean(balance)=Avg_bal
maxid(date(variable1) date(variable2)) = variable1 variable2;
run;
These should get the values of variables 1 and 2 associated with the largest value of date within the period and the average balance.
If you don't drop the _freq_ you'll have a variable that says how many months (my assumption) of data were used to calculate the average.
Some clarification looks like it is in order.
By past 12 months do you want the average from 201301 to 201312, 201302 to 201401 or a different period? Do all of the AcctNum have balances for the same period or are some shorter?
Do the values of the "other variables" change or are they the same for all of the records within the period you want to summarize? If they vary what is the rule for selection?
Sorry about the confusion. The 12 month is the month I want (in this example is 201401) backed 12 (so it is 201302-201401)
AcctNum Date Balance Variable1 Variable2
1 201301 20
1 201302 9
1 201303 1
1 201304 2
1 201305 5
1 201306 6
1 201307 10
1 201308 4 ......
1 201309 3 AB
1 201310 8 QC
1 201311 7 ON
1 201312 12 BC
1 201401 11 ON 999
2 201301 N
. . .
2 201401 XXX BC 457
. . .
Dataset want to achieve (average balance for past12 month- avg(B-M))
Date Avg_Bal Variable1 Variable2
1 201401 6.5 (1+2+....12)/12 ON 999
2 201401 XXX (average for the past 12 month) BC 457
3 201401 XXX (average for the past 12 month)
Assuming all of the data is to summarized for the same period 201302-201401 (and not varing by last date in the data which is a moderately obnoxious kettle of fish)
Something like this might help
proc summary data=have nway; /* have is the name of the dataset you already have*/
where 201302 le date le 201401;
class id;
var date balance;
output out=want(drop = _type_ _freq_) max(date) = date mean(balance)=Avg_bal
maxid(date(variable1) date(variable2)) = variable1 variable2;
run;
These should get the values of variables 1 and 2 associated with the largest value of date within the period and the average balance.
If you don't drop the _freq_ you'll have a variable that says how many months (my assumption) of data were used to calculate the average.
Thanks so much ! One more question, I am wondering is there a way to also include the observation month balance (201301), but keep the average for following 12 month (201302-201401)?
If you are going to want more than a single balance and maintain this going forward you need to be very specific and explicit about futher requirements such as what the resulting data set structure need to be.
For example does the 201301 balance need to be on the same record as the summaries above or a different record? How do you want to name the variable with that one month's balance in the new dataset?
The 201301 balance would be on the same record as the want output and it could be called Bal_Start_Mon
New dataset would be look like below (everything would be the same as what achieved in the previous code but one more variable want to be added (Bal_Start_Mon). Thanks in advance!
Acct_Num Date Avg_Bal Variable1 Variable2 Bal_Start_Mon (data from Column 201301)
1 201401 6.5 (1+2+....12)/12 ON 999 20
2 201401 XXX (average for the past 12 month) BC 457 .
.
.
.
One way is merge the summarized data with the original value desired.
proc sql;
create table new_want as
select a.*, b.balance as Bal_Start_mon
from want as a left join
(select Acct_num, balance from have where date=201301) as b on
a.Acct_num=b.Acct_num;
quit;
Sql is often a preferred way to join data on conditions.
A data step merge could be done as well but often involves more filtering or termporary datasets.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.