BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
zhige50
Obsidian | Level 7

Hello,

I have a dataset looks like below, I am wondering if it is possible for SAS to average vertically. Anyone have any idea how to achieve that?

Original Dataset (example):

AcctNum   Date          Balance  Other_Variable

1               201301      A

1               201302      B

1               201303      C

1               201304      D

1               201305      E

1               201306      F

1               201307      G

1               201308      H

1               201309      I

1               201310      J

1               201311      K

1               201312      L

1               201401      M         XXXXX (want to keep)

2               201301      N        

.                .                  .

.                .                  .

.                .                  .

Dataset want to achieve (average balance for past12 month- avg(B-M))

                 Date            Avg_Bal                                                   Other Variable

1               201401         XXX (average for the past 12 month)       (Keep all other variable in row 201401)

2               201401         XXX (average for the past 12 month)

3               201401         XXX (average for the past 12 month)

Thanks in advance!

Regards,

Frank

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Assuming all of the data is to summarized for the same period 201302-201401 (and not varing by last date in the data which is a moderately obnoxious kettle of fish)

Something like this might help

proc summary data=have nway; /* have is the name of the dataset you already have*/

     where 201302  le date le 201401;

     class id;

     var date balance;

     output out=want(drop = _type_ _freq_) max(date) = date mean(balance)=Avg_bal

                         maxid(date(variable1) date(variable2)) = variable1 variable2;

run;

These should get the values of variables 1 and 2 associated with the largest value of date within the period and the average balance.

If you don't drop the _freq_ you'll have a variable that says how many months (my assumption) of data were used to calculate the average.

View solution in original post

7 REPLIES 7
ballardw
Super User

Some clarification looks like it is in order.

By past 12 months do you want the average from 201301 to 201312, 201302 to 201401 or a different period? Do all of the AcctNum have balances for the same period or are some shorter?

Do the values of the "other variables" change or are they the same for all of the records within the period you want to summarize? If they vary what is the rule for selection?

zhige50
Obsidian | Level 7

Sorry about the confusion. The 12 month is the month I want (in this example is 201401) backed 12 (so it is 201302-201401)

AcctNum   Date          Balance   Variable1      Variable2

1               201301      20

1               201302      9

1               201303      1

1               201304      2

1               201305      5

1               201306      6

1               201307      10

1               201308      4          ......

1               201309      3          AB

1               201310      8          QC

1               201311      7          ON

1               201312      12        BC

1               201401      11        ON              999

2               201301      N     

.                .                  .

2               201401      XXX      BC             457

.                .                  .

Dataset want to achieve (average balance for past12 month- avg(B-M))

                 Date            Avg_Bal                                           Variable1    Variable2                                       

1               201401         6.5 (1+2+....12)/12                            ON             999

2               201401         XXX (average for the past 12 month)    BC             457

3               201401         XXX (average for the past 12 month)

ballardw
Super User

Assuming all of the data is to summarized for the same period 201302-201401 (and not varing by last date in the data which is a moderately obnoxious kettle of fish)

Something like this might help

proc summary data=have nway; /* have is the name of the dataset you already have*/

     where 201302  le date le 201401;

     class id;

     var date balance;

     output out=want(drop = _type_ _freq_) max(date) = date mean(balance)=Avg_bal

                         maxid(date(variable1) date(variable2)) = variable1 variable2;

run;

These should get the values of variables 1 and 2 associated with the largest value of date within the period and the average balance.

If you don't drop the _freq_ you'll have a variable that says how many months (my assumption) of data were used to calculate the average.

zhige50
Obsidian | Level 7

Thanks so much ! One more question, I am wondering is there a way to also include the observation month balance (201301), but keep the average for following 12 month (201302-201401)?

ballardw
Super User

If you are going to want more than a single balance and maintain this going forward you need to be very specific and explicit about futher requirements such as what the resulting data set structure need to be.

For example does the 201301 balance need to be on the same record as the summaries above or a different record? How do you want to name the variable with that one month's balance in the new dataset?


zhige50
Obsidian | Level 7

The 201301 balance would be on the same record as the want output and it could be called Bal_Start_Mon

New dataset would be look like below (everything would be the same as what achieved in the previous code but one more variable want to be added (Bal_Start_Mon). Thanks in advance!

Acct_Num     Date            Avg_Bal                                           Variable1    Variable2   Bal_Start_Mon (data from Column 201301)                               

1                  201401         6.5 (1+2+....12)/12                            ON             999           20

2                  201401         XXX (average for the past 12 month)    BC             457           .

.

.

.        

ballardw
Super User

One way is merge the summarized data with the original value desired.

proc sql;

     create table new_want as

     select a.*, b.balance as Bal_Start_mon

     from want as a left join

          (select Acct_num, balance from have where date=201301) as b on

          a.Acct_num=b.Acct_num;

quit;

Sql is often a preferred way to join data on conditions.

A data step merge could be done as well but often involves more filtering or termporary datasets.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1958 views
  • 0 likes
  • 2 in conversation