## Creating Duplicate Observations across datasets

Frequent Contributor
Posts: 109

# Creating Duplicate Observations across datasets

Hi All --

This may be an unusual task, and so I will try to explain the best I can please, please let me know if I don't explain the question well enough: I have a set of aggregated data, that i'd like to merge in such a way that it is duplicated for appropriate by-groups. I think the best way to illustrate what I want is with an example:

This is data set of aggregate data (using dummy data here), essentially it is simple average balances across a number of months. There is a field indicating the number of months, the account, the average balance across that number of months:

 account average_bal num_months 1 50\$ 1 1 55\$ 2 1 54\$ 3 2 130\$ 1 2 135\$ 2 2 133\$ 3

So account is the account number, when num_months = 1, the average is really just the balance of the last month, when num_months = 2 its the average balance across the past two months, when num_months=3 then it's the average across last 3 months, etc. etc..

What I want is to have monthly data added to this, so that thee average across all months designated in "num_months" stays the same for each "num_months" group, but the monthly data is present, so this is what I'd like to have:

 account average_bal num_months month monthly_bal 1 50\$ 1 august 50 1 55\$ 2 august 50 1 55\$ 2 july 48 1 54\$ 3 august 50 1 54\$ 3 july 48 1 54\$ 3 june 55 2 130\$ 1 august 130 2 135\$ 2 august 130 2 135\$ 2 july 140 2 133\$ 3 august 130 2 133\$ 3 july 140 2 133\$ 3 june 137

Now I realize that the data doesn't actually match mathematically (the averages don't match the actual calculated average of the 'monthly_bal' values as they should in real life, as I just made up these numbers to illustrate the data structure I'm looking for). So assuming I have a dataset of monthly balances for each month and each account, i'd like to be able to merge and get the above table. A simple cartesian product doesn't get me exactly what I want, but close. I've tried many joining and merging methods with Proc SQL and Datasteps, but can't quite get it. Has anyone else had this problem, or know what I'm missing here? Thanks a lot!

PROC Star
Posts: 8,167

## Re: Creating Duplicate Observations across datasets

Are you just trying to do something like the following?

data have;

input account average_bal num_months;

cards;

1 50 1

1 55 2

1 54 3

2 130 1

2 135 2

2 133 3

;

data want (drop=_;

array _month(3);

retain _month:;

set have;

by account;

if first.account then do;

call missing(of _month(*));

_month(1)=average_bal;

monthly_bal=_month(1);

month=1;

output;

end;

else do;

_month(Num_months)=average_bal*num_months-sum(of _month(*));

do month=1 to num_months;

monthly_bal=_month(month);

output;

end;

end;

run;

Frequent Contributor
Posts: 109

## Re: Creating Duplicate Observations across datasets

Hmm, thank you arthur, however the code does not seem to execute:

29           retain _month();

_

22

ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,

a missing value.

PROC Star
Posts: 8,167

## Re: Creating Duplicate Observations across datasets

Try it with the following:

data have;

input account average_bal num_months;

cards;

1 50 1

1 55 2

1 54 3

2 130 1

2 135 2

2 133 3

;

data want (drop=_;

array _month(3);

retain _month:;

set have;

by account;

if first.account then do;

call missing(of _month(*));

_month(1)=average_bal;

monthly_bal=_month(1);

month=1;

output;

end;

else do;

_month(Num_months)=average_bal*num_months-sum(of _month(*));

do month=1 to num_months;

monthly_bal=_month(month);

output;

end;

end;

run;

Frequent Contributor
Posts: 109

## Re: Creating Duplicate Observations across datasets

Thank you for all of your replies and help Arthur. The code executed, however I'm not exactly sure how I could use that....I'm not trying to calculate averages or totals or anything at all, I was just using average balances as example data. It was really just a question of how to merge two datasets structured in that way, not a question of how to calculate balances. I appreciate you taking the time to try and help however, I no longer need any help on this, it proved to be too difficult to do within SAS and so I manually assembled the data in Excel as I needed..not a very elegant solution but it worked.

Thank you all.

Discussion stats
• 4 replies
• 308 views
• 3 likes
• 2 in conversation