DATA Step, Macro, Functions and more

Creating Duplicate Observations across datasets

Reply
Frequent Contributor
Posts: 105

Creating Duplicate Observations across datasets

Hi All --

This may be an unusual task, and so I will try to explain the best I can please, please let me know if I don't explain the question well enough: I have a set of aggregated data, that i'd like to merge in such a way that it is duplicated for appropriate by-groups. I think the best way to illustrate what I want is with an example:

This is data set of aggregate data (using dummy data here), essentially it is simple average balances across a number of months. There is a field indicating the number of months, the account, the average balance across that number of months:

accountaverage_balnum_months
150$1
155$2
154$3
2130$1
2135$2
2133$3

So account is the account number, when num_months = 1, the average is really just the balance of the last month, when num_months = 2 its the average balance across the past two months, when num_months=3 then it's the average across last 3 months, etc. etc..

What I want is to have monthly data added to this, so that thee average across all months designated in "num_months" stays the same for each "num_months" group, but the monthly data is present, so this is what I'd like to have:

accountaverage_balnum_monthsmonthmonthly_bal
150$1august50
155$2august50
155$2july48
154$3august50
154$3july48
154$3june55
2130$1august130
2135$2august130
2135$2july140
2133$3august130
2133$3july140
2133$3june137

Now I realize that the data doesn't actually match mathematically (the averages don't match the actual calculated average of the 'monthly_bal' values as they should in real life, as I just made up these numbers to illustrate the data structure I'm looking for). So assuming I have a dataset of monthly balances for each month and each account, i'd like to be able to merge and get the above table. A simple cartesian product doesn't get me exactly what I want, but close. I've tried many joining and merging methods with Proc SQL and Datasteps, but can't quite get it. Has anyone else had this problem, or know what I'm missing here? Thanks a lot!

PROC Star
Posts: 7,492

Re: Creating Duplicate Observations across datasets

Posted in reply to AllSoEasy

Are you just trying to do something like the following?

data have;

  input account average_bal num_months;

  cards;

1 50 1

1 55 2

1 54 3

2 130 1

2 135 2

2 133 3

;

data want (drop=_Smiley Happy;

  array _month(3);

  retain _month:;

  set have;

  by account;

  if first.account then do;

    call missing(of _month(*));

    _month(1)=average_bal;

    monthly_bal=_month(1);

    month=1;

    output;

  end;

  else do;

    _month(Num_months)=average_bal*num_months-sum(of _month(*));

    do month=1 to num_months;

      monthly_bal=_month(month);

      output;

    end;

  end;

run;

Frequent Contributor
Posts: 105

Re: Creating Duplicate Observations across datasets

Hmm, thank you arthur, however the code does not seem to execute:

29           retain _month();

                           _

                           22

ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,

              a missing value. 

PROC Star
Posts: 7,492

Re: Creating Duplicate Observations across datasets

Posted in reply to AllSoEasy

Try it with the following:

data have;

  input account average_bal num_months;

  cards;

1 50 1

1 55 2

1 54 3

2 130 1

2 135 2

2 133 3

;

data want (drop=_Smiley Happy;

  array _month(3);

  retain _month:;

  set have;

  by account;

  if first.account then do;

    call missing(of _month(*));

    _month(1)=average_bal;

    monthly_bal=_month(1);

    month=1;

    output;

  end;

  else do;

    _month(Num_months)=average_bal*num_months-sum(of _month(*));

    do month=1 to num_months;

      monthly_bal=_month(month);

      output;

    end;

  end;

run;

Frequent Contributor
Posts: 105

Re: Creating Duplicate Observations across datasets

Thank you for all of your replies and help Arthur. The code executed, however I'm not exactly sure how I could use that....I'm not trying to calculate averages or totals or anything at all, I was just using average balances as example data. It was really just a question of how to merge two datasets structured in that way, not a question of how to calculate balances. I appreciate you taking the time to try and help however, I no longer need any help on this, it proved to be too difficult to do within SAS and so I manually assembled the data in Excel as I needed..not a very elegant solution but it worked.

Thank you all.

Ask a Question
Discussion stats
  • 4 replies
  • 284 views
  • 3 likes
  • 2 in conversation