DATA Step, Macro, Functions and more

Summing monthly variables until a given date is reached

Reply
Frequent Contributor
Posts: 138

Summing monthly variables until a given date is reached

Hi,

 

I have a dataset at the person level with an ID variable, a date variable, and several monthly count variables.

 

ID      Date      mth_200201      mth_200202      mth_200203...mth_201501

1       1/5/2013        5                         1                          4

2       2/13/2007       .                          .                          2

3       3/8/2002        1                          3                         1

 

What I want to do is create a total variable which is a sum of the values in mth_200201 through the mth_ variable corresponding with the date (so for ID 3 above, through mth_200203 for a total of 1+3+1=5). 

 

Any help is much appreciated.

Super User
Posts: 5,093

Re: Summing monthly variables until a given date is reached

[ Edited ]

Walternate,

 

Here's one way to approach the problem.

 

data want;

set have;

total=0;

array m {157} mth_200201 mth_200202 mth_200203 ... mth_201501;

if date >= '01jan2002'd then do i=1 to 157 until (vname(m{i}) = 'mth_' || put( intnx('month', '01jan2002'd, i-1), yymmn6.) );

   total + m{i};

end;

run;

 

Some of the statements could be broken out into smaller pieces, but the logic should work as is.  (It's untested.)

 

Good luck.

Frequent Contributor
Posts: 138

Re: Summing monthly variables until a given date is reached

Hi,

 

Thank you for your reply. I tried it and it worked; the problem is that I only want it to sum for people whose values of date aren't missing. So I tried this:

 

if date ne . then do;
total=0;
array m{157} mth_200201--mth201501;
do i=1 to 157 until
(vname(m{i})=cats('mth', year(date), month(date)));
total+m{i};
end;

 

As I said, the code did exactly what I wanted it to for people with values of date. The problem is that there are people who are missing values of date whom I don't want to have values of total. The code as it is now is retaining the value of total for each person with a value of date and filling it in for each person with missing date until the next person with a value for date is reached. I need a way to leave total blank for those who are missing date (I thought that was what the subsetting if statement at the beginning was supposed to do, but I could be wrong...).

Super User
Posts: 5,093

Re: Summing monthly variables until a given date is reached

Well, in the code I posted you could just switch from (total=0Smiley Wink to (total=.Smiley Wink.

 

That should be sufficient.  The loop only executes when date is within range (so not for missing dates).  So TOTAL would remain missing if DATE is missing.  And the sum statements (total + some_value) should work even if TOTAL starts out being missing.

Super User
Super User
Posts: 7,413

Re: Summing monthly variables until a given date is reached

As has been mentioned above, and this seems to be a common theme, keeping data (that being the 200202/200203 etc) in column names is not a good idea.  It makes working with that data so much harder.  A better idea is to normalise your data:

ID     DATE       MONTH    RESULT

1       1/5/2013  1               5

1       1/5/2013  2               1 

...

 

The reason is that working with the MONTH is now a simple process of using a data column.  You can use all kinds of aggregates on the data, and if at some point there is a need for an output of transposed data, then call a proc transpose.  Your "working" data structure does not need to reflect your "output" data structure.

Ask a Question
Discussion stats
  • 4 replies
  • 243 views
  • 0 likes
  • 3 in conversation