BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have a dataset at the person level with an ID variable, a date variable, and several monthly count variables.

 

ID      Date      mth_200201      mth_200202      mth_200203...mth_201501

1       1/5/2013        5                         1                          4

2       2/13/2007       .                          .                          2

3       3/8/2002        1                          3                         1

 

What I want to do is create a total variable which is a sum of the values in mth_200201 through the mth_ variable corresponding with the date (so for ID 3 above, through mth_200203 for a total of 1+3+1=5). 

 

Any help is much appreciated.

4 REPLIES 4
Astounding
PROC Star

Walternate,

 

Here's one way to approach the problem.

 

data want;

set have;

total=0;

array m {157} mth_200201 mth_200202 mth_200203 ... mth_201501;

if date >= '01jan2002'd then do i=1 to 157 until (vname(m{i}) = 'mth_' || put( intnx('month', '01jan2002'd, i-1), yymmn6.) );

   total + m{i};

end;

run;

 

Some of the statements could be broken out into smaller pieces, but the logic should work as is.  (It's untested.)

 

Good luck.

Walternate
Obsidian | Level 7

Hi,

 

Thank you for your reply. I tried it and it worked; the problem is that I only want it to sum for people whose values of date aren't missing. So I tried this:

 

if date ne . then do;
total=0;
array m{157} mth_200201--mth201501;
do i=1 to 157 until
(vname(m{i})=cats('mth', year(date), month(date)));
total+m{i};
end;

 

As I said, the code did exactly what I wanted it to for people with values of date. The problem is that there are people who are missing values of date whom I don't want to have values of total. The code as it is now is retaining the value of total for each person with a value of date and filling it in for each person with missing date until the next person with a value for date is reached. I need a way to leave total blank for those who are missing date (I thought that was what the subsetting if statement at the beginning was supposed to do, but I could be wrong...).

Astounding
PROC Star

Well, in the code I posted you could just switch from (total=0;) to (total=.;).

 

That should be sufficient.  The loop only executes when date is within range (so not for missing dates).  So TOTAL would remain missing if DATE is missing.  And the sum statements (total + some_value) should work even if TOTAL starts out being missing.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

As has been mentioned above, and this seems to be a common theme, keeping data (that being the 200202/200203 etc) in column names is not a good idea.  It makes working with that data so much harder.  A better idea is to normalise your data:

ID     DATE       MONTH    RESULT

1       1/5/2013  1               5

1       1/5/2013  2               1 

...

 

The reason is that working with the MONTH is now a simple process of using a data column.  You can use all kinds of aggregates on the data, and if at some point there is a need for an output of transposed data, then call a proc transpose.  Your "working" data structure does not need to reflect your "output" data structure.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1416 views
  • 0 likes
  • 3 in conversation