Hi,
I have a dataset at the person level with an ID variable, a date variable, and several monthly count variables.
ID Date mth_200201 mth_200202 mth_200203...mth_201501
1 1/5/2013 5 1 4
2 2/13/2007 . . 2
3 3/8/2002 1 3 1
What I want to do is create a total variable which is a sum of the values in mth_200201 through the mth_ variable corresponding with the date (so for ID 3 above, through mth_200203 for a total of 1+3+1=5).
Any help is much appreciated.
Walternate,
Here's one way to approach the problem.
data want;
set have;
total=0;
array m {157} mth_200201 mth_200202 mth_200203 ... mth_201501;
if date >= '01jan2002'd then do i=1 to 157 until (vname(m{i}) = 'mth_' || put( intnx('month', '01jan2002'd, i-1), yymmn6.) );
total + m{i};
end;
run;
Some of the statements could be broken out into smaller pieces, but the logic should work as is. (It's untested.)
Good luck.
Hi,
Thank you for your reply. I tried it and it worked; the problem is that I only want it to sum for people whose values of date aren't missing. So I tried this:
if date ne . then do;
total=0;
array m{157} mth_200201--mth201501;
do i=1 to 157 until
(vname(m{i})=cats('mth', year(date), month(date)));
total+m{i};
end;
As I said, the code did exactly what I wanted it to for people with values of date. The problem is that there are people who are missing values of date whom I don't want to have values of total. The code as it is now is retaining the value of total for each person with a value of date and filling it in for each person with missing date until the next person with a value for date is reached. I need a way to leave total blank for those who are missing date (I thought that was what the subsetting if statement at the beginning was supposed to do, but I could be wrong...).
Well, in the code I posted you could just switch from (total=0;) to (total=.;).
That should be sufficient. The loop only executes when date is within range (so not for missing dates). So TOTAL would remain missing if DATE is missing. And the sum statements (total + some_value) should work even if TOTAL starts out being missing.
As has been mentioned above, and this seems to be a common theme, keeping data (that being the 200202/200203 etc) in column names is not a good idea. It makes working with that data so much harder. A better idea is to normalise your data:
ID DATE MONTH RESULT
1 1/5/2013 1 5
1 1/5/2013 2 1
...
The reason is that working with the MONTH is now a simple process of using a data column. You can use all kinds of aggregates on the data, and if at some point there is a need for an output of transposed data, then call a proc transpose. Your "working" data structure does not need to reflect your "output" data structure.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.