Help using Base SAS procedures

Summing values based on days in separate column

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 18
Accepted Solution

Summing values based on days in separate column

Hi all,

Thanks for all your help so far, this is the last step that I've been trying to figure out and was wondering if there's an efficient way to go about doing this.

Currently I have data set up in this format:

ID          EndDate          Value          Days

1          1/1/00               2               1

1          1/2/00               2               2

1          2/1/00               3               3

1          etc                    3               4

1          etc                    3               5

1

2          1/1/00               2               1

2           1/2/00               2               2

2            2/1/00               3               3

2            etc                    3               4

2             etc                    3               5

The end product is going to be

ID     Month1Sum     Month1Avg     Month2Sum     Month2Avg

1              

2

3

4

etc

So in essence, what I'm trying to do is for "Days" 1-30, to get the sum and avg of the "Value" column as two new variables, then 31-60, etc. This is a slice of code to show how the breakdown of months would work m12 would be the enddate-30.

Days,Days+30,m1);

Days+31,Days+60,m2);

Days+61,Days+90,m3);

Days+91,Days+120,m4);

Days+121,Days+150),m5);

Days+151,Days+180),m6);

Days+181,Days+210),m7);

Days+211,Days+240),m8);

Days+241,Days+270),m9);

Days+271,Days+300),m10);

Days+301,Days+330),m11);

Thanks again for all your help!


Accepted Solutions
Solution
‎07-24-2012 04:40 PM
Contributor
Posts: 66

Re: Summing values based on days in separate column

For completeness I include the entire code now, together with the generated values for 10 IDs.

data have;
do Days=1 to 300;
do id=1 to 10;
Value=int(30*ranuni(1)/1);
output;
end;
end;
run;

data have2;
format id 8. Days best4.;
set have;
month=int(Days/30)+1;
if Days<=30 then month=1;
else if Days<=60 then month=2;
else if Days<=90 then month=3;
else if Days<=120 then month=4;
else if Days<=150 then month=5;
else if Days<=180 then month=6;
else if Days<=210 then month=7;
else if Days<=240 then month=8;
else if Days<=270 then month=9;
else if Days<=300 then month=10;
else month=11; *just in case *;
if month=1 then Monthval1=Value;
else if month=2 then Monthval2=Value;
else if month=3 then Monthval3=Value;
else if month=4 then Monthval4=Value;
else if month=5 then Monthval5=Value;
else if month=6 then Monthval6=Value;
else if month=7 then Monthval7=Value;
else if month=8 then Monthval8=Value;
else if month=9 then Monthval9=Value;
else if month=10 then Monthval10=Value;
else Monthval11=Value;
run;

proc sort data=have2;
by id days;
run;


%let months=Monthval1 Monthval2 Monthval3 Monthval4 Monthval5
     Monthval6 Monthval7 Monthval8 Monthval9 Monthval10;
%let sum_months=sum_Monthval1 sum_Monthval2 sum_Monthval3 sum_Monthval4 sum_Monthval5
     sum_Monthval6 sum_Monthval7 sum_Monthval8 sum_Monthval9 sum_Monthval10;
%let avg_months=avg_Monthval1 avg_Monthval2 avg_Monthval3 avg_Monthval4 avg_Monthval5
                avg_Monthval6 avg_Monthval7 avg_Monthval8 avg_Monthval9 avg_Monthval10;

proc summary data=have2  ;
var Monthval1 -- Monthval10;
by id;
output out=monthly_stats (drop=_FREQ_ _TYPE_)
  sum=&sum_months mean=&avg_months;
run;
data want;
format id best4. sum_Monthval1 best8. avg_Monthval1 8.2
sum_Monthval2 best8. avg_Monthval2 8.2
sum_Monthval3 best8. avg_Monthval3 8.2
sum_Monthval4 best8. avg_Monthval4 8.2
sum_Monthval5 best8. avg_Monthval5 8.2
sum_Monthval6 best8. avg_Monthval6 8.2
sum_Monthval7 best8. avg_Monthval7 8.2
sum_Monthval8 best8. avg_Monthval8 8.2
sum_Monthval9 best8. avg_Monthval9 8.2
sum_Monthval10 best8. avg_Monthval10 8.2;
set monthly_stats;
run;

options nocenter nodate;
proc print data=want uniform noobs;
run;

View solution in original post


All Replies
Contributor
Posts: 66

Re: Summing values based on days in separate column

We need more precise definition of Days 1-30.  Does the first column "EndDate" determine the number of days, i.e. are these days in a calendar month specified by EndDate?

Occasional Contributor
Posts: 18

Re: Summing values based on days in separate column

The "days" column is calculated using two columns of info. One is a startdate and the other is the enddate. I left it out since I've already calculated the "days" column using that, but if needed you can code it with start/enddate variables. The thing is, the start and end dates are approximately 10 months apart and I'm looking at 30 day intervals rather than months. Meaning, I'm looking at 300 days prior to the end date, (the 300 days is not uniform for all) and trying to figure out the sum/avg value for the first 30 days per ID, then the next 30 days, etc.

Contributor
Posts: 52

Re: Summing values based on days in separate column

I did not check this code... I gave you two solutions for the month calculation, one the calendar month (commented out) and then another which should give you 30 day months. Also, I dropped the variable _name_ from the output datasets only to simplify the  merge. Me, I'd leave it in for a while to double check the transposition. I'm not a transposing whiz, so there might be a way to optimize here, too.

proc sql;

   create table add_month as

      select

         id,

         end_date,

/*       intck("month", "31Dec1999"d, end_date) as month,   */

         int((end_date - "01Jan2000"d) / 30) + 1 as month,

         value,

         days

      from

         original_data;

   create table monthly_values as

      select

         id,

         month,

         "Month" || put(month, z2.0) || "Sum" as sum_ID,

         "Month" || put(month, z2.0) || "Avg" as avg_ID,

         sum(value) as sum_value,

         avg(value) as avg_value

      from

         add_month

      group by

         month

      order by

         id,

         month;

quit;

proc transpose data=monthly_values out=sum_value (drop=_name_);

   id sum_ID;

   var sum_value;

   by id;

run;

proc transpose data=monthly_values out=avg_value (drop=_name_);

   id avg_ID;

   var avg_value;

   by id;

run;

data desired_data;

   merge sum_value avg_value;

   by id;

run;

Hope this helps,

Tish

Super User
Posts: 17,863

Re: Summing values based on days in separate column

Does your date criteria, of 30 day range change per id or is it from a fixed start date and everything is in reference to that?

Occasional Contributor
Posts: 18

Re: Summing values based on days in separate column

It resets per ID and it is all in chronological order. So it would go 1,2,3.... then reset for the next ID.

Super Contributor
Posts: 1,636

Re: Summing values based on days in separate column

example:

data have;
  do id=1,2;
   do days=1 to 90;
     value=days+id;
  output;
end;
   end;
data temp;
  set have;
  if days<31 then group=1;
     else if days<61 then group=2;
     else group=3;
run;
proc sql;
   create table temp as
     select id,group ,sum(value) as sum, mean(value) as mean
    from temp
      group by id,group
     order by id,group;
   quit;

   data want(drop=group mean sum);
     retain  month1sum month2sum month3sum month1mean month2mean month3mean;
     set temp;
  by id;
  array _sum(*) month1sum month2sum month3sum;
     array _mean(*) month1mean month2mean month3mean;
  _sum(group)=sum;
  _mean(group)=mean;
  if last.id then
  output;
run;
proc print;run;
Obs  month1sum  month2sum  month3sum  month1mean  month2mean  month3mean  id

         1      495        1395       2295       16.5        46.5        76.5      1
         2      525        1425       2325       17.5        47.5        77.5      2

Linlin

Occasional Contributor
Posts: 18

Re: Summing values based on days in separate column

LinLin:

Thanks for this code, but I currently have around 200k ID's to work with and was wondering if there was any way to make it more streamlined where it would auto detect the number of ID's.

Reeza:

Would this work if I needed to submit it into a macro because I need to eventually pull about 120 sets of values since I'm looking for monthly and trimester level values? And to clarify, each ID has it's own start/end dates.

Super User
Posts: 17,863

Re: Summing values based on days in separate column

I'm not sure where the start and end dates come in to play. The proc means should be versatile and fine in a macro as long as the days definition is the same across variables, ie all 0 to 30 belong to month 1.

My thought was that if the ranges are fixed, you could use a format for the dates to specify them as well, but a bit unclear about what you're trying to do.

Maybe explaining your data and the final end result and or business context may help, if you can.

Occasional Contributor
Posts: 18

Re: Summing values based on days in separate column

Okay, I'm currently trying to use measurements taken during pregnancy (end date = date of delivery) to quantify how much of each measurement was taken for the periods on a monthly and trimester basis. eg. how much cereal a person ate in the first month of pregnancy, 2nd, etc. as well as during first trimester, etc.

I have several items I'm analyzing, eg. cereal, milk, juice, etc. and the final dataset would have each person and variables for each item and the values for each month and trimester on one row for ease of analysis.

Currently, I'm trying to finalize the processing of lets say cereal, so I can create a macro to deal with all of the other items in the same manner without having to manually do each one. The days are defined by date of delivery minus the length of the pregnancy so that's how they're involved in this data.

Super User
Posts: 17,863

Re: Summing values based on days in separate column

So you have a start_date, end_date and a measurement_date? and then values?

Is your data like this :

start_date end_date measurement_date cereal juice milk

or like this:

start_date end_date measurement_date type value

Occasional Contributor
Posts: 18

Re: Summing values based on days in separate column

Oh, sorry, so what we have are monthly measurements instead of daily. Therefore, I broke it out so that each monthly measurement is divided into the number of days in that month so I have a daily measurement.

From there, I'm trying to put it back together by month (not calendar, but of pregnancy) so that I can see the values for the first, second, third months of pregnancy etc.

I have:

ID start_date end_date day(this is the count that starts at 1 on the start date and goes up to delivery date, then resets for the next ID) value

Each item is in a separate dataset as of now, so I'll eventually need to merge all the data back together once I get the monthly and trimester measurements set up.

So the final dataset would look like:

ID     Item1Month1sum, item1month1avg, item1month2sum/avg,.... item2month1sum/avg.... etc

Thanks again for your help!

Super User
Posts: 17,863

Re: Summing values based on days in separate column

Well, I think what I've suggested will work fine, create the format and then go from there with proc means. It'll reduce the hard coding a lot.

proc format;

value month_fmt

0 - 30 = 1

31 - 60 = 2

61 - 90 = 3

91 - 120 = 4

121 - 150 = 5

151 - 180 = 6

181 - 210 = 7

211 - 240 = 8

241 - 270 = 9

271 - 300 = 10

301 - 330 = 11;

run;

proc means data=have;

class id days;

format days month_fmt.;

var value;

output out=sample n= mean=/autoname;

run;

Contributor
Posts: 66

Re: Summing values based on days in separate column

andp: Linlin's code will work for any number of id's, she just had to generate test data. I understand you already have data; so you don't need the first data step.  If you name your variables sum_Month1, sum_Month2, ..., sum_Month10, so that the consecutive number is at the end of the variable name, then you can sometimes use syntax sum_Month1 -- sum_Month10 instead the explicit list of all vars, for instance in defining an array to summarize over a trimester or whatever.

data have;
informat ID best4. EndDate mmddyy8. Value best8. Days 3.;
input ID EndDate Value Days;
format EndDate date9.;
cards;
1          1/1/00               2               1
1          1/2/00               2               2
1          2/1/00               3               3
1          2/1/00               3               4
1          2/1/00               3               5
2          1/1/00               4               1
2          1/2/00               5               2
2          2/1/00               6               3
2          2/1/00               7               4
2          2/1/00               3               5
3          1/1/00               4               1
3          1/2/00               4               2
3          2/1/00               3               3
3          2/1/00               2               4
3          2/1/00               1               5
;
run;

Data have2;
format id 8. Days best4.;
set have;
if mod(Days,30)=1 then Monthval1=Value;
else if mod(Days,30)=2 then Monthval2=Value;
else if mod(Days,30)=3 then Monthval3=Value;
else if mod(Days,30)=4 then Monthval4=Value;
else if mod(Days,30)=5 then Monthval5=Value;
*more Monthvaln definitions here*;
run;

proc sort data=have2;
by id days;
run;

%let months=Monthval1 Monthval2 Monthval3 Monthval4 Monthval5;
%let sum_months=sum_Monthval1 sum_Monthval2 sum_Monthval3 sum_Monthval4 sum_Monthval5;
%let avg_months=avg_Monthval1 avg_Monthval2 avg_Monthval3 avg_Monthval4 avg_Monthval5;

proc summary data=have2  ;
var Monthval1 -- Monthval5;
by id;
output out=monthly_stats (drop=_FREQ_ _TYPE_)
  sum=&sum_months mean=&avg_months;
run;

* this is the output:


                 s    s    s    s    s    a    a    a    a    a
                 u    u    u    u    u    v    v    v    v    v
                 m    m    m    m    m    g    g    g    g    g
                 _    _    _    _    _    _    _    _    _    _
                 M    M    M    M    M    M    M    M    M    M
                 o    o    o    o    o    o    o    o    o    o
                 n    n    n    n    n    n    n    n    n    n
                 t    t    t    t    t    t    t    t    t    t
                 h    h    h    h    h    h    h    h    h    h
                 v    v    v    v    v    v    v    v    v    v
O                a    a    a    a    a    a    a    a    a    a
b        i       l    l    l    l    l    l    l    l    l    l
s        d       1    2    3    4    5    1    2    3    4    5

1           1    2    2    3    3    3    2    2    3    3    3
2           2    4    5    6    7    3    4    5    6    7    3
3           3    4    4    3    2    1    4    4    3    2    1

*;

Occasional Contributor
Posts: 18

Re: Summing values based on days in separate column

Hi Dorota,

Thanks for the response, I just ran your code with the necessary modifications and noticed that it was summing the months in an odd fashion. Instead of summing just month1, the month1 variable has a value in it every 30 days, and likewise for the rest of the months so that every 30 days each month has a value. I'm wondering if it's the mod() function that's causing this and if there's a way to modify that portion so it sums the first 30 days into one month instead of one day every 30 days.

Tish:

Is there a way to write your code in a way that I could specify the length of the months? I'm not working with all months at 30 days so I was wondering, even if it takes more writing, to specify each month length?

Thanks again!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 469 views
  • 6 likes
  • 5 in conversation