BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
andp
Calcite | Level 5

Hi all,

Thanks for all your help so far, this is the last step that I've been trying to figure out and was wondering if there's an efficient way to go about doing this.

Currently I have data set up in this format:

ID          EndDate          Value          Days

1          1/1/00               2               1

1          1/2/00               2               2

1          2/1/00               3               3

1          etc                    3               4

1          etc                    3               5

1

2          1/1/00               2               1

2           1/2/00               2               2

2            2/1/00               3               3

2            etc                    3               4

2             etc                    3               5

The end product is going to be

ID     Month1Sum     Month1Avg     Month2Sum     Month2Avg

1              

2

3

4

etc

So in essence, what I'm trying to do is for "Days" 1-30, to get the sum and avg of the "Value" column as two new variables, then 31-60, etc. This is a slice of code to show how the breakdown of months would work m12 would be the enddate-30.

Days,Days+30,m1);

Days+31,Days+60,m2);

Days+61,Days+90,m3);

Days+91,Days+120,m4);

Days+121,Days+150),m5);

Days+151,Days+180),m6);

Days+181,Days+210),m7);

Days+211,Days+240),m8);

Days+241,Days+270),m9);

Days+271,Days+300),m10);

Days+301,Days+330),m11);

Thanks again for all your help!

1 ACCEPTED SOLUTION

Accepted Solutions
Dorota_Jarosz
Obsidian | Level 7

For completeness I include the entire code now, together with the generated values for 10 IDs.

data have;
do Days=1 to 300;
do id=1 to 10;
Value=int(30*ranuni(1)/1);
output;
end;
end;
run;

data have2;
format id 8. Days best4.;
set have;
month=int(Days/30)+1;
if Days<=30 then month=1;
else if Days<=60 then month=2;
else if Days<=90 then month=3;
else if Days<=120 then month=4;
else if Days<=150 then month=5;
else if Days<=180 then month=6;
else if Days<=210 then month=7;
else if Days<=240 then month=8;
else if Days<=270 then month=9;
else if Days<=300 then month=10;
else month=11; *just in case *;
if month=1 then Monthval1=Value;
else if month=2 then Monthval2=Value;
else if month=3 then Monthval3=Value;
else if month=4 then Monthval4=Value;
else if month=5 then Monthval5=Value;
else if month=6 then Monthval6=Value;
else if month=7 then Monthval7=Value;
else if month=8 then Monthval8=Value;
else if month=9 then Monthval9=Value;
else if month=10 then Monthval10=Value;
else Monthval11=Value;
run;

proc sort data=have2;
by id days;
run;


%let months=Monthval1 Monthval2 Monthval3 Monthval4 Monthval5
     Monthval6 Monthval7 Monthval8 Monthval9 Monthval10;
%let sum_months=sum_Monthval1 sum_Monthval2 sum_Monthval3 sum_Monthval4 sum_Monthval5
     sum_Monthval6 sum_Monthval7 sum_Monthval8 sum_Monthval9 sum_Monthval10;
%let avg_months=avg_Monthval1 avg_Monthval2 avg_Monthval3 avg_Monthval4 avg_Monthval5
                avg_Monthval6 avg_Monthval7 avg_Monthval8 avg_Monthval9 avg_Monthval10;

proc summary data=have2  ;
var Monthval1 -- Monthval10;
by id;
output out=monthly_stats (drop=_FREQ_ _TYPE_)
  sum=&sum_months mean=&avg_months;
run;
data want;
format id best4. sum_Monthval1 best8. avg_Monthval1 8.2
sum_Monthval2 best8. avg_Monthval2 8.2
sum_Monthval3 best8. avg_Monthval3 8.2
sum_Monthval4 best8. avg_Monthval4 8.2
sum_Monthval5 best8. avg_Monthval5 8.2
sum_Monthval6 best8. avg_Monthval6 8.2
sum_Monthval7 best8. avg_Monthval7 8.2
sum_Monthval8 best8. avg_Monthval8 8.2
sum_Monthval9 best8. avg_Monthval9 8.2
sum_Monthval10 best8. avg_Monthval10 8.2;
set monthly_stats;
run;

options nocenter nodate;
proc print data=want uniform noobs;
run;

View solution in original post

20 REPLIES 20
Dorota_Jarosz
Obsidian | Level 7

We need more precise definition of Days 1-30.  Does the first column "EndDate" determine the number of days, i.e. are these days in a calendar month specified by EndDate?

andp
Calcite | Level 5

The "days" column is calculated using two columns of info. One is a startdate and the other is the enddate. I left it out since I've already calculated the "days" column using that, but if needed you can code it with start/enddate variables. The thing is, the start and end dates are approximately 10 months apart and I'm looking at 30 day intervals rather than months. Meaning, I'm looking at 300 days prior to the end date, (the 300 days is not uniform for all) and trying to figure out the sum/avg value for the first 30 days per ID, then the next 30 days, etc.

tish
Calcite | Level 5

I did not check this code... I gave you two solutions for the month calculation, one the calendar month (commented out) and then another which should give you 30 day months. Also, I dropped the variable _name_ from the output datasets only to simplify the  merge. Me, I'd leave it in for a while to double check the transposition. I'm not a transposing whiz, so there might be a way to optimize here, too.

proc sql;

   create table add_month as

      select

         id,

         end_date,

/*       intck("month", "31Dec1999"d, end_date) as month,   */

         int((end_date - "01Jan2000"d) / 30) + 1 as month,

         value,

         days

      from

         original_data;

   create table monthly_values as

      select

         id,

         month,

         "Month" || put(month, z2.0) || "Sum" as sum_ID,

         "Month" || put(month, z2.0) || "Avg" as avg_ID,

         sum(value) as sum_value,

         avg(value) as avg_value

      from

         add_month

      group by

         month

      order by

         id,

         month;

quit;

proc transpose data=monthly_values out=sum_value (drop=_name_);

   id sum_ID;

   var sum_value;

   by id;

run;

proc transpose data=monthly_values out=avg_value (drop=_name_);

   id avg_ID;

   var avg_value;

   by id;

run;

data desired_data;

   merge sum_value avg_value;

   by id;

run;

Hope this helps,

Tish

Reeza
Super User

Does your date criteria, of 30 day range change per id or is it from a fixed start date and everything is in reference to that?

andp
Calcite | Level 5

It resets per ID and it is all in chronological order. So it would go 1,2,3.... then reset for the next ID.

Linlin
Lapis Lazuli | Level 10

example:

data have;
  do id=1,2;
   do days=1 to 90;
     value=days+id;
  output;
end;
   end;
data temp;
  set have;
  if days<31 then group=1;
     else if days<61 then group=2;
     else group=3;
run;
proc sql;
   create table temp as
     select id,group ,sum(value) as sum, mean(value) as mean
    from temp
      group by id,group
     order by id,group;
   quit;

   data want(drop=group mean sum);
     retain  month1sum month2sum month3sum month1mean month2mean month3mean;
     set temp;
  by id;
  array _sum(*) month1sum month2sum month3sum;
     array _mean(*) month1mean month2mean month3mean;
  _sum(group)=sum;
  _mean(group)=mean;
  if last.id then
  output;
run;
proc print;run;
Obs  month1sum  month2sum  month3sum  month1mean  month2mean  month3mean  id

         1      495        1395       2295       16.5        46.5        76.5      1
         2      525        1425       2325       17.5        47.5        77.5      2

Linlin

andp
Calcite | Level 5

LinLin:

Thanks for this code, but I currently have around 200k ID's to work with and was wondering if there was any way to make it more streamlined where it would auto detect the number of ID's.

Reeza:

Would this work if I needed to submit it into a macro because I need to eventually pull about 120 sets of values since I'm looking for monthly and trimester level values? And to clarify, each ID has it's own start/end dates.

Reeza
Super User

I'm not sure where the start and end dates come in to play. The proc means should be versatile and fine in a macro as long as the days definition is the same across variables, ie all 0 to 30 belong to month 1.

My thought was that if the ranges are fixed, you could use a format for the dates to specify them as well, but a bit unclear about what you're trying to do.

Maybe explaining your data and the final end result and or business context may help, if you can.

andp
Calcite | Level 5

Okay, I'm currently trying to use measurements taken during pregnancy (end date = date of delivery) to quantify how much of each measurement was taken for the periods on a monthly and trimester basis. eg. how much cereal a person ate in the first month of pregnancy, 2nd, etc. as well as during first trimester, etc.

I have several items I'm analyzing, eg. cereal, milk, juice, etc. and the final dataset would have each person and variables for each item and the values for each month and trimester on one row for ease of analysis.

Currently, I'm trying to finalize the processing of lets say cereal, so I can create a macro to deal with all of the other items in the same manner without having to manually do each one. The days are defined by date of delivery minus the length of the pregnancy so that's how they're involved in this data.

Reeza
Super User

So you have a start_date, end_date and a measurement_date? and then values?

Is your data like this :

start_date end_date measurement_date cereal juice milk

or like this:

start_date end_date measurement_date type value

andp
Calcite | Level 5

Oh, sorry, so what we have are monthly measurements instead of daily. Therefore, I broke it out so that each monthly measurement is divided into the number of days in that month so I have a daily measurement.

From there, I'm trying to put it back together by month (not calendar, but of pregnancy) so that I can see the values for the first, second, third months of pregnancy etc.

I have:

ID start_date end_date day(this is the count that starts at 1 on the start date and goes up to delivery date, then resets for the next ID) value

Each item is in a separate dataset as of now, so I'll eventually need to merge all the data back together once I get the monthly and trimester measurements set up.

So the final dataset would look like:

ID     Item1Month1sum, item1month1avg, item1month2sum/avg,.... item2month1sum/avg.... etc

Thanks again for your help!

Reeza
Super User

Well, I think what I've suggested will work fine, create the format and then go from there with proc means. It'll reduce the hard coding a lot.

proc format;

value month_fmt

0 - 30 = 1

31 - 60 = 2

61 - 90 = 3

91 - 120 = 4

121 - 150 = 5

151 - 180 = 6

181 - 210 = 7

211 - 240 = 8

241 - 270 = 9

271 - 300 = 10

301 - 330 = 11;

run;

proc means data=have;

class id days;

format days month_fmt.;

var value;

output out=sample n= mean=/autoname;

run;

Dorota_Jarosz
Obsidian | Level 7

andp: Linlin's code will work for any number of id's, she just had to generate test data. I understand you already have data; so you don't need the first data step.  If you name your variables sum_Month1, sum_Month2, ..., sum_Month10, so that the consecutive number is at the end of the variable name, then you can sometimes use syntax sum_Month1 -- sum_Month10 instead the explicit list of all vars, for instance in defining an array to summarize over a trimester or whatever.

data have;
informat ID best4. EndDate mmddyy8. Value best8. Days 3.;
input ID EndDate Value Days;
format EndDate date9.;
cards;
1          1/1/00               2               1
1          1/2/00               2               2
1          2/1/00               3               3
1          2/1/00               3               4
1          2/1/00               3               5
2          1/1/00               4               1
2          1/2/00               5               2
2          2/1/00               6               3
2          2/1/00               7               4
2          2/1/00               3               5
3          1/1/00               4               1
3          1/2/00               4               2
3          2/1/00               3               3
3          2/1/00               2               4
3          2/1/00               1               5
;
run;

Data have2;
format id 8. Days best4.;
set have;
if mod(Days,30)=1 then Monthval1=Value;
else if mod(Days,30)=2 then Monthval2=Value;
else if mod(Days,30)=3 then Monthval3=Value;
else if mod(Days,30)=4 then Monthval4=Value;
else if mod(Days,30)=5 then Monthval5=Value;
*more Monthvaln definitions here*;
run;

proc sort data=have2;
by id days;
run;

%let months=Monthval1 Monthval2 Monthval3 Monthval4 Monthval5;
%let sum_months=sum_Monthval1 sum_Monthval2 sum_Monthval3 sum_Monthval4 sum_Monthval5;
%let avg_months=avg_Monthval1 avg_Monthval2 avg_Monthval3 avg_Monthval4 avg_Monthval5;

proc summary data=have2  ;
var Monthval1 -- Monthval5;
by id;
output out=monthly_stats (drop=_FREQ_ _TYPE_)
  sum=&sum_months mean=&avg_months;
run;

* this is the output:


                 s    s    s    s    s    a    a    a    a    a
                 u    u    u    u    u    v    v    v    v    v
                 m    m    m    m    m    g    g    g    g    g
                 _    _    _    _    _    _    _    _    _    _
                 M    M    M    M    M    M    M    M    M    M
                 o    o    o    o    o    o    o    o    o    o
                 n    n    n    n    n    n    n    n    n    n
                 t    t    t    t    t    t    t    t    t    t
                 h    h    h    h    h    h    h    h    h    h
                 v    v    v    v    v    v    v    v    v    v
O                a    a    a    a    a    a    a    a    a    a
b        i       l    l    l    l    l    l    l    l    l    l
s        d       1    2    3    4    5    1    2    3    4    5

1           1    2    2    3    3    3    2    2    3    3    3
2           2    4    5    6    7    3    4    5    6    7    3
3           3    4    4    3    2    1    4    4    3    2    1

*;

andp
Calcite | Level 5

Hi Dorota,

Thanks for the response, I just ran your code with the necessary modifications and noticed that it was summing the months in an odd fashion. Instead of summing just month1, the month1 variable has a value in it every 30 days, and likewise for the rest of the months so that every 30 days each month has a value. I'm wondering if it's the mod() function that's causing this and if there's a way to modify that portion so it sums the first 30 days into one month instead of one day every 30 days.

Tish:

Is there a way to write your code in a way that I could specify the length of the months? I'm not working with all months at 30 days so I was wondering, even if it takes more writing, to specify each month length?

Thanks again!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 2242 views
  • 6 likes
  • 5 in conversation