BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rj438
Calcite | Level 5

I have a dataset that looks like this:

ObsIDDATEJan-03Feb-03Mar-03Apr-03May-03Jun-031 year5 years
111June 2003 000.23858109.35700   
222May 2003 1.4100110   

 

I need to calculate the last two columns based on the DATE variable. So if the DATE = June 2003 then 1 Year = sum of values across June 2002 to June 2003. If DATE = May 2003 then 1 Year = sum of values across May 2002 to May 2003. 

 

Please help! I think you can do this with arrays but I am not very familiar using arrays. 

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data shell;
  infile datalines dsd truncover;
  input f:$6. Date:MMDDYY10. Jan2003:BEST12. Feb2003:BEST12. Mar2003:BEST12. Apr2003:BEST12. May2003:BEST12. Jun2003:BEST12. Jul2003:BEST12. Aug2003:BEST12. Sep2003:BEST12. Oct2003:BEST12. Nov2003:BEST12. Dec2003:BEST12. Jan2004:BEST12. Feb2004:BEST12. Mar2004:BEST12. Apr2004:BEST12. May2004:BEST12. Jun2004:BEST12. Jul2004:BEST12. Aug2004:BEST12. Sep2004:BEST12. Oct2004:BEST12. Nov2004:BEST12. Dec2004:BEST12. Jan2005:BEST12. Feb2005:BEST12. Mar2005:BEST12. Apr2005:BEST12. May2005:BEST12. Jun2005:BEST12. Jul2005:BEST12. Aug2005:BEST12. Sep2005:BEST12. Oct2005:BEST12. Nov2005:BEST12. Dec2005:BEST12. Jan2006:BEST12. Feb2006:BEST12. Mar2006:BEST12. Apr2006:BEST12. May2006:BEST12. Jun2006:BEST12. Jul2006:BEST12. Aug2006:BEST12. Sep2006:BEST12. Oct2006:BEST12. Nov2006:BEST12. Dec2006:BEST12. Jan2007:BEST12. Feb2007:BEST12. Mar2007:BEST12. Apr2007:BEST12. May2007:BEST12. Jun2007:BEST12. Jul2007:BEST12. Aug2007:BEST12. Sep2007:BEST12. Oct2007:BEST12. Nov2007:BEST12. Dec2007:BEST12. Jan2004_yr:32. Feb2004_yr:32.;
format date monyy7.;
datalines4;
01001, 060107, 0,0,0.23858,109.3571,0,0,0.23858,0.08946,0,0,0,0,0,0,0,1.3312,0,0,0.1133,0,73.86797,0,26.34095,0,0,0,0.37603,10.72756,0.10744,0.42847,5.34498,24.03554,0,0,0.91321,0,0,0.0761,0,0.1522,0.25366,0,0.02536,0.05074,0,0,0,0,0,0.31592,0,0.0486,0,0,0,0,0,0.0486,0,0,0,0
01003, 072106, 0,0.04498,0.11694,0.04498,4.54283,0,0.04498,0.06296,0,0,0.08996,0,0,0,0,0,0,0.12736,0.04246,0,1768.785,0,5.47608,0,0.11864,0,0.15819,2.12758,0.0791,1.3182,119.2311,1977.33,0,0,0,0,0.07392,0,0,0,0.18479,0.22174,0.07392,0.91654,0,0.48044,1.18263,0,0,0,0,0,0.45552,0.12614,1.96222,0.07008,0,0.70079,0,0,0,0.04498
01005, 050406, 0,0,0,0.37034,0,0,0.37034,0,0,0,0,0,0,0,0,0,0,0,0,0,22.92384,0,0,0,0,0,2.35742,1.92085,0,0,1.17871,39.32827,0,0,0,0.26193,0,0,0,0,0.12642,0.50572,0,0.42142,0,0,0.63214,0,10.1819,0,0,0,0,0,0.08146,0,0,0,0,0,0,0
01007, 051207, 0.19462,0,0.12974,16.218,4.8654,0,0,0,0,0,0,0,0,0,0,0,4.38758,0.75216,0,0,22.06327,0,0,0,0.2406,0,0,0.30075,0,0.2979,3.36851,53.82838,0,0,0,0,0,0,0,0,0,0.58518,0,0,0,0,0.23408,0,0,0.84553,0,0.56368,0,1.12736,0,0,0,0.67642,0,0,0.19462,0
;;;;

data want;
 set shell;
 array x{*}  Jan2003--Dec2007;
 do i=-12 to 0;
   temp=put(intnx('month',date,i,'s'),monyy7.);
   one_year=sum(one_year,vvaluex(temp));
 end;
*keep f date one_year;
 drop i temp;
 run;

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Arrays could work, although a better approach would be to work with a long data set instead of a wide data set.

 

But you have to help us first. Provide a portion of your data following these instructions and not via any other method. We can't work with screen captures, and we can't work with fake data sets like the above which don't have legal SAS variable names.

--
Paige Miller
rj438
Calcite | Level 5

Is this better? I am looking to create new columns titled "1_Year" and "5_Years" by creating sums based on the condition such that if the DATE = June 2003 then 1 Year = sum of values across June 2002 to June 2003. If DATE = May 2003 then 1 Year = sum of values across May 2002 to May 2003.

data shell;
  infile datalines dsd truncover;
  input f:$6. Date:MMDDYY10. Jan2003:BEST12. Feb2003:BEST12. Mar2003:BEST12. Apr2003:BEST12. May2003:BEST12. Jun2003:BEST12. Jul2003:BEST12. Aug2003:BEST12. Sep2003:BEST12. Oct2003:BEST12. Nov2003:BEST12. Dec2003:BEST12. Jan2004:BEST12. Feb2004:BEST12. Mar2004:BEST12. Apr2004:BEST12. May2004:BEST12. Jun2004:BEST12. Jul2004:BEST12. Aug2004:BEST12. Sep2004:BEST12. Oct2004:BEST12. Nov2004:BEST12. Dec2004:BEST12. Jan2005:BEST12. Feb2005:BEST12. Mar2005:BEST12. Apr2005:BEST12. May2005:BEST12. Jun2005:BEST12. Jul2005:BEST12. Aug2005:BEST12. Sep2005:BEST12. Oct2005:BEST12. Nov2005:BEST12. Dec2005:BEST12. Jan2006:BEST12. Feb2006:BEST12. Mar2006:BEST12. Apr2006:BEST12. May2006:BEST12. Jun2006:BEST12. Jul2006:BEST12. Aug2006:BEST12. Sep2006:BEST12. Oct2006:BEST12. Nov2006:BEST12. Dec2006:BEST12. Jan2007:BEST12. Feb2007:BEST12. Mar2007:BEST12. Apr2007:BEST12. May2007:BEST12. Jun2007:BEST12. Jul2007:BEST12. Aug2007:BEST12. Sep2007:BEST12. Oct2007:BEST12. Nov2007:BEST12. Dec2007:BEST12. Jan2004_yr:32. Feb2004_yr:32.;
datalines4;
01001, 060107, 0,0,0.23858,109.3571,0,0,0.23858,0.08946,0,0,0,0,0,0,0,1.3312,0,0,0.1133,0,73.86797,0,26.34095,0,0,0,0.37603,10.72756,0.10744,0.42847,5.34498,24.03554,0,0,0.91321,0,0,0.0761,0,0.1522,0.25366,0,0.02536,0.05074,0,0,0,0,0,0.31592,0,0.0486,0,0,0,0,0,0.0486,0,0,0,0
01003, 072106, 0,0.04498,0.11694,0.04498,4.54283,0,0.04498,0.06296,0,0,0.08996,0,0,0,0,0,0,0.12736,0.04246,0,1768.785,0,5.47608,0,0.11864,0,0.15819,2.12758,0.0791,1.3182,119.2311,1977.33,0,0,0,0,0.07392,0,0,0,0.18479,0.22174,0.07392,0.91654,0,0.48044,1.18263,0,0,0,0,0,0.45552,0.12614,1.96222,0.07008,0,0.70079,0,0,0,0.04498
01005, 050406, 0,0,0,0.37034,0,0,0.37034,0,0,0,0,0,0,0,0,0,0,0,0,0,22.92384,0,0,0,0,0,2.35742,1.92085,0,0,1.17871,39.32827,0,0,0,0.26193,0,0,0,0,0.12642,0.50572,0,0.42142,0,0,0.63214,0,10.1819,0,0,0,0,0,0.08146,0,0,0,0,0,0,0
01007, 051207, 0.19462,0,0.12974,16.218,4.8654,0,0,0,0,0,0,0,0,0,0,0,4.38758,0.75216,0,0,22.06327,0,0,0,0.2406,0,0,0.30075,0,0.2979,3.36851,53.82838,0,0,0,0,0,0,0,0,0,0.58518,0,0,0,0,0.23408,0,0,0.84553,0,0.56368,0,1.12736,0,0,0,0.67642,0,0,0.19462,0
01009, 072108, 0,0,0.27941,1.32087,1.62569,0.20322,0.20322,0,0,0,0.127,0,0,2.14764,0,0,0,0,0.61012,0,2.0012,0.82977,0.31726,0,0.44388,0,0.98122,0.63078,0.60742,0.62967,1.612,20.90636,0,0,0.09344,0,0,0,0.28926,0.089,0,0.22252,0.06676,0.06676,34.48916,0,0,0,0,0,0,0,0,0,0.25557,0.7454,0,0,0,0.03276,0,2.14764
;;;;

proc print data=shell; format Date date9.;
 run;
PaigeMiller
Diamond | Level 26

First, thank you for providing data in the proper format. This makes everyone's job easier now.

 

The idea of using variable names that contain data (in this case month and year) makes me very uncomfortable, as it makes the coding much more difficult. SAS does not know that a variable name of Jan2003 comes before a variable name of Apr2003, as the variable names can only be sorted alphabetically, and it doesn't know what month comes before Jan2004. 

 

You need a long data set, not a wide data set, and then the programming becomes easy, as you can use the built in SAS date functions and now SAS knows that before Jan2004 it was Dec2003. See Maxim 19. So, let me create a long data set to replace your wide data set.

 

data long;
    set shell;
    array months jan2003--dec2007;
    do i=1 to dim(months);
        month = input(vname(months(i)),monyy7.);
        value=months(i);
        output;
    end;
    format month date9.; /* This line is optional, the program works without it */
    keep f date month value;
run;

data long2;
    set long;
    /* Next line determines if we are in the 12 months before value in variable date */
    prev_12_month = 1<=intck('month',month,date)<=12;
run;

/* Obtain sums */
proc summary data=long2 nway;
    class f;
    var value/weight=prev_12_month;
    output out=want sum=;
run;

If you, from now on, start with a long data set instead of a wide data set, you will be much better off and calculations are simple. SAS knows that Dec 2003 comes one month before Jan 2004. That's done in the INTCK function. So if you start with a long data set in this case, your programming would begin where I have data long2; in the code, and isn't that much simpler than arrays?

 

I point out one additional benefit of long instead of wide data sets. If you get another data set with different months (in the wide data set this means different variable names), the code would have to be modified. But in the long data set case, there's no change needed in the code at all.


Finally, your original problem statement seems to imply you want 13 months in the sum. That's what you said:

 

... by creating sums based on the condition such that if the DATE = June 2003 then 1 Year = sum of values across June 2002 to June 2003.

 

If you really want 13 months instead of the 12 that I programmed, I leave that to you to fix in the above code as a homework assignment.

 

 

 

 

--
Paige Miller
Ksharp
Super User
data shell;
  infile datalines dsd truncover;
  input f:$6. Date:MMDDYY10. Jan2003:BEST12. Feb2003:BEST12. Mar2003:BEST12. Apr2003:BEST12. May2003:BEST12. Jun2003:BEST12. Jul2003:BEST12. Aug2003:BEST12. Sep2003:BEST12. Oct2003:BEST12. Nov2003:BEST12. Dec2003:BEST12. Jan2004:BEST12. Feb2004:BEST12. Mar2004:BEST12. Apr2004:BEST12. May2004:BEST12. Jun2004:BEST12. Jul2004:BEST12. Aug2004:BEST12. Sep2004:BEST12. Oct2004:BEST12. Nov2004:BEST12. Dec2004:BEST12. Jan2005:BEST12. Feb2005:BEST12. Mar2005:BEST12. Apr2005:BEST12. May2005:BEST12. Jun2005:BEST12. Jul2005:BEST12. Aug2005:BEST12. Sep2005:BEST12. Oct2005:BEST12. Nov2005:BEST12. Dec2005:BEST12. Jan2006:BEST12. Feb2006:BEST12. Mar2006:BEST12. Apr2006:BEST12. May2006:BEST12. Jun2006:BEST12. Jul2006:BEST12. Aug2006:BEST12. Sep2006:BEST12. Oct2006:BEST12. Nov2006:BEST12. Dec2006:BEST12. Jan2007:BEST12. Feb2007:BEST12. Mar2007:BEST12. Apr2007:BEST12. May2007:BEST12. Jun2007:BEST12. Jul2007:BEST12. Aug2007:BEST12. Sep2007:BEST12. Oct2007:BEST12. Nov2007:BEST12. Dec2007:BEST12. Jan2004_yr:32. Feb2004_yr:32.;
format date monyy7.;
datalines4;
01001, 060107, 0,0,0.23858,109.3571,0,0,0.23858,0.08946,0,0,0,0,0,0,0,1.3312,0,0,0.1133,0,73.86797,0,26.34095,0,0,0,0.37603,10.72756,0.10744,0.42847,5.34498,24.03554,0,0,0.91321,0,0,0.0761,0,0.1522,0.25366,0,0.02536,0.05074,0,0,0,0,0,0.31592,0,0.0486,0,0,0,0,0,0.0486,0,0,0,0
01003, 072106, 0,0.04498,0.11694,0.04498,4.54283,0,0.04498,0.06296,0,0,0.08996,0,0,0,0,0,0,0.12736,0.04246,0,1768.785,0,5.47608,0,0.11864,0,0.15819,2.12758,0.0791,1.3182,119.2311,1977.33,0,0,0,0,0.07392,0,0,0,0.18479,0.22174,0.07392,0.91654,0,0.48044,1.18263,0,0,0,0,0,0.45552,0.12614,1.96222,0.07008,0,0.70079,0,0,0,0.04498
01005, 050406, 0,0,0,0.37034,0,0,0.37034,0,0,0,0,0,0,0,0,0,0,0,0,0,22.92384,0,0,0,0,0,2.35742,1.92085,0,0,1.17871,39.32827,0,0,0,0.26193,0,0,0,0,0.12642,0.50572,0,0.42142,0,0,0.63214,0,10.1819,0,0,0,0,0,0.08146,0,0,0,0,0,0,0
01007, 051207, 0.19462,0,0.12974,16.218,4.8654,0,0,0,0,0,0,0,0,0,0,0,4.38758,0.75216,0,0,22.06327,0,0,0,0.2406,0,0,0.30075,0,0.2979,3.36851,53.82838,0,0,0,0,0,0,0,0,0,0.58518,0,0,0,0,0.23408,0,0,0.84553,0,0.56368,0,1.12736,0,0,0,0.67642,0,0,0.19462,0
;;;;

data want;
 set shell;
 array x{*}  Jan2003--Dec2007;
 do i=-12 to 0;
   temp=put(intnx('month',date,i,'s'),monyy7.);
   one_year=sum(one_year,vvaluex(temp));
 end;
*keep f date one_year;
 drop i temp;
 run;

PaigeMiller
Diamond | Level 26

This is really clever code, @Ksharp , thanks!

 

In real life, there might be cases where 12 months earlier is not in the array. Also if this program is to be run on a regular basis and the months in the data set change from what they are now to some other set of months, the code needs to be dynamic. Starting with a long data set overcomes both of these problems and runs without human intervention.

--
Paige Miller
Ksharp
Super User
I think could use a macro variable to contain these MONYY variable name .
........
if temp in ('JAN2002' 'FEB2002' ........... ) then do;
.............
rj438
Calcite | Level 5
Thanks both @Ksharp & @PaigeMiller!
I ended up using @Ksharp's code because it was easier for this quick thing. In fact, my original dataset was long form but because I need to merge with a different dataset with multiple participants per obs of this dataset, I needed to keep it simpler (wide form in this case). But both your codes should be really useful in the future. Thanks again!!!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1040 views
  • 1 like
  • 3 in conversation