BookmarkSubscribeRSS Feed
abasiamaokonna0
Calcite | Level 5

hi Guyz

 

I need solution to this problem. I have this dataset below,I only want to calculate the total points for the last three dates for each of the ID variables. I have tried what i know but still giving me wrong values. Not to summerise all by the ID but only to summerise the last three dates when sorted in ascending date.

 

DATA DATES;
INPUT ID : $8. VISIT : DATE9. POINT : 8.;
FORMAT VISIT DATE9.;
DATALINES;
600 20MAY2016 20
600 21JUN2016 30
600 23JUL2016 50
600 30AUG2016 50
600 21SEP2016 30
600 31OCT2016 12
600 21NOV2016 23
600 22DEC2016 22
600 20JAN2017 30
100 20MAY2016 78
100 21JUN2016 54
100 23JUL2016 56
100 30AUG2016 56
100 21SEP2016 89
100 31OCT2016 46
100 21NOV2016 78
100 22DEC2016 78
100 20JAN2017 34
100 22FEB2017 65
100 24MAR2017 45
100 30APR2017 45
200 20MAY2016 90
200 21JUN2016 78
200 23JUL2016 45
200 30AUG2016 45
200 21SEP2016 54
200 31OCT2016 67
200 21NOV2016 78
200 22DEC2016 67
200 20JAN2017 56
200 22FEB2017 43
200 24MAR2017 66
200 30APR2017 65
300 20MAY2016 78
300 21JUN2016 66
300 23JUL2016 55
300 30AUG2016 44
300 21SEP2016 33
300 31OCT2016 11
300 21NOV2016 56
300 22DEC2016 43
300 20JAN2017 78
300 22FEB2017 55
300 24MAR2017 55
300 30APR2017 33
400 20MAY2016 87
400 21JUN2016 67
400 23JUL2016 68
400 30AUG2016 43
400 21SEP2016 46
400 31OCT2016 32
400 21NOV2016 32
400 22DEC2016 44
400 20JAN2017 44
400 22FEB2017 43
400 24MAR2017 54
400 30APR2017 66
500 20MAY2016 69
500 21JUN2016 54
500 23JUL2016 67
500 30AUG2016 67
500 21SEP2016 54
500 31OCT2016 43
500 21NOV2016 43
500 22DEC2016 67
500 20JAN2017 56
500 22FEB2017 56
500 24MAR2017 56
500 30APR2017 43
;
RUN;
15 REPLIES 15
art297
Opal | Level 21

Here is one way:

data want (drop=count count2);
  do until (last.id);
    set dates;
    by id notsorted;
    if first.id then count=1;
    else count+1;
  end;
  do until (last.id);
    set dates;
    by id notsorted;
    if first.id then do;
      count2=1;
      sum=0;
    end;
    else count2+1;
    if count2 ge count-2 then sum+point;
    if last.id then output;
  end;
run;

Art, CEO, AnalystFinder.com

abasiamaokonna0
Calcite | Level 5

Thanks guys,do you guyz know how i can automate the solution such that,it gives the user flexibility to select number of months he or she wants to sum up.If the user wants last 3 or 4 or5 month,he or she can do that without having to change the whole code.Kind of macro variable

DATA DATES;
INPUT id:$8. VISIT:DATE9. POINT:8.;
FORMAT VISIT DATE9.;
DATALINES;
600 20MAY2016 20
600 21JUN2016 30
600 23JUL2016 50
600 30AUG2016 50
600 21SEP2016 30
600 31OCT2016 12
600 21NOV2016 23
600 22DEC2016 22
600 20JAN2017 30
100 20MAY2016 78
100 21JUN2016 54
100 23JUL2016 56
100 30AUG2016 56
100 21SEP2016 89
100 31OCT2016 46
100 21NOV2016 78
100 22DEC2016 78
100 20JAN2017 34
100 22FEB2017 65
100 24MAR2017 45
100 30APR2017 45
200 20MAY2016 90
200 21JUN2016 78
200 23JUL2016 45
200 30AUG2016 45
200 21SEP2016 54
200 31OCT2016 67
200 21NOV2016 78
200 22DEC2016 67
200 20JAN2017 56
200 22FEB2017 43
200 24MAR2017 66
200 30APR2017 65
300 20MAY2016 78
300 21JUN2016 66
300 23JUL2016 55
300 30AUG2016 44
300 21SEP2016 33
300 31OCT2016 11
300 21NOV2016 56
300 22DEC2016 43
300 20JAN2017 78
300 22FEB2017 55
300 24MAR2017 55
300 30APR2017 33
400 20MAY2016 87
400 21JUN2016 67
400 23JUL2016 68
400 30AUG2016 43
400 21SEP2016 46
400 31OCT2016 32
400 21NOV2016 32
400 22DEC2016 44
400 20JAN2017 44
400 22FEB2017 43
400 24MAR2017 54
400 30APR2017 66
500 20MAY2016 69
500 21JUN2016 54
500 23JUL2016 67
500 30AUG2016 67
500 21SEP2016 54
500 31OCT2016 43
500 21NOV2016 43
500 22DEC2016 67
500 20JAN2017 56
500 22FEB2017 56
500 24MAR2017 56
500 30APR2017 43
;
RUN;

art297
Opal | Level 21

Using the method I suggested only one number would have to be represented as a macro variable, namely:

   if count2 ge count-2 then sum+point;

The -2 represents the number of months minus one.

 

Art, CEO, AnalystFinder.com

 

s_lassen
Meteorite | Level 14

@abasiamaokonna0 wrote:

Thanks guys,do you guyz know how i can automate the solution such that,it gives the user flexibility to select number of months he or she wants to sum up.If the user wants last 3 or 4 or5 month,he or she can do that without having to change the whole code.Kind of macro variable

 


I have this suggestion - I put in a check to see if there actually were data points enough for a given ID, the actual number of data points used is in the variable COUNT. I was too lazy to sort your data, so I put in a NOTSORTED in the BY statement. You will see that if you use a high value (e.g. 30) for &N, you will get the same output as from PROC SUMMARY.

%let n=3;
data want;
  do _N_=1 by 1 until(last.id);
    set dates;
    by id notsorted;
    array points(0:%eval(&n-1)) _temporary_;
    points(mod(_N_,&n))=point;
    end;
  count=min(_N_,&n); /* the number of data points actually available for this ID, if less than &n */
  /* null the datapoints from previous IDs, if any */ 
do _N_=_N_+1 to &n; points(mod(_N_,&n))=.; end; sum=sum(of points(*)); run;
ballardw
Super User

@abasiamaokonna0 wrote:

Thanks guys,do you guyz know how i can automate the solution such that,it gives the user flexibility to select number of months he or she wants to sum up.If the user wants last 3 or 4 or5 month,he or she can do that without having to change the whole code.Kind of macro variable

 


How will you collect the desired interval data from the user? Will they supply "number of last months" or an interval (start date and end date)?

What is the desired result if there are fewer than the requested "months" of data available?

What is the desired result if the requested Interval has one or both endpoints outside of the available data?

What is the desired result if the request is for a period where there are no records for one of the ID values?

 

Also with your data intervals more than a little irregular you might even have to define "month"

mkeintz
PROC Star

Assuming the data are grouped by ID (and sorted by date within ID), there is no need for do loops, or extra proc's.  Just maintain a temporary array carrying the most recent 3 (or some other number) of POINTS values, as in:

 

DATA DATES;
INPUT ID : $8. VISIT : DATE9. POINT : 8.;
FORMAT VISIT DATE9.;
DATALINES;
600 20MAY2016 20
600 21JUN2016 30
600 23JUL2016 50
600 30AUG2016 50
600 21SEP2016 30
600 31OCT2016 12
600 21NOV2016 23
600 22DEC2016 22
600 20JAN2017 30
100 20MAY2016 78
100 21JUN2016 54
100 23JUL2016 56
100 30AUG2016 56
100 21SEP2016 89
100 31OCT2016 46
100 21NOV2016 78
100 22DEC2016 78
100 20JAN2017 34
100 22FEB2017 65
100 24MAR2017 45
100 30APR2017 45
200 20MAY2016 90
200 21JUN2016 78
200 23JUL2016 45
200 30AUG2016 45
200 21SEP2016 54
200 31OCT2016 67
200 21NOV2016 78
200 22DEC2016 67
200 20JAN2017 56
200 22FEB2017 43
200 24MAR2017 66
200 30APR2017 65
300 20MAY2016 78
300 21JUN2016 66
300 23JUL2016 55
300 30AUG2016 44
300 21SEP2016 33
300 31OCT2016 11
300 21NOV2016 56
300 22DEC2016 43
300 20JAN2017 78
300 22FEB2017 55
300 24MAR2017 55
300 30APR2017 33
400 20MAY2016 87
400 21JUN2016 67
400 23JUL2016 68
400 30AUG2016 43
400 21SEP2016 46
400 31OCT2016 32
400 21NOV2016 32
400 22DEC2016 44
400 20JAN2017 44
400 22FEB2017 43
400 24MAR2017 54
400 30APR2017 66
500 20MAY2016 69
500 21JUN2016 54
500 23JUL2016 67
500 30AUG2016 67
500 21SEP2016 54
500 31OCT2016 43
500 21NOV2016 43
500 22DEC2016 67
500 20JAN2017 56
500 22FEB2017 56
500 24MAR2017 56
500 30APR2017 43
;
RUN;


%let ndate=3;
data want (keep=id total_points);
  set dates;
  by id notsorted;

  array pts {&ndate} _temporary_;
  if first.id then call missing(of pts{*});
  pts{mod(_N_,&ndate)+1}=point;
  if last.id;
  total_points=sum(of pts{*});
run;

 

This program does the following:

  1.  Assumes the data are grouped by ID (but the NOTSORTED parameter in the BY statement allows the ID groups to be in any order.
  2. The TEMPORARY array
    1. Is the same size as the number of trailing dates wanted.  Just change the macrovar NDATES to get different trailing window sizes.
    2. has its values automatically retained from observation to observation.
  3. The mod(_N_,3) function is the remainder of the observation number divided by 3 (for NDATES=3), resulting in a 0, 1, or 2.  Adding 1 yields a way to put POINT in elements 1, 2, or 3 of the PTS 3-element array.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
s_lassen
Meteorite | Level 14

mkeintz,

Just a couple of points:

  • The loop in my suggestion automatically gives the _N_ value the number of data points read from that ID, meaning that you will not have to calculate that otherwise in case the user entered a numeric value greater than the smallest number of observations in a by group.
  • by zero-basing the array (array pts (0:%eval(&ndate-1)) _temporary_;) you do not have to subtract 1 from the output of the MOD() function every time. I think this saves CPU, and makes the program simpler. 

 

@mkeintz wrote:

Assuming the data are grouped by ID (and sorted by date within ID), there is no need for do loops, or extra proc's.  Just maintain a temporary array carrying the most recent 3 (or some other number) of POINTS values, as in:

 

DATA DATES;
INPUT ID : $8. VISIT : DATE9. POINT : 8.;
FORMAT VISIT DATE9.;
DATALINES;
600 20MAY2016 20
600 21JUN2016 30
600 23JUL2016 50
600 30AUG2016 50
600 21SEP2016 30
600 31OCT2016 12
600 21NOV2016 23
600 22DEC2016 22
600 20JAN2017 30
100 20MAY2016 78
100 21JUN2016 54
100 23JUL2016 56
100 30AUG2016 56
100 21SEP2016 89
100 31OCT2016 46
100 21NOV2016 78
100 22DEC2016 78
100 20JAN2017 34
100 22FEB2017 65
100 24MAR2017 45
100 30APR2017 45
200 20MAY2016 90
200 21JUN2016 78
200 23JUL2016 45
200 30AUG2016 45
200 21SEP2016 54
200 31OCT2016 67
200 21NOV2016 78
200 22DEC2016 67
200 20JAN2017 56
200 22FEB2017 43
200 24MAR2017 66
200 30APR2017 65
300 20MAY2016 78
300 21JUN2016 66
300 23JUL2016 55
300 30AUG2016 44
300 21SEP2016 33
300 31OCT2016 11
300 21NOV2016 56
300 22DEC2016 43
300 20JAN2017 78
300 22FEB2017 55
300 24MAR2017 55
300 30APR2017 33
400 20MAY2016 87
400 21JUN2016 67
400 23JUL2016 68
400 30AUG2016 43
400 21SEP2016 46
400 31OCT2016 32
400 21NOV2016 32
400 22DEC2016 44
400 20JAN2017 44
400 22FEB2017 43
400 24MAR2017 54
400 30APR2017 66
500 20MAY2016 69
500 21JUN2016 54
500 23JUL2016 67
500 30AUG2016 67
500 21SEP2016 54
500 31OCT2016 43
500 21NOV2016 43
500 22DEC2016 67
500 20JAN2017 56
500 22FEB2017 56
500 24MAR2017 56
500 30APR2017 43
;
RUN;


%let ndate=3;
data want (keep=id total_points);
  set dates;
  by id notsorted;

  array pts {&ndate} _temporary_;
  if first.id then call missing(of pts{*});
  pts{mod(_N_,&ndate)+1}=point;
  if last.id;
  total_points=sum(of pts{*});
run;

 

This program does the following:

  1.  Assumes the data are grouped by ID (but the NOTSORTED parameter in the BY statement allows the ID groups to be in any order.
  2. The TEMPORARY array
    1. Is the same size as the number of trailing dates wanted.  Just change the macrovar NDATES to get different trailing window sizes.
    2. has its values automatically retained from observation to observation.
  3. The mod(_N_,3) function is the remainder of the observation number divided by 3 (for NDATES=3), resulting in a 0, 1, or 2.  Adding 1 yields a way to put POINT in elements 1, 2, or 3 of the PTS 3-element array.

 

mkeintz
PROC Star

@s_lassen  - Somehow I overlooked your response, which preceded mine substantially.  Wouldn't have been so expansive in my explanation if I had seen it.

 

Yes, by all means let array have an index range of 0 to ndate-1 instead of 1 to ndate.  I didn't want to overburden the OP with what I suspected would be more new knowledge than wanted.

 

However, I'm not sure  your program as presented does what is intended.  In particular the "sum=sum(of points{*})" follows (rather than precedes) resetting of the array to missing.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

A running total can be maintained without an array.

This will work even if there are fewer records than need to be summed.


%let n=3;
data WANT(keep= GRP_ID SUM);
  set DATES end=LASTOBS ;
  GRP_ID=lag(ID);
  if _N_>1 & ID ne GRP_ID then do;
    output;  
    call missing(SUM);
  end;
  SUM + sum(POINT, - (lag&n(ID)=ID) * lag&n(POINT) );
  if LASTOBS then output;  
run;

 

s_lassen
Meteorite | Level 14

@mkeintz wrote:

@s_lassen  - Somehow I overlooked your response, which preceded mine substantially.  Wouldn't have been so expansive in my explanation if I had seen it.

 

Yes, by all means let array have an index range of 0 to ndate-1 instead of 1 to ndate.  I didn't want to overburden the OP with what I suspected would be more new knowledge than wanted.

 

However, I'm not sure  your program as presented does what is intended.  In particular the "sum=sum(of points{*})" follows (rather than precedes) resetting of the array to missing.


Actually, the "setting of the array to missing" only concerns the elements that were not set by this ID's observations, in case there were fewer observations than wanted.

Reeza
Super User

@abasiamaokonna0 I edited your post for some small insignificant grammatical errors, but also to format your code. 

Please use the insert code (6/7th) icons in the Rich Text Editor to insert your code, otherwise the forum can garble your code, especially if it has colons. I also changed your subject line to be more descriptive. 

ChrisNZ
Tourmaline | Level 20

Here is another way:


data WANT;
  merge DATES 
        DATES (firstobs=2 rename=( ID=ID2 POINT=POINT2))
        DATES (firstobs=3 rename=( ID=ID3 POINT=POINT3))
        DATES (firstobs=4 rename=( ID=ID4 POINT=POINT4));
  if ID3 ne ID4; %* Check that ID3 is the last ID, i.e. ID4 is a new ID;
  SUM=sum(POINT, POINT2, POINT3);
  output;
  keep ID SUM;
run;

This assumes there are always at least 3 dates per ID, and no check is made to ensure this.

 

ID SUM
600 75
100 155
200 174
300 143
400 163
500 155
novinosrin
Tourmaline | Level 20

Another fun way:

/*Assuming the number of records in each group is divisible by 3 as it is in your sample to conveniently make it sets of three*/

data want;

     do until(last.id);

           do _n_=1 by 1 until(mod(_n_,3)=0);

                set dates;

                by id notsorted;

                array temp{3} _temporary_;

                temp(_n_)=point;

 

                if last.id then

                     do;

                           sum=sum(of temp(*));

                           output;

                     end;

           end;

     end;

run;

 

Regards,

Naveen Srinivasan

Ksharp
Super User
DATA DATES;
INPUT ID : $8. VISIT : DATE9. POINT : 8.;
FORMAT VISIT DATE9.;
DATALINES;
600 20MAY2016 20
600 21JUN2016 30
600 23JUL2016 50
600 30AUG2016 50
600 21SEP2016 30
600 31OCT2016 12
600 21NOV2016 23
600 22DEC2016 22
600 20JAN2017 30
100 20MAY2016 78
100 21JUN2016 54
100 23JUL2016 56
100 30AUG2016 56
100 21SEP2016 89
100 31OCT2016 46
100 21NOV2016 78
100 22DEC2016 78
100 20JAN2017 34
100 22FEB2017 65
100 24MAR2017 45
100 30APR2017 45
200 20MAY2016 90
200 21JUN2016 78
200 23JUL2016 45
200 30AUG2016 45
200 21SEP2016 54
200 31OCT2016 67
200 21NOV2016 78
200 22DEC2016 67
200 20JAN2017 56
200 22FEB2017 43
200 24MAR2017 66
200 30APR2017 65
300 20MAY2016 78
300 21JUN2016 66
300 23JUL2016 55
300 30AUG2016 44
300 21SEP2016 33
300 31OCT2016 11
300 21NOV2016 56
300 22DEC2016 43
300 20JAN2017 78
300 22FEB2017 55
300 24MAR2017 55
300 30APR2017 33
400 20MAY2016 87
400 21JUN2016 67
400 23JUL2016 68
400 30AUG2016 43
400 21SEP2016 46
400 31OCT2016 32
400 21NOV2016 32
400 22DEC2016 44
400 20JAN2017 44
400 22FEB2017 43
400 24MAR2017 54
400 30APR2017 66
500 20MAY2016 69
500 21JUN2016 54
500 23JUL2016 67
500 30AUG2016 67
500 21SEP2016 54
500 31OCT2016 43
500 21NOV2016 43
500 22DEC2016 67
500 20JAN2017 56
500 22FEB2017 56
500 24MAR2017 56
500 30APR2017 43
;
RUN;
data temp;
 merge dates dates(keep=id rename=(id=_id) firstobs=4);
 if id ne _id;
run;
proc summary data=temp nway;
class id;
var point;
output out=want sum=;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 2030 views
  • 1 like
  • 10 in conversation