hi Guyz
I need solution to this problem. I have this dataset below,I only want to calculate the total points for the last three dates for each of the ID variables. I have tried what i know but still giving me wrong values. Not to summerise all by the ID but only to summerise the last three dates when sorted in ascending date.
DATA DATES;
INPUT ID : $8. VISIT : DATE9. POINT : 8.;
FORMAT VISIT DATE9.;
DATALINES;
600 20MAY2016 20
600 21JUN2016 30
600 23JUL2016 50
600 30AUG2016 50
600 21SEP2016 30
600 31OCT2016 12
600 21NOV2016 23
600 22DEC2016 22
600 20JAN2017 30
100 20MAY2016 78
100 21JUN2016 54
100 23JUL2016 56
100 30AUG2016 56
100 21SEP2016 89
100 31OCT2016 46
100 21NOV2016 78
100 22DEC2016 78
100 20JAN2017 34
100 22FEB2017 65
100 24MAR2017 45
100 30APR2017 45
200 20MAY2016 90
200 21JUN2016 78
200 23JUL2016 45
200 30AUG2016 45
200 21SEP2016 54
200 31OCT2016 67
200 21NOV2016 78
200 22DEC2016 67
200 20JAN2017 56
200 22FEB2017 43
200 24MAR2017 66
200 30APR2017 65
300 20MAY2016 78
300 21JUN2016 66
300 23JUL2016 55
300 30AUG2016 44
300 21SEP2016 33
300 31OCT2016 11
300 21NOV2016 56
300 22DEC2016 43
300 20JAN2017 78
300 22FEB2017 55
300 24MAR2017 55
300 30APR2017 33
400 20MAY2016 87
400 21JUN2016 67
400 23JUL2016 68
400 30AUG2016 43
400 21SEP2016 46
400 31OCT2016 32
400 21NOV2016 32
400 22DEC2016 44
400 20JAN2017 44
400 22FEB2017 43
400 24MAR2017 54
400 30APR2017 66
500 20MAY2016 69
500 21JUN2016 54
500 23JUL2016 67
500 30AUG2016 67
500 21SEP2016 54
500 31OCT2016 43
500 21NOV2016 43
500 22DEC2016 67
500 20JAN2017 56
500 22FEB2017 56
500 24MAR2017 56
500 30APR2017 43
;
RUN;
Here is one way:
data want (drop=count count2); do until (last.id); set dates; by id notsorted; if first.id then count=1; else count+1; end; do until (last.id); set dates; by id notsorted; if first.id then do; count2=1; sum=0; end; else count2+1; if count2 ge count-2 then sum+point; if last.id then output; end; run;
Art, CEO, AnalystFinder.com
Thanks guys,do you guyz know how i can automate the solution such that,it gives the user flexibility to select number of months he or she wants to sum up.If the user wants last 3 or 4 or5 month,he or she can do that without having to change the whole code.Kind of macro variable
DATA DATES;
INPUT id:$8. VISIT:DATE9. POINT:8.;
FORMAT VISIT DATE9.;
DATALINES;
600 20MAY2016 20
600 21JUN2016 30
600 23JUL2016 50
600 30AUG2016 50
600 21SEP2016 30
600 31OCT2016 12
600 21NOV2016 23
600 22DEC2016 22
600 20JAN2017 30
100 20MAY2016 78
100 21JUN2016 54
100 23JUL2016 56
100 30AUG2016 56
100 21SEP2016 89
100 31OCT2016 46
100 21NOV2016 78
100 22DEC2016 78
100 20JAN2017 34
100 22FEB2017 65
100 24MAR2017 45
100 30APR2017 45
200 20MAY2016 90
200 21JUN2016 78
200 23JUL2016 45
200 30AUG2016 45
200 21SEP2016 54
200 31OCT2016 67
200 21NOV2016 78
200 22DEC2016 67
200 20JAN2017 56
200 22FEB2017 43
200 24MAR2017 66
200 30APR2017 65
300 20MAY2016 78
300 21JUN2016 66
300 23JUL2016 55
300 30AUG2016 44
300 21SEP2016 33
300 31OCT2016 11
300 21NOV2016 56
300 22DEC2016 43
300 20JAN2017 78
300 22FEB2017 55
300 24MAR2017 55
300 30APR2017 33
400 20MAY2016 87
400 21JUN2016 67
400 23JUL2016 68
400 30AUG2016 43
400 21SEP2016 46
400 31OCT2016 32
400 21NOV2016 32
400 22DEC2016 44
400 20JAN2017 44
400 22FEB2017 43
400 24MAR2017 54
400 30APR2017 66
500 20MAY2016 69
500 21JUN2016 54
500 23JUL2016 67
500 30AUG2016 67
500 21SEP2016 54
500 31OCT2016 43
500 21NOV2016 43
500 22DEC2016 67
500 20JAN2017 56
500 22FEB2017 56
500 24MAR2017 56
500 30APR2017 43
;
RUN;
Using the method I suggested only one number would have to be represented as a macro variable, namely:
if count2 ge count-2 then sum+point;
The -2 represents the number of months minus one.
Art, CEO, AnalystFinder.com
@abasiamaokonna0 wrote:Thanks guys,do you guyz know how i can automate the solution such that,it gives the user flexibility to select number of months he or she wants to sum up.If the user wants last 3 or 4 or5 month,he or she can do that without having to change the whole code.Kind of macro variable
I have this suggestion - I put in a check to see if there actually were data points enough for a given ID, the actual number of data points used is in the variable COUNT. I was too lazy to sort your data, so I put in a NOTSORTED in the BY statement. You will see that if you use a high value (e.g. 30) for &N, you will get the same output as from PROC SUMMARY.
%let n=3; data want; do _N_=1 by 1 until(last.id); set dates; by id notsorted; array points(0:%eval(&n-1)) _temporary_; points(mod(_N_,&n))=point; end; count=min(_N_,&n); /* the number of data points actually available for this ID, if less than &n */ /* null the datapoints from previous IDs, if any */
do _N_=_N_+1 to &n; points(mod(_N_,&n))=.; end; sum=sum(of points(*)); run;
@abasiamaokonna0 wrote:
Thanks guys,do you guyz know how i can automate the solution such that,it gives the user flexibility to select number of months he or she wants to sum up.If the user wants last 3 or 4 or5 month,he or she can do that without having to change the whole code.Kind of macro variable
How will you collect the desired interval data from the user? Will they supply "number of last months" or an interval (start date and end date)?
What is the desired result if there are fewer than the requested "months" of data available?
What is the desired result if the requested Interval has one or both endpoints outside of the available data?
What is the desired result if the request is for a period where there are no records for one of the ID values?
Also with your data intervals more than a little irregular you might even have to define "month"
Assuming the data are grouped by ID (and sorted by date within ID), there is no need for do loops, or extra proc's. Just maintain a temporary array carrying the most recent 3 (or some other number) of POINTS values, as in:
DATA DATES;
INPUT ID : $8. VISIT : DATE9. POINT : 8.;
FORMAT VISIT DATE9.;
DATALINES;
600 20MAY2016 20
600 21JUN2016 30
600 23JUL2016 50
600 30AUG2016 50
600 21SEP2016 30
600 31OCT2016 12
600 21NOV2016 23
600 22DEC2016 22
600 20JAN2017 30
100 20MAY2016 78
100 21JUN2016 54
100 23JUL2016 56
100 30AUG2016 56
100 21SEP2016 89
100 31OCT2016 46
100 21NOV2016 78
100 22DEC2016 78
100 20JAN2017 34
100 22FEB2017 65
100 24MAR2017 45
100 30APR2017 45
200 20MAY2016 90
200 21JUN2016 78
200 23JUL2016 45
200 30AUG2016 45
200 21SEP2016 54
200 31OCT2016 67
200 21NOV2016 78
200 22DEC2016 67
200 20JAN2017 56
200 22FEB2017 43
200 24MAR2017 66
200 30APR2017 65
300 20MAY2016 78
300 21JUN2016 66
300 23JUL2016 55
300 30AUG2016 44
300 21SEP2016 33
300 31OCT2016 11
300 21NOV2016 56
300 22DEC2016 43
300 20JAN2017 78
300 22FEB2017 55
300 24MAR2017 55
300 30APR2017 33
400 20MAY2016 87
400 21JUN2016 67
400 23JUL2016 68
400 30AUG2016 43
400 21SEP2016 46
400 31OCT2016 32
400 21NOV2016 32
400 22DEC2016 44
400 20JAN2017 44
400 22FEB2017 43
400 24MAR2017 54
400 30APR2017 66
500 20MAY2016 69
500 21JUN2016 54
500 23JUL2016 67
500 30AUG2016 67
500 21SEP2016 54
500 31OCT2016 43
500 21NOV2016 43
500 22DEC2016 67
500 20JAN2017 56
500 22FEB2017 56
500 24MAR2017 56
500 30APR2017 43
;
RUN;
%let ndate=3;
data want (keep=id total_points);
set dates;
by id notsorted;
array pts {&ndate} _temporary_;
if first.id then call missing(of pts{*});
pts{mod(_N_,&ndate)+1}=point;
if last.id;
total_points=sum(of pts{*});
run;
This program does the following:
mkeintz,
Just a couple of points:
@mkeintz wrote:Assuming the data are grouped by ID (and sorted by date within ID), there is no need for do loops, or extra proc's. Just maintain a temporary array carrying the most recent 3 (or some other number) of POINTS values, as in:
DATA DATES; INPUT ID : $8. VISIT : DATE9. POINT : 8.; FORMAT VISIT DATE9.; DATALINES; 600 20MAY2016 20 600 21JUN2016 30 600 23JUL2016 50 600 30AUG2016 50 600 21SEP2016 30 600 31OCT2016 12 600 21NOV2016 23 600 22DEC2016 22 600 20JAN2017 30 100 20MAY2016 78 100 21JUN2016 54 100 23JUL2016 56 100 30AUG2016 56 100 21SEP2016 89 100 31OCT2016 46 100 21NOV2016 78 100 22DEC2016 78 100 20JAN2017 34 100 22FEB2017 65 100 24MAR2017 45 100 30APR2017 45 200 20MAY2016 90 200 21JUN2016 78 200 23JUL2016 45 200 30AUG2016 45 200 21SEP2016 54 200 31OCT2016 67 200 21NOV2016 78 200 22DEC2016 67 200 20JAN2017 56 200 22FEB2017 43 200 24MAR2017 66 200 30APR2017 65 300 20MAY2016 78 300 21JUN2016 66 300 23JUL2016 55 300 30AUG2016 44 300 21SEP2016 33 300 31OCT2016 11 300 21NOV2016 56 300 22DEC2016 43 300 20JAN2017 78 300 22FEB2017 55 300 24MAR2017 55 300 30APR2017 33 400 20MAY2016 87 400 21JUN2016 67 400 23JUL2016 68 400 30AUG2016 43 400 21SEP2016 46 400 31OCT2016 32 400 21NOV2016 32 400 22DEC2016 44 400 20JAN2017 44 400 22FEB2017 43 400 24MAR2017 54 400 30APR2017 66 500 20MAY2016 69 500 21JUN2016 54 500 23JUL2016 67 500 30AUG2016 67 500 21SEP2016 54 500 31OCT2016 43 500 21NOV2016 43 500 22DEC2016 67 500 20JAN2017 56 500 22FEB2017 56 500 24MAR2017 56 500 30APR2017 43 ; RUN; %let ndate=3; data want (keep=id total_points); set dates; by id notsorted; array pts {&ndate} _temporary_; if first.id then call missing(of pts{*}); pts{mod(_N_,&ndate)+1}=point; if last.id; total_points=sum(of pts{*}); run;
This program does the following:
- Assumes the data are grouped by ID (but the NOTSORTED parameter in the BY statement allows the ID groups to be in any order.
- The TEMPORARY array
- Is the same size as the number of trailing dates wanted. Just change the macrovar NDATES to get different trailing window sizes.
- has its values automatically retained from observation to observation.
- The mod(_N_,3) function is the remainder of the observation number divided by 3 (for NDATES=3), resulting in a 0, 1, or 2. Adding 1 yields a way to put POINT in elements 1, 2, or 3 of the PTS 3-element array.
@s_lassen - Somehow I overlooked your response, which preceded mine substantially. Wouldn't have been so expansive in my explanation if I had seen it.
Yes, by all means let array have an index range of 0 to ndate-1 instead of 1 to ndate. I didn't want to overburden the OP with what I suspected would be more new knowledge than wanted.
However, I'm not sure your program as presented does what is intended. In particular the "sum=sum(of points{*})" follows (rather than precedes) resetting of the array to missing.
A running total can be maintained without an array.
This will work even if there are fewer records than need to be summed.
%let n=3;
data WANT(keep= GRP_ID SUM);
set DATES end=LASTOBS ;
GRP_ID=lag(ID);
if _N_>1 & ID ne GRP_ID then do;
output;
call missing(SUM);
end;
SUM + sum(POINT, - (lag&n(ID)=ID) * lag&n(POINT) );
if LASTOBS then output;
run;
@mkeintz wrote:@s_lassen - Somehow I overlooked your response, which preceded mine substantially. Wouldn't have been so expansive in my explanation if I had seen it.
Yes, by all means let array have an index range of 0 to ndate-1 instead of 1 to ndate. I didn't want to overburden the OP with what I suspected would be more new knowledge than wanted.
However, I'm not sure your program as presented does what is intended. In particular the "sum=sum(of points{*})" follows (rather than precedes) resetting of the array to missing.
Actually, the "setting of the array to missing" only concerns the elements that were not set by this ID's observations, in case there were fewer observations than wanted.
@abasiamaokonna0 I edited your post for some small insignificant grammatical errors, but also to format your code.
Please use the insert code (6/7th) icons in the Rich Text Editor to insert your code, otherwise the forum can garble your code, especially if it has colons. I also changed your subject line to be more descriptive.
Here is another way:
data WANT;
merge DATES
DATES (firstobs=2 rename=( ID=ID2 POINT=POINT2))
DATES (firstobs=3 rename=( ID=ID3 POINT=POINT3))
DATES (firstobs=4 rename=( ID=ID4 POINT=POINT4));
if ID3 ne ID4; %* Check that ID3 is the last ID, i.e. ID4 is a new ID;
SUM=sum(POINT, POINT2, POINT3);
output;
keep ID SUM;
run;
This assumes there are always at least 3 dates per ID, and no check is made to ensure this.
ID | SUM |
---|---|
600 | 75 |
100 | 155 |
200 | 174 |
300 | 143 |
400 | 163 |
500 | 155 |
Another fun way:
/*Assuming the number of records in each group is divisible by 3 as it is in your sample to conveniently make it sets of three*/
data want;
do until(last.id);
do _n_=1 by 1 until(mod(_n_,3)=0);
set dates;
by id notsorted;
array temp{3} _temporary_;
temp(_n_)=point;
if last.id then
do;
sum=sum(of temp(*));
output;
end;
end;
end;
run;
Regards,
Naveen Srinivasan
DATA DATES;
INPUT ID : $8. VISIT : DATE9. POINT : 8.;
FORMAT VISIT DATE9.;
DATALINES;
600 20MAY2016 20
600 21JUN2016 30
600 23JUL2016 50
600 30AUG2016 50
600 21SEP2016 30
600 31OCT2016 12
600 21NOV2016 23
600 22DEC2016 22
600 20JAN2017 30
100 20MAY2016 78
100 21JUN2016 54
100 23JUL2016 56
100 30AUG2016 56
100 21SEP2016 89
100 31OCT2016 46
100 21NOV2016 78
100 22DEC2016 78
100 20JAN2017 34
100 22FEB2017 65
100 24MAR2017 45
100 30APR2017 45
200 20MAY2016 90
200 21JUN2016 78
200 23JUL2016 45
200 30AUG2016 45
200 21SEP2016 54
200 31OCT2016 67
200 21NOV2016 78
200 22DEC2016 67
200 20JAN2017 56
200 22FEB2017 43
200 24MAR2017 66
200 30APR2017 65
300 20MAY2016 78
300 21JUN2016 66
300 23JUL2016 55
300 30AUG2016 44
300 21SEP2016 33
300 31OCT2016 11
300 21NOV2016 56
300 22DEC2016 43
300 20JAN2017 78
300 22FEB2017 55
300 24MAR2017 55
300 30APR2017 33
400 20MAY2016 87
400 21JUN2016 67
400 23JUL2016 68
400 30AUG2016 43
400 21SEP2016 46
400 31OCT2016 32
400 21NOV2016 32
400 22DEC2016 44
400 20JAN2017 44
400 22FEB2017 43
400 24MAR2017 54
400 30APR2017 66
500 20MAY2016 69
500 21JUN2016 54
500 23JUL2016 67
500 30AUG2016 67
500 21SEP2016 54
500 31OCT2016 43
500 21NOV2016 43
500 22DEC2016 67
500 20JAN2017 56
500 22FEB2017 56
500 24MAR2017 56
500 30APR2017 43
;
RUN;
data temp;
merge dates dates(keep=id rename=(id=_id) firstobs=4);
if id ne _id;
run;
proc summary data=temp nway;
class id;
var point;
output out=want sum=;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.