Hi Guys,
I have some sample data below and wanted to calculate the SUM_TAG column.
ID | KEY | DATE | VALUE | SUM_TAG |
1 | A | 01/01/20 | 10 | 1 |
2 | A | 01/02/20 | 20 | 1 |
3 | A | 01/04/20 | 30 | 1 |
4 | A | 01/07/20 | 40 | 1 |
5 | A | 01/11/20 | 50 | 2 |
6 | A | 01/01/21 | 60 | 2 |
7 | A | 01/07/21 | 70 | 3 |
8 | B | 01/08/20 | 80 | 1 |
9 | B | 01/10/20 | 90 | 1 |
10 | B | 01/02/21 | 100 | 2 |
The logic on how I populate the column SUM_TAG is first is to group it by column KEY. Once grouped, the DATE value in each row should be within 3 months of the above row.
For example, the SUM_TAG of ID 1 and 2 is 1 because the DATE of ID 1 and 2 is within 3 months which is 01/01/20 and 01/02/20. Also ID 3 is has a SUM_TAG of 1 because the DATE value is 01/04/20 which is within 3 months of ID 2 which has the DATE 01/02/20.
Starting from ID 5 the SUM_TAG is 2 because the DATE value of the record is more than 3 months to the DATE value of ID 4.
I tried using first., last. and lag function to compare the rows but the result is as desired.
Can anyone help me?
Thanks.
Albert0
@Albert0 This code produces the sum_tag values you had in your table
data have;
length id 4
key $1
date 8
value 4
;
input ID KEY $ DATE :ddmmyy. VALUE;
format date ddmmyy.;
datalines;
1 A 01/01/20 10
2 A 01/02/20 20
3 A 01/04/20 30
4 A 01/07/20 40
5 A 01/11/20 50
6 A 01/01/21 60
7 A 01/07/21 70
8 B 01/08/20 80
9 B 01/10/20 90
10 B 01/02/21 100
;
run;
proc sort data=have;
by key id;
run;
data want(drop= prev_dt diff);
length prev_dt 8;
set have;
by key id;
/* Reset sum_tag to 1 */
if ((_n_=1) OR (lag(key) NE key)) then sum_tag=1;
prev_dt = lag(date); /* Find Previous date */
diff= intck('month',prev_dt,date);
/* Increase sum_tag whenever diff exceeds 3 */
if (diff > 3) then
sum_tag + 1;
run;
Hope this helps,
Ahmed
I tried using first., last. and lag function to compare the rows but the result is as desired.
Show us what you tried. Explain why the result was not acceptable.
For future reference, do not provide data as screen captures. It is much better to provide data in a form we can work with, as described here:
How to create a data step version of your data AKA generate sample data for forums
If you are going to use dates without 4 digits you really need to tell us what format you are using if you are not providing actual SAS data step code to describe the data.
For example 01/02/20 could be: 02 Jan 2020, 01 Feb 2020 or 20 Feb 2001. Depending on which of these the values are actually using telling "within 3 months" is pretty hard.
AND depending on the system YEARCUTOFF value in effect might be entirely different years.
Moral of the story: It is a bad idea (unless your boss insists) to ever use dates without 4 digits for display.
Additionally from a picture we cannot tell if you actually have SAS date values, which will be needed to do comparisons like "within 3 months" or a character value. If your date is a character variable then your solution will involve creating a SAS date value.
Hi @Albert0
Are you sure whether
5 | A | 01/11/20 | 50 | 2 |
6 | A | 01/01/21 | 60 | 2 |
the count of monthly intervals between 01/11/20 and 01/01/21 is within 3 months? i.e. considering it is formatted as MMDDYY
@novinosrin wrote:
Hi @Albert0
Are you sure whether
5 A 01/11/20 50 2 6 A 01/01/21 60 2
the count of monthly intervals between 01/11/20 and 01/01/21 is within 3 months? i.e. considering it is formatted as MMDDYY
I suspect the data is actually YYMMDD. But that is why I brought to point up about 4 digit years earlier.
Sir @ballardw my apologies, I overlooked your previous post. Yes, that's my concern too.
@Albert0 This code produces the sum_tag values you had in your table
data have;
length id 4
key $1
date 8
value 4
;
input ID KEY $ DATE :ddmmyy. VALUE;
format date ddmmyy.;
datalines;
1 A 01/01/20 10
2 A 01/02/20 20
3 A 01/04/20 30
4 A 01/07/20 40
5 A 01/11/20 50
6 A 01/01/21 60
7 A 01/07/21 70
8 B 01/08/20 80
9 B 01/10/20 90
10 B 01/02/21 100
;
run;
proc sort data=have;
by key id;
run;
data want(drop= prev_dt diff);
length prev_dt 8;
set have;
by key id;
/* Reset sum_tag to 1 */
if ((_n_=1) OR (lag(key) NE key)) then sum_tag=1;
prev_dt = lag(date); /* Find Previous date */
diff= intck('month',prev_dt,date);
/* Increase sum_tag whenever diff exceeds 3 */
if (diff > 3) then
sum_tag + 1;
run;
Hope this helps,
Ahmed
Something like this:
data want;
set have;
by Key ID;
if intck('month',lag(date),date)>3 then
sum_tag+1;
if first.Key then
sum_tag=1;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.