BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Albert0
Quartz | Level 8

Hi Guys,

 

I have some sample data below and wanted to calculate the SUM_TAG column.

 

IDKEYDATEVALUESUM_TAG
1A01/01/20101
2A01/02/20201
3A01/04/20301
4A01/07/20401
5A01/11/20502
6A01/01/21602
7A01/07/21703
8B01/08/20801
9B01/10/20901
10B01/02/211002

 

The logic on how I populate the column SUM_TAG is first is to group it by column KEY. Once grouped, the DATE value in each row should be within 3 months of the above row.

 

For example, the SUM_TAG of ID 1 and 2 is 1 because the DATE of ID 1 and 2 is within 3 months which is 01/01/20 and 01/02/20. Also ID 3 is has a SUM_TAG of 1 because the DATE value is 01/04/20  which is within 3 months of ID 2 which has the DATE 01/02/20.

 

Starting from ID 5 the SUM_TAG is 2 because the DATE value of the record is more than 3 months to the DATE value of ID 4.

 

I tried using first., last. and lag function to compare the rows but the result is as desired.

 

Can anyone help me?

 

Thanks.

Albert0

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
AhmedAl_Attar
Rhodochrosite | Level 12

@Albert0 This code produces the sum_tag values you had in your table

data have;
	length id 4
		key $1
		date 8
		value 4
		;
	input ID KEY $ DATE :ddmmyy. VALUE;
	format date ddmmyy.;
datalines; 
1 A 01/01/20 10
2 A 01/02/20 20
3 A 01/04/20 30
4 A 01/07/20 40
5 A 01/11/20 50
6 A 01/01/21 60
7 A 01/07/21 70
8 B 01/08/20 80
9 B 01/10/20 90
10 B 01/02/21 100
;
run;

proc sort data=have;
	by key id;
run;

data want(drop= prev_dt diff);
	length prev_dt 8;

	set have;
	by key id;

	/* Reset sum_tag to 1 */
	if ((_n_=1) OR (lag(key) NE key)) then sum_tag=1;

	prev_dt = lag(date); /* Find Previous date */
diff= intck('month',prev_dt,date); /* Increase sum_tag whenever diff exceeds 3 */ if (diff > 3) then sum_tag + 1; run;

Hope this helps,

Ahmed

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

I tried using first., last. and lag function to compare the rows but the result is as desired.


Show us what you tried. Explain why the result was not acceptable.

 

For future reference, do not provide data as screen captures. It is much better to provide data in a form we can work with, as described here: 

How to create a data step version of your data AKA generate sample data for forums

--
Paige Miller
ballardw
Super User

If you are going to use dates without 4 digits you really need to tell us what format you are using if you are not providing actual SAS data step code to describe the data.

For example 01/02/20 could be:   02 Jan 2020, 01 Feb 2020 or 20 Feb 2001. Depending on which of these the values are actually using telling "within 3 months" is pretty hard.

AND depending on the system YEARCUTOFF value in effect might be entirely different years.

 

Moral of the story: It is a bad idea (unless your boss insists) to ever use dates without 4 digits for display.

 

Additionally from a picture we cannot tell if you actually have SAS date values, which will be needed to do comparisons like "within 3 months" or a character value. If your date is a character variable then your solution will involve creating a SAS date value.

Albert0
Quartz | Level 8
Sorry to trouble you. The format I have in my example is in DD/MM/YY.
novinosrin
Tourmaline | Level 20

Hi @Albert0 

Are you sure whether

5 A 01/11/20 50 2
6 A 01/01/21 60 2

 

the count  of monthly intervals between 01/11/20 and 01/01/21 is within 3 months? i.e. considering it is formatted as MMDDYY 

ballardw
Super User

@novinosrin wrote:

Hi @Albert0 

Are you sure whether

5 A 01/11/20 50 2
6 A 01/01/21 60 2

 

the count  of monthly intervals between 01/11/20 and 01/01/21 is within 3 months? i.e. considering it is formatted as MMDDYY 


I suspect the data is actually YYMMDD. But that is why I brought to point up about 4 digit years earlier.

novinosrin
Tourmaline | Level 20

Sir @ballardw  my apologies, I overlooked your previous post. Yes, that's my concern too. 

Albert0
Quartz | Level 8
Sorry the format is DD/MM/YY
AhmedAl_Attar
Rhodochrosite | Level 12

@Albert0 This code produces the sum_tag values you had in your table

data have;
	length id 4
		key $1
		date 8
		value 4
		;
	input ID KEY $ DATE :ddmmyy. VALUE;
	format date ddmmyy.;
datalines; 
1 A 01/01/20 10
2 A 01/02/20 20
3 A 01/04/20 30
4 A 01/07/20 40
5 A 01/11/20 50
6 A 01/01/21 60
7 A 01/07/21 70
8 B 01/08/20 80
9 B 01/10/20 90
10 B 01/02/21 100
;
run;

proc sort data=have;
	by key id;
run;

data want(drop= prev_dt diff);
	length prev_dt 8;

	set have;
	by key id;

	/* Reset sum_tag to 1 */
	if ((_n_=1) OR (lag(key) NE key)) then sum_tag=1;

	prev_dt = lag(date); /* Find Previous date */
diff= intck('month',prev_dt,date); /* Increase sum_tag whenever diff exceeds 3 */ if (diff > 3) then sum_tag + 1; run;

Hope this helps,

Ahmed

Albert0
Quartz | Level 8
Thank you. Your suggestion works as I wanted..
s_lassen
Meteorite | Level 14

Something like this:

data want;
  set have;
  by Key ID;
  if intck('month',lag(date),date)>3 then
    sum_tag+1;
  if first.Key then
    sum_tag=1;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 810 views
  • 0 likes
  • 6 in conversation