BookmarkSubscribeRSS Feed
Sk1_SAS
Obsidian | Level 7

Hi Team,

 

I need help, i'll try to explain, this is my table:

 

DATA TEST;

INPUT ID 1-2 MMYY $3-7 VIOLATION $9;

cards;

 

2 06/17 A

2 05/17 A

2 04/17 A

2 03/17 A

2 02/17 B

1 11/18 A

1 10/18 A

1 09/18 B

1 08/18 A

1 07/18 A

1 06/18 A

1 05/18 A

1 03/18 A

;

RUN;

 

What i need to do is identify the id with the most recent date, and verify if the ID violated the same Violation type more than three times  consecutive before the recent date.
Note that ID 2 is OK, because the most recent date is 06/17 and he got 3 consecutives date with the same violation before the recent date. (03/17 to 05/17).

 

The ID 1 the most recent date is 11/18 but he dont have the three consecutive violations before the most recent date.

How can i identify this cases? with three consecutives violations before the most recent date?

Thanks in advance!!!

 

 

 

 

9 REPLIES 9
novinosrin
Tourmaline | Level 20

 

What is your expected output  I am unable to vision that from what you wrote. For example, 

" ID 2 is OK," - Does this mean count of violation A being the type on most recent date is 3 but not more than 3 as you have stated more than 3? If yes, what happens to ID 2?

 

Ideally, if you could provide an output sample explaining the "logic/why" would help

 

 


@Sk1_SAS wrote:

Hi Team,

 

I need help, i'll try to explain, this is my table:

 

DATA TEST;

INPUT ID 1-2 MMYY $3-7 VIOLATION $9;

cards;

 

2 06/17 A

2 05/17 A

2 04/17 A

2 03/17 A

2 02/17 B

1 11/18 A

1 10/18 A

1 09/18 B

1 08/18 A

1 07/18 A

1 06/18 A

1 05/18 A

1 03/18 A

;

RUN;

 

What i need to do is identify the id with the most recent date, and verify if the ID violated the same Violation type more than three times  consecutive before the recent date.
Note that ID 2 is OK, because the most recent date is 06/17 and he got 3 consecutives date with the same violation before the recent date. (03/17 to 05/17).

 

The ID 1 the most recent date is 11/18 but he dont have the three consecutive violations before the most recent date.

How can i identify this cases? with three consecutives violations before the most recent date?

Thanks in advance!!!

 

 

 

 


 

Sk1_SAS
Obsidian | Level 7

Hi novinosrin, i'll try to clarify, this base that i created using CARDS is an output sampling of a Big Base, the ID 2 may have three or more consecutive violations.

Astounding
PROC Star

First, a suggestion.  Instead of reading in MMYY as one variable, break it out into two variables.  Having a separate month and year will let you sort your data, and guarantee that it is in the proper order.

 

Next, the questions.

 

For ID 2, you only have 3 consecutive violations, prior to the latest one.  Do you only want 3, not more than 3?  Do you count the most recent date when counting consecutive violations?

 

For ID 1, you have 5 consecutive violations prior to the most recent.  Are you not counting those because there's a "B" in between all the "A" values?  There must be a rule or two missing from the description of the problem.  If you had a "B" (most recent) followed by 5 "A" would that count?

Sk1_SAS
Obsidian | Level 7

Thank you for the suggestion, in my original base i did that to sort the date.

Yes, can be three or more. No i do not count the most recent date, i start counting from the most recent date forward if the date is consecutive and the violation is the same.

For the ID1: Yes, i stop count because there are a B between the A violations.

What i need to know is if the ID will have the same violation in three consecutive dates before the most recent date. 

Astounding
PROC Star

OK, here's what I think you are after.  It assumes your data is already in the proper order.

 

data want;

set have;

by id violation notsorted;

if first.id then do;

   consecutive=-1;

   latest_date = mmyy;

   retain latest_date;

run;

else if first.violation then consecutive=5;

consecutive + 1;

if consecutive = 3;

drop consecutive;

run;

 

This gives you one observation per ID, when the ID had 3+ consecutive earlier violations.  It actually outputs the observation that is the third of the 3+, so there might be more MMYY values that belong in the same group.

ballardw
Super User

First step is to get actual DATE values so that sorting and such works correctly. Example:

DATA TEST;
INPUT ID 1-2 MMYY $3-7 VIOLATION $9;
date = input(catt(scan(mmyy,2,'/'),scan(mmyy,1,'/')),yymmn4.);
format date date9.;
cards;
2 06/17 A
2 05/17 A
2 04/17 A
2 03/17 A
2 02/17 B
1 11/18 A
1 10/18 A
1 09/18 B
1 08/18 A
1 07/18 A
1 06/18 A
1 05/18 A
1 03/18 A
;
RUN;

The SAS YYMMN4 informat reads values such as 1706 into the first day of the month for June 2017 (depending you your YEARCUTOFF system option). So the above code rearranges your MM/YY to a YYMM value that YYMMN4 will read.

 

I assigned a date9 format just to show the result. You could use any date format you prefer.

 

Then one way to identify "most recent" is largest value for the date so sorting by ID and Date would have the last ID with the "most recent date".

 

You really need to show what your final output should be. You say

The ID 1 the most recent date is 11/18 but he dont have the three consecutive violations before the most recent date.

How can i identify this cases? with three consecutives violations before the most recent date?

But I see that ID=1 has violations on 05/18, 06/18, 07/18 and  08/18 of A which look consecutive to me. So you may need to explain your rule a bit more clearly.

 

You likely should include date with date gaps in your example and the worked result to demonstrate what should be done with them.

 

This may help get started as it marks each record for whether consecutive or not (when 3 records are available for the same ID);

But with out knowing the actual desired result can't go much further. Consecutive=1 when there are 3 consecutive matching violations and 0 when not matching but there are 3 available records, missing when there are not 3 available records.

 

Proc sort data=test;
   by id date;
run;

data want;
   set test;
   by id;
   lv1=lag1(violation);
   lv2=lag2(violation);
   ld1=lag1(date);
   ld2=lag2(date);
   if first.id then rec=0;
   else rec+1;
   if rec ge 3 then consecutive = (violation= lv1=lv2) AND (date=intnx('month',ld1,1,'B')=intnx('month',ld2,2,'B'));
   
   drop lv1 lv2 ld1 ld2;

run;
Sk1_SAS
Obsidian | Level 7

Hi, yes! this is the problem that i got, the ID 1 has consecutives dates(18/05, 18/06, 18/07 e 18/08) but my most recent date in this case is 11/18 and in 09/18 the violation is B, so this case have three consecutive violations but not three before the most recent date.

I'll try your suggestion.

Thanks.

ballardw
Super User

@Sk1_SAS wrote:

Hi, yes! this is the problem that i got, the ID 1 has consecutives dates(18/05, 18/06, 18/07 e 18/08) but my most recent date in this case is 11/18 and in 09/18 the violation is B, so this case have three consecutive violations but not three before the most recent date.

I'll try your suggestion.

Thanks.


So is the rule "3 consecutive identical violations in the immediately preceding 3 months"? You did not specify previously that the 3 violations have to be in a specific time frame in relation to the latest date (or at least not very clearly).

Sk1_SAS
Obsidian | Level 7

Hi Ballard, the rule is: 3 consecutive identical violations in the preceding 3 months. I Tried your suggestion, look the result: I just modified the sort date, to descending the date:

 

Proc sort data=test;

by id DESCENDING date;

run;

 

the result:

 

IDMMYYVIOLATIONdaterecconsecutive
111/18A01Nov20180 
110/18A01Oct20181 
109/18B01Sep20182 
108/18A01Aug201830
107/18A01Jul201840
106/18A01Jun201850
105/18A01May201860
103/18A01Mar201870
206/17A01Jun20170 
205/17A01May20171 
204/17A01Apr20172 
203/17A01Mar201730
202/17B01Feb201740

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1339 views
  • 0 likes
  • 4 in conversation