DATA Step, Macro, Functions and more

Efficient way to pattern match strings in SAS

Reply
Contributor
Posts: 36

Efficient way to pattern match strings in SAS

I have some problem regarding pattern detecting and matching in SAS.

Imagine I have some clients, at the point of beginning, 2014Q4, all the clients belong to Segment ST0.

Then one year later at 2015Q4, some belong to ST0, some to ST, some empty, some belong to Segment A.

Then in the following four quarters in 2016, the segment further changed.

 

The starting table looks like:

data TEST1;
input ClientID SEG_2014Q4 $ SEG_2015Q4 $ SEG_2016Q1 $ SEG_2016Q2 $ SEG_2016Q3 $ SEG_2016Q4 $;
cards;
1001 ST0 ST0 ST0 ST0 ST0 ST0
1002 ST0 ST0 ST , , ,
1003 ST0 ST0 , , , ,
1004 ST0 ST0 A A , ,
1005 ST0 A A A A A
1006 ST0 , BX BX , ,
1007 ST0 A A A A BD
1008 ST0 , , , , ,
1009 ST0 C C ST0 ST0 ,
;
run;

 

What I want to do is to find out the End Segment at 2015Q4 (Seg_2015q4). But I have reason to believe that the values under SEG_2015Q4 are not completely updated. Therefore I will also look into the following 4 quarters in 2016 to see if the Segnent has changed in 2016, and use that information to update Seg_2015Q4 retroactively.

 

A client always start from Seg_2014Q4 with a initial Segment=ST0. At 2015Q4 and afterwards, that client can either retain ST0, or acquired by mistake something like ST (where 0 was dropped in the delivery, but they are the same). A client can also temporarily or permanently disappeared from the list, so the Segment can be empty, missing.

 

If a client has acquired a new segment at 2015Q4, such as A, B or whatever, this segment information will stick, no matter what it may change into in the following quarters.

 

The problem lies at, for those clients having same ST0, or ST, or missing at 2015Q4, this information can not be trusted. One must look into the future to see if this client has acquired any segment other than ST0, ST or missing. If it does, the first segment information will be used to replace the old ones for 2015Q4. 

 

All the newly mapped information will be stored in SEG_2015Q4_NEW.

 

 

 

The result should look like this:

 

data TEST2;
input ClientID SEG_2014Q4 $ SEG_2015Q4_NEW $;
cards;
1001 ST0 ST0
1002 ST0 ST
1003 ST0 ST0
1004 ST0 A
1005 ST0 A
1006 ST0 BX
1007 ST0 A
1008 ST0 ,
1009 ST0 C
;
run;

 

Results:

Client SEG_2014Q4  SEG_2015Q4_NEW

1001 ST0 ST0
1002 ST0 ST
1003 ST0 ST0
1004 ST0 A
1005 ST0 A
1006 ST0 BX
1007 ST0 A
1008 ST0 ,
1009 ST0 C

 

I can use a lot of if-else logic to program this. But I am more willing to use pattern match functions in SAS, to detect those patterns not in (ST0, ST, missing) and take their value instead.

Thanks for any help.

 

Ken

Trusted Advisor
Posts: 1,630

Re: Efficient way to pattern match strings in SAS

While I understand why you have letter codes for each quarter ... would it be easier if you kept the entire sequence of letter codes as a single string for the purposes of this pattern matching? Then, the FIND or FINDW command can determine if there is the letter/word A somewhere in the string; etc.

Super User
Posts: 5,092

Re: Efficient way to pattern match strings in SAS

This ought to work:

 

data want;

set have;

length seg_2015Q4_New $ 8;

do seg_2015Q4_new = SEG_2015Q4, SEG_2016Q1, SEG_2016Q2, SEG_2016Q3, SEG_2016Q4 

until (seg_2015Q4_new in : ('A', 'B', 'C'));

end;

run;

 

If SAS complains about the syntax, we might have to put the variables into an array instead of naming them individually here.  But I'm expecting the combination of a list with DO UNTIL will work.

Ask a Question
Discussion stats
  • 2 replies
  • 103 views
  • 0 likes
  • 3 in conversation