BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAS_New_User1
Obsidian | Level 7

Hi, 

I have the below 3 tables

 

Table 1 (Main Table)

ID    message_1                          message_2

1    catcansiton:50:31123nearby        catcansiton:50:MJKXY32nearby

2    dogcaneat:60:54637fruit              dogcaneat:60:DMF67XXMfruit

3    beesand:30:BOTKXXM                beesand:30:64590

4    colorofsky:40:DMF67XXMblue     colorofsky:40:54637blue

 

Table 2 (Reference table)

Object_1      Object_2

31123             MJKXY32

54637            DMF67XXM

64590            BOTKXXM 

 

Output Table (Output Desired)

ID    message_1                                  message_2

1    catcansiton:50:31123nearby        catcansiton:50:31123nearby

2    dogcaneat:60:54637fruit              dogcaneat:60:54637fruit

3    beesand:30:64590                       beesand:30:64590

4    colorofsky:40:54637blue             colorofsky:40:54637blue

 

I have a main table which has message_1 and message_2 fields which are partially same, except for the codes in the middle which start with either of the four :30:, :40:, :50: and :60: strings before them.

 

 I have a separate reference table which I want to use to look up the the objects and replace them by other object in the main table either to message1 or message_2 and get the desired output. Can you please guide me with how to use a reference table to find the data in the main table ?

 

Thank you 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

1) I suppose that each message has 3 substrings delimited by colon ':', in such case you can 

    subtract it by:   third_str = scan(message,3,':');  

2) Does data have messages that do not contain the ':20:', ':30:' ,':40:', ':60:' codes.

    if positive you need skip those messages.

3) I do not understand the rules for replacing the substrings ,

    Next code joines the two tables up to point for replacing substrings:

data main;  /* MAIN table */
   length message_1 message_2 $40; /* adapt length to data */
   infile cards truncover;
   input ID message_1 $ message_2 $;
cards;
1    catcansiton:50:31123nearby    catcansiton:50:MJKXY32nearby
2    dogcaneat:60:54637fruit       dogcaneat:60:DMF67XXMfruit
3    beesand:30:BOTKXXM            beesand:30:64590
4    colorofsky:40:DMF67XXMblue    colorofsky:40:54637blue
; run;

data refer;  /* Reference table */
   length object_1 object_2 $20; /* adapt length to expected data */
   infile cards truncover;
   input object_1 $ object_2 $;
cards;
31123   MJKXY32
54637   DMF67XXM
64590   BOTKXXM 
; run;

data temp1;  /* Extract substring to check vs refernece */
 set main;
   length submsg1 submsg2 $20;
   submsg1 = scan(message_1,3,':');
   submsg2 = scan(message_2,3,':');
run;

proc sql;
   create table temp2 as
   select a.* , b.*
   from temp1 as a 
   left join refer as b 
   on index(a.message_1, submsg1) > 0 or
      index(a.message_1, submsg2) > 0 or
	  index(a.message_2, submsg1) > 0 or
	  index(a.message_2, submsg2) > 0 ;
quit;

  next step need to complete:

data want;
 set temp2;
     if index(a.submsg1, object_1) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg1, object_2) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg2, object_1) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg2, object_2) > 0 then do;
         /* replace substring and output */
     end;
      
run;

 

 

 

 

View solution in original post

9 REPLIES 9
PeterClemmensen
Tourmaline | Level 20

So you know what preceeds the strings of interest? Do you also know what comes after?

PhilC
Rhodochrosite | Level 12

Good questions.  Also are the object strings always capital and are the "words" after the object strings always lowercase, containing no numbers?

SAS_New_User1
Obsidian | Level 7

After the desired string could be any characters, there are all small or caps and alphanumeric. but the :30:, :40:, :50: and :60: are fixed preset codes to my desired reference codes in the main table. 

PhilC
Rhodochrosite | Level 12

It would have been much nicer to have the delimiter character after the object string.  That's a messy problem.

  • I would refer you to PRXMATCH.  Regular expressions save you the need to write lots of SAS code.  Your regular expression might be "\:/d/d:(object)\", replacing (object) for each know object string.
  • If you don't you can use FIND.  More complicated find ":" then verify the next two characters a numbers and the third is ":" again.  then then string after matches one of your object strings.  Much more code in this case.

 

SAS Help Center: PRXMATCH Function

SAS Help Center: FIND Function

Shmuel
Garnet | Level 18

1) I suppose that each message has 3 substrings delimited by colon ':', in such case you can 

    subtract it by:   third_str = scan(message,3,':');  

2) Does data have messages that do not contain the ':20:', ':30:' ,':40:', ':60:' codes.

    if positive you need skip those messages.

3) I do not understand the rules for replacing the substrings ,

    Next code joines the two tables up to point for replacing substrings:

data main;  /* MAIN table */
   length message_1 message_2 $40; /* adapt length to data */
   infile cards truncover;
   input ID message_1 $ message_2 $;
cards;
1    catcansiton:50:31123nearby    catcansiton:50:MJKXY32nearby
2    dogcaneat:60:54637fruit       dogcaneat:60:DMF67XXMfruit
3    beesand:30:BOTKXXM            beesand:30:64590
4    colorofsky:40:DMF67XXMblue    colorofsky:40:54637blue
; run;

data refer;  /* Reference table */
   length object_1 object_2 $20; /* adapt length to expected data */
   infile cards truncover;
   input object_1 $ object_2 $;
cards;
31123   MJKXY32
54637   DMF67XXM
64590   BOTKXXM 
; run;

data temp1;  /* Extract substring to check vs refernece */
 set main;
   length submsg1 submsg2 $20;
   submsg1 = scan(message_1,3,':');
   submsg2 = scan(message_2,3,':');
run;

proc sql;
   create table temp2 as
   select a.* , b.*
   from temp1 as a 
   left join refer as b 
   on index(a.message_1, submsg1) > 0 or
      index(a.message_1, submsg2) > 0 or
	  index(a.message_2, submsg1) > 0 or
	  index(a.message_2, submsg2) > 0 ;
quit;

  next step need to complete:

data want;
 set temp2;
     if index(a.submsg1, object_1) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg1, object_2) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg2, object_1) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg2, object_2) > 0 then do;
         /* replace substring and output */
     end;
      
run;

 

 

 

 

SAS_New_User1
Obsidian | Level 7

Thank you, the answer to the questions.

1. Each message does not have fixed number of substrings that can be repeated by : , hence the only way to determine the position of the string that needs to be replaced is between those fixed 4 code i.e. :30:, :40:, :50:, :60:

2. All the messages have either of these codes which come before and after the string that needs to be replaced. 

e.g.  message_1                                          message 2                               

        abcs:30:34629:60:hdjckhd                    abcs:30:BTKXX347:60:hdjckhd          

so the string in bold is the reference table string, so I have to convert the string in Message_2 using reference table to make it look like message_1 for comparison.      

 

 

PhilC
Rhodochrosite | Level 12

This problem became much easier.

Shmuel
Garnet | Level 18

@SAS_New_User1 wrote:

Thank you, the answer to the questions.

1. Each message does not have fixed number of substrings that can be repeated by : , hence the only way to determine the position of the string that needs to be replaced is between those fixed 4 code i.e. :30:, :40:, :50:, :60:

2. All the messages have either of these codes which come before and after the string that needs to be replaced. 

e.g.  message_1                                          message 2                               

        abcs:30:34629:60:hdjckhd                    abcs:30:BTKXX347:60:hdjckhd          

so the string in bold is the reference table string, so I have to convert the string in Message_2 using reference table to make it look like message_1 for comparison.      

 

 


1) In your 1st post the message contains only one code not two.

2) You can replace codes to one character to be used as a delimiter - let's say '09'x (a tab)

     by using tranw() function for each of the 4 codes.

     That enables use scan(message,2,'09'x) to extract the mid substring.

PhilC
Rhodochrosite | Level 12

@Shmuel has a good solution.  I was following and I wrote this code.  If it adds to the discussion.

data have;  /* MAIN table */
   length ID 8 message_1 message_2 $40; /* adapt length to data */
   infile cards truncover;
   input ID message_1 $ message_2 $;
cards;
1    catcansiton:50:31123:60:nearby    catcansiton:50:MJKXY32:40:nearby
2    dogcaneat:60:54637:30:fruit       dogcaneat:60:DMF67XXM:40:fruit
3    beesand:30:BOTKXXM:60:            beesand:30:64590:30:
4    colorofsky:40:DMF67XXM:30:blue    colorofsky:40:54637:30:blue
; run;

proc format ;
  value $object_correct
    "MJKXY32"  = "31123"
    "DMF67XXM" = "54637"
    "BOTKXXM"  = "64590"
    other = " "
    ;
run;

data want (Keep= ID message_1	message_2)
;  /* Extract substring to check vs refernece */
/*  if 0 then set refer;*/
  set main;
   array m[2,5] $ m11 m12 object_1 m14 m15 
                  m21 m22 object_2 m24 m25 ;
   
   do i=1 to 5; drop i;
     m[1,i]=scan(message_1,i,':');
     m[2,i]=scan(message_2,i,':');
   end;

   if put(object_1,$object_correct.)= object_2
     then message_1=catx(":",m11, m12, object_2, m14, m15);
   if put(object_2,$object_correct.)= object_1
     then message_2=catx(":",m21, m22, object_1, m24, m25);

run;

If you have a whole dataset of "objects"-"object" pairs then you can follow the below example to create a format from a dataset.

 

SAS Help Center: Creating a Format from a CNTLIN= Data Set

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2666 views
  • 1 like
  • 4 in conversation