BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAS_New_User1
Obsidian | Level 7

Hi, 

I have the below 3 tables

 

Table 1 (Main Table)

ID    message_1                          message_2

1    catcansiton:50:31123nearby        catcansiton:50:MJKXY32nearby

2    dogcaneat:60:54637fruit              dogcaneat:60:DMF67XXMfruit

3    beesand:30:BOTKXXM                beesand:30:64590

4    colorofsky:40:DMF67XXMblue     colorofsky:40:54637blue

 

Table 2 (Reference table)

Object_1      Object_2

31123             MJKXY32

54637            DMF67XXM

64590            BOTKXXM 

 

Output Table (Output Desired)

ID    message_1                                  message_2

1    catcansiton:50:31123nearby        catcansiton:50:31123nearby

2    dogcaneat:60:54637fruit              dogcaneat:60:54637fruit

3    beesand:30:64590                       beesand:30:64590

4    colorofsky:40:54637blue             colorofsky:40:54637blue

 

I have a main table which has message_1 and message_2 fields which are partially same, except for the codes in the middle which start with either of the four :30:, :40:, :50: and :60: strings before them.

 

 I have a separate reference table which I want to use to look up the the objects and replace them by other object in the main table either to message1 or message_2 and get the desired output. Can you please guide me with how to use a reference table to find the data in the main table ?

 

Thank you 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

1) I suppose that each message has 3 substrings delimited by colon ':', in such case you can 

    subtract it by:   third_str = scan(message,3,':');  

2) Does data have messages that do not contain the ':20:', ':30:' ,':40:', ':60:' codes.

    if positive you need skip those messages.

3) I do not understand the rules for replacing the substrings ,

    Next code joines the two tables up to point for replacing substrings:

data main;  /* MAIN table */
   length message_1 message_2 $40; /* adapt length to data */
   infile cards truncover;
   input ID message_1 $ message_2 $;
cards;
1    catcansiton:50:31123nearby    catcansiton:50:MJKXY32nearby
2    dogcaneat:60:54637fruit       dogcaneat:60:DMF67XXMfruit
3    beesand:30:BOTKXXM            beesand:30:64590
4    colorofsky:40:DMF67XXMblue    colorofsky:40:54637blue
; run;

data refer;  /* Reference table */
   length object_1 object_2 $20; /* adapt length to expected data */
   infile cards truncover;
   input object_1 $ object_2 $;
cards;
31123   MJKXY32
54637   DMF67XXM
64590   BOTKXXM 
; run;

data temp1;  /* Extract substring to check vs refernece */
 set main;
   length submsg1 submsg2 $20;
   submsg1 = scan(message_1,3,':');
   submsg2 = scan(message_2,3,':');
run;

proc sql;
   create table temp2 as
   select a.* , b.*
   from temp1 as a 
   left join refer as b 
   on index(a.message_1, submsg1) > 0 or
      index(a.message_1, submsg2) > 0 or
	  index(a.message_2, submsg1) > 0 or
	  index(a.message_2, submsg2) > 0 ;
quit;

  next step need to complete:

data want;
 set temp2;
     if index(a.submsg1, object_1) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg1, object_2) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg2, object_1) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg2, object_2) > 0 then do;
         /* replace substring and output */
     end;
      
run;

 

 

 

 

View solution in original post

9 REPLIES 9
PeterClemmensen
Tourmaline | Level 20

So you know what preceeds the strings of interest? Do you also know what comes after?

PhilC
Rhodochrosite | Level 12

Good questions.  Also are the object strings always capital and are the "words" after the object strings always lowercase, containing no numbers?

SAS_New_User1
Obsidian | Level 7

After the desired string could be any characters, there are all small or caps and alphanumeric. but the :30:, :40:, :50: and :60: are fixed preset codes to my desired reference codes in the main table. 

PhilC
Rhodochrosite | Level 12

It would have been much nicer to have the delimiter character after the object string.  That's a messy problem.

  • I would refer you to PRXMATCH.  Regular expressions save you the need to write lots of SAS code.  Your regular expression might be "\:/d/d:(object)\", replacing (object) for each know object string.
  • If you don't you can use FIND.  More complicated find ":" then verify the next two characters a numbers and the third is ":" again.  then then string after matches one of your object strings.  Much more code in this case.

 

SAS Help Center: PRXMATCH Function

SAS Help Center: FIND Function

Shmuel
Garnet | Level 18

1) I suppose that each message has 3 substrings delimited by colon ':', in such case you can 

    subtract it by:   third_str = scan(message,3,':');  

2) Does data have messages that do not contain the ':20:', ':30:' ,':40:', ':60:' codes.

    if positive you need skip those messages.

3) I do not understand the rules for replacing the substrings ,

    Next code joines the two tables up to point for replacing substrings:

data main;  /* MAIN table */
   length message_1 message_2 $40; /* adapt length to data */
   infile cards truncover;
   input ID message_1 $ message_2 $;
cards;
1    catcansiton:50:31123nearby    catcansiton:50:MJKXY32nearby
2    dogcaneat:60:54637fruit       dogcaneat:60:DMF67XXMfruit
3    beesand:30:BOTKXXM            beesand:30:64590
4    colorofsky:40:DMF67XXMblue    colorofsky:40:54637blue
; run;

data refer;  /* Reference table */
   length object_1 object_2 $20; /* adapt length to expected data */
   infile cards truncover;
   input object_1 $ object_2 $;
cards;
31123   MJKXY32
54637   DMF67XXM
64590   BOTKXXM 
; run;

data temp1;  /* Extract substring to check vs refernece */
 set main;
   length submsg1 submsg2 $20;
   submsg1 = scan(message_1,3,':');
   submsg2 = scan(message_2,3,':');
run;

proc sql;
   create table temp2 as
   select a.* , b.*
   from temp1 as a 
   left join refer as b 
   on index(a.message_1, submsg1) > 0 or
      index(a.message_1, submsg2) > 0 or
	  index(a.message_2, submsg1) > 0 or
	  index(a.message_2, submsg2) > 0 ;
quit;

  next step need to complete:

data want;
 set temp2;
     if index(a.submsg1, object_1) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg1, object_2) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg2, object_1) > 0 then do;
         /* replace substring and output */
     end;
     if index(a.submsg2, object_2) > 0 then do;
         /* replace substring and output */
     end;
      
run;

 

 

 

 

SAS_New_User1
Obsidian | Level 7

Thank you, the answer to the questions.

1. Each message does not have fixed number of substrings that can be repeated by : , hence the only way to determine the position of the string that needs to be replaced is between those fixed 4 code i.e. :30:, :40:, :50:, :60:

2. All the messages have either of these codes which come before and after the string that needs to be replaced. 

e.g.  message_1                                          message 2                               

        abcs:30:34629:60:hdjckhd                    abcs:30:BTKXX347:60:hdjckhd          

so the string in bold is the reference table string, so I have to convert the string in Message_2 using reference table to make it look like message_1 for comparison.      

 

 

PhilC
Rhodochrosite | Level 12

This problem became much easier.

Shmuel
Garnet | Level 18

@SAS_New_User1 wrote:

Thank you, the answer to the questions.

1. Each message does not have fixed number of substrings that can be repeated by : , hence the only way to determine the position of the string that needs to be replaced is between those fixed 4 code i.e. :30:, :40:, :50:, :60:

2. All the messages have either of these codes which come before and after the string that needs to be replaced. 

e.g.  message_1                                          message 2                               

        abcs:30:34629:60:hdjckhd                    abcs:30:BTKXX347:60:hdjckhd          

so the string in bold is the reference table string, so I have to convert the string in Message_2 using reference table to make it look like message_1 for comparison.      

 

 


1) In your 1st post the message contains only one code not two.

2) You can replace codes to one character to be used as a delimiter - let's say '09'x (a tab)

     by using tranw() function for each of the 4 codes.

     That enables use scan(message,2,'09'x) to extract the mid substring.

PhilC
Rhodochrosite | Level 12

@Shmuel has a good solution.  I was following and I wrote this code.  If it adds to the discussion.

data have;  /* MAIN table */
   length ID 8 message_1 message_2 $40; /* adapt length to data */
   infile cards truncover;
   input ID message_1 $ message_2 $;
cards;
1    catcansiton:50:31123:60:nearby    catcansiton:50:MJKXY32:40:nearby
2    dogcaneat:60:54637:30:fruit       dogcaneat:60:DMF67XXM:40:fruit
3    beesand:30:BOTKXXM:60:            beesand:30:64590:30:
4    colorofsky:40:DMF67XXM:30:blue    colorofsky:40:54637:30:blue
; run;

proc format ;
  value $object_correct
    "MJKXY32"  = "31123"
    "DMF67XXM" = "54637"
    "BOTKXXM"  = "64590"
    other = " "
    ;
run;

data want (Keep= ID message_1	message_2)
;  /* Extract substring to check vs refernece */
/*  if 0 then set refer;*/
  set main;
   array m[2,5] $ m11 m12 object_1 m14 m15 
                  m21 m22 object_2 m24 m25 ;
   
   do i=1 to 5; drop i;
     m[1,i]=scan(message_1,i,':');
     m[2,i]=scan(message_2,i,':');
   end;

   if put(object_1,$object_correct.)= object_2
     then message_1=catx(":",m11, m12, object_2, m14, m15);
   if put(object_2,$object_correct.)= object_1
     then message_2=catx(":",m21, m22, object_1, m24, m25);

run;

If you have a whole dataset of "objects"-"object" pairs then you can follow the below example to create a format from a dataset.

 

SAS Help Center: Creating a Format from a CNTLIN= Data Set

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2787 views
  • 1 like
  • 4 in conversation