Hi,
I have the below 3 tables
Table 1 (Main Table)
ID message_1 message_2
1 catcansiton:50:31123nearby catcansiton:50:MJKXY32nearby
2 dogcaneat:60:54637fruit dogcaneat:60:DMF67XXMfruit
3 beesand:30:BOTKXXM beesand:30:64590
4 colorofsky:40:DMF67XXMblue colorofsky:40:54637blue
Table 2 (Reference table)
Object_1 Object_2
31123 MJKXY32
54637 DMF67XXM
64590 BOTKXXM
Output Table (Output Desired)
ID message_1 message_2
1 catcansiton:50:31123nearby catcansiton:50:31123nearby
2 dogcaneat:60:54637fruit dogcaneat:60:54637fruit
3 beesand:30:64590 beesand:30:64590
4 colorofsky:40:54637blue colorofsky:40:54637blue
I have a main table which has message_1 and message_2 fields which are partially same, except for the codes in the middle which start with either of the four :30:, :40:, :50: and :60: strings before them.
I have a separate reference table which I want to use to look up the the objects and replace them by other object in the main table either to message1 or message_2 and get the desired output. Can you please guide me with how to use a reference table to find the data in the main table ?
Thank you
1) I suppose that each message has 3 substrings delimited by colon ':', in such case you can
subtract it by: third_str = scan(message,3,':');
2) Does data have messages that do not contain the ':20:', ':30:' ,':40:', ':60:' codes.
if positive you need skip those messages.
3) I do not understand the rules for replacing the substrings ,
Next code joines the two tables up to point for replacing substrings:
data main; /* MAIN table */
length message_1 message_2 $40; /* adapt length to data */
infile cards truncover;
input ID message_1 $ message_2 $;
cards;
1 catcansiton:50:31123nearby catcansiton:50:MJKXY32nearby
2 dogcaneat:60:54637fruit dogcaneat:60:DMF67XXMfruit
3 beesand:30:BOTKXXM beesand:30:64590
4 colorofsky:40:DMF67XXMblue colorofsky:40:54637blue
; run;
data refer; /* Reference table */
length object_1 object_2 $20; /* adapt length to expected data */
infile cards truncover;
input object_1 $ object_2 $;
cards;
31123 MJKXY32
54637 DMF67XXM
64590 BOTKXXM
; run;
data temp1; /* Extract substring to check vs refernece */
set main;
length submsg1 submsg2 $20;
submsg1 = scan(message_1,3,':');
submsg2 = scan(message_2,3,':');
run;
proc sql;
create table temp2 as
select a.* , b.*
from temp1 as a
left join refer as b
on index(a.message_1, submsg1) > 0 or
index(a.message_1, submsg2) > 0 or
index(a.message_2, submsg1) > 0 or
index(a.message_2, submsg2) > 0 ;
quit;
next step need to complete:
data want;
set temp2;
if index(a.submsg1, object_1) > 0 then do;
/* replace substring and output */
end;
if index(a.submsg1, object_2) > 0 then do;
/* replace substring and output */
end;
if index(a.submsg2, object_1) > 0 then do;
/* replace substring and output */
end;
if index(a.submsg2, object_2) > 0 then do;
/* replace substring and output */
end;
run;
So you know what preceeds the strings of interest? Do you also know what comes after?
Good questions. Also are the object strings always capital and are the "words" after the object strings always lowercase, containing no numbers?
After the desired string could be any characters, there are all small or caps and alphanumeric. but the :30:, :40:, :50: and :60: are fixed preset codes to my desired reference codes in the main table.
It would have been much nicer to have the delimiter character after the object string. That's a messy problem.
1) I suppose that each message has 3 substrings delimited by colon ':', in such case you can
subtract it by: third_str = scan(message,3,':');
2) Does data have messages that do not contain the ':20:', ':30:' ,':40:', ':60:' codes.
if positive you need skip those messages.
3) I do not understand the rules for replacing the substrings ,
Next code joines the two tables up to point for replacing substrings:
data main; /* MAIN table */
length message_1 message_2 $40; /* adapt length to data */
infile cards truncover;
input ID message_1 $ message_2 $;
cards;
1 catcansiton:50:31123nearby catcansiton:50:MJKXY32nearby
2 dogcaneat:60:54637fruit dogcaneat:60:DMF67XXMfruit
3 beesand:30:BOTKXXM beesand:30:64590
4 colorofsky:40:DMF67XXMblue colorofsky:40:54637blue
; run;
data refer; /* Reference table */
length object_1 object_2 $20; /* adapt length to expected data */
infile cards truncover;
input object_1 $ object_2 $;
cards;
31123 MJKXY32
54637 DMF67XXM
64590 BOTKXXM
; run;
data temp1; /* Extract substring to check vs refernece */
set main;
length submsg1 submsg2 $20;
submsg1 = scan(message_1,3,':');
submsg2 = scan(message_2,3,':');
run;
proc sql;
create table temp2 as
select a.* , b.*
from temp1 as a
left join refer as b
on index(a.message_1, submsg1) > 0 or
index(a.message_1, submsg2) > 0 or
index(a.message_2, submsg1) > 0 or
index(a.message_2, submsg2) > 0 ;
quit;
next step need to complete:
data want;
set temp2;
if index(a.submsg1, object_1) > 0 then do;
/* replace substring and output */
end;
if index(a.submsg1, object_2) > 0 then do;
/* replace substring and output */
end;
if index(a.submsg2, object_1) > 0 then do;
/* replace substring and output */
end;
if index(a.submsg2, object_2) > 0 then do;
/* replace substring and output */
end;
run;
Thank you, the answer to the questions.
1. Each message does not have fixed number of substrings that can be repeated by : , hence the only way to determine the position of the string that needs to be replaced is between those fixed 4 code i.e. :30:, :40:, :50:, :60:
2. All the messages have either of these codes which come before and after the string that needs to be replaced.
e.g. message_1 message 2
abcs:30:34629:60:hdjckhd abcs:30:BTKXX347:60:hdjckhd
so the string in bold is the reference table string, so I have to convert the string in Message_2 using reference table to make it look like message_1 for comparison.
This problem became much easier.
@SAS_New_User1 wrote:
Thank you, the answer to the questions.
1. Each message does not have fixed number of substrings that can be repeated by : , hence the only way to determine the position of the string that needs to be replaced is between those fixed 4 code i.e. :30:, :40:, :50:, :60:
2. All the messages have either of these codes which come before and after the string that needs to be replaced.
e.g. message_1 message 2
abcs:30:34629:60:hdjckhd abcs:30:BTKXX347:60:hdjckhd
so the string in bold is the reference table string, so I have to convert the string in Message_2 using reference table to make it look like message_1 for comparison.
1) In your 1st post the message contains only one code not two.
2) You can replace codes to one character to be used as a delimiter - let's say '09'x (a tab)
by using tranw() function for each of the 4 codes.
That enables use scan(message,2,'09'x) to extract the mid substring.
@Shmuel has a good solution. I was following and I wrote this code. If it adds to the discussion.
data have; /* MAIN table */
length ID 8 message_1 message_2 $40; /* adapt length to data */
infile cards truncover;
input ID message_1 $ message_2 $;
cards;
1 catcansiton:50:31123:60:nearby catcansiton:50:MJKXY32:40:nearby
2 dogcaneat:60:54637:30:fruit dogcaneat:60:DMF67XXM:40:fruit
3 beesand:30:BOTKXXM:60: beesand:30:64590:30:
4 colorofsky:40:DMF67XXM:30:blue colorofsky:40:54637:30:blue
; run;
proc format ;
value $object_correct
"MJKXY32" = "31123"
"DMF67XXM" = "54637"
"BOTKXXM" = "64590"
other = " "
;
run;
data want (Keep= ID message_1 message_2)
; /* Extract substring to check vs refernece */
/* if 0 then set refer;*/
set main;
array m[2,5] $ m11 m12 object_1 m14 m15
m21 m22 object_2 m24 m25 ;
do i=1 to 5; drop i;
m[1,i]=scan(message_1,i,':');
m[2,i]=scan(message_2,i,':');
end;
if put(object_1,$object_correct.)= object_2
then message_1=catx(":",m11, m12, object_2, m14, m15);
if put(object_2,$object_correct.)= object_1
then message_2=catx(":",m21, m22, object_1, m24, m25);
run;
If you have a whole dataset of "objects"-"object" pairs then you can follow the below example to create a format from a dataset.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.