Hi, I'd appreciate your help regarding following:
My data
Personid | CharCode |
1111 | 88888886666 |
2222 | 92000000001 |
333 | 92000000002 |
Mylist: ('8888888', '92000000001') -> this is a very long list in actual dataset consisting of character variable
If Charcode (character variable) in my data contains anything in Mylist (even partially, not exact match), then I want to code a new variable Myfind, where Myfind=1. If Charcode does not match anything in mylist, then Myfind will be 0.
Desired Output
Personid | CharCode | Myfind |
1111 | 88888886666 | 1 |
2222 | 92000000001 | 1 |
333 | 92000000002 | 0 |
What I've tried:
data mydata;
ATTRIB Myfind Length=3 Label='Yes/no indicator';
set pharmdat2;
If index ((CharCode), '8888888')>0 Or If index ((CharCode), '92000000001')>0
THEN Myfind=1;
ELSE Myfind=0;
run;
This works, but very tedious. The fourth line is where I need to loop through without me going in and specifying every item in Mylist. Is there a way to modify just this part?
I've also tried:
%Let Mylist= ('8888888', '92000000001')
data mydata;
ATTRIB Myfind Length=3 Label='Yes/no indicator';
set pharmdat2;
If CharCode in: &Mylist
THEN Myfind=1;
ELSE Myfind=0;
run;
But this code returns Myfind=0 for all rows, although there is a Charcode to Mylist match that I know exists in this dataset dataset.
I also couldn't use like statement combined with '%Mylistitem%' because this is not a where statement.
hi @shyunk
This could be bone using several techniques besides the tedious writing of all check values.
Using a loop as you suggest is complicated. All check values should be read into the data step as a text array and the the actual values in each observation should be matched against all array values.
Using a hash object would probably possible also, but it never got into my personal toolbox, so I will leave that to one of the hash gurus in this forum.
My preferred solution in cases like this is a SQL join. A SAS ifn-function is used to provide the value of MyFind based on a match or not. In this example the match is made on the shortest string from start, but the EQT operator could be switched to a CONTAINS to cover partial matched within the string as well.
data mydata;
PersonID = 1111; CharCode = '88888886666'; output;
PersonID = 1122; CharCode = '88888886789'; output;
PersonID = 1133; CharCode = '88888856780'; output;
PersonID = 2222; CharCode = '92000000001'; output;
PersonID = 333; CharCode = '92000000002'; output;
run;
data mylist;
length CheckCode $20;
CheckCode = '8888888'; output;
CheckCode = '92000000001'; output;
run;
proc sql;
create table want as
select distinct
a.PersonID,
a.CharCode,
ifn(b.CheckCode ne '',1,0) as MyFind
from mydata as a
left join mylist as b
on a.CharCode eqt b.CheckCode;
quit;
Result:
hi @shyunk
This could be bone using several techniques besides the tedious writing of all check values.
Using a loop as you suggest is complicated. All check values should be read into the data step as a text array and the the actual values in each observation should be matched against all array values.
Using a hash object would probably possible also, but it never got into my personal toolbox, so I will leave that to one of the hash gurus in this forum.
My preferred solution in cases like this is a SQL join. A SAS ifn-function is used to provide the value of MyFind based on a match or not. In this example the match is made on the shortest string from start, but the EQT operator could be switched to a CONTAINS to cover partial matched within the string as well.
data mydata;
PersonID = 1111; CharCode = '88888886666'; output;
PersonID = 1122; CharCode = '88888886789'; output;
PersonID = 1133; CharCode = '88888856780'; output;
PersonID = 2222; CharCode = '92000000001'; output;
PersonID = 333; CharCode = '92000000002'; output;
run;
data mylist;
length CheckCode $20;
CheckCode = '8888888'; output;
CheckCode = '92000000001'; output;
run;
proc sql;
create table want as
select distinct
a.PersonID,
a.CharCode,
ifn(b.CheckCode ne '',1,0) as MyFind
from mydata as a
left join mylist as b
on a.CharCode eqt b.CheckCode;
quit;
Result:
That worked great, Thank you!
Hi @ErikLund_Jensen.
Thank you again for your help. I realized the code isn't doing exactly what I intended, when I ran into this issue. Do you have any thoughts on why this may be happening?
data mydata;
PersonID = 11; CharCode = '68000000000'; output;
PersonID = 22; CharCode = '70710120901'; output;
PersonID = 33; CharCode = '69238131906'; output;
PersonID = 44; CharCode = '31722065832'; output;
PersonID = 55; CharCode = '43598043060'; output;
run;
data mylist;
INPUT CheckCode $;
length CheckCode $20;
Datalines;
68462030868
70710120407
69238131906
31722065832
43598043060
;;
run;
proc sql;
create table want as
select distinct
a.PersonID,
a.CharCode,
ifn(b.CheckCode ne '',1,0) as MyFind
from mydata as a
left join mylist as b
on a.CharCode contains b.CheckCode;
quit;
This returns below, where CharCode is trimmed to 8 letters, and it cuts out the last three digits of everything. Therefore, although PersonID 22, 33 in mydata is not a match because of its last three digits, it has 1 for Myfind. I checked the length of Charcode, and it's set to 30 now. This is the same for my real (longer) dataset and my mock dataset. I would appreciate some guidance why SAS (or SQL) is automatically trimming it to 8 letters here. I want to have full 11 letters.
PersonID | CharCode | Myfind |
11 | 68000000 | 0 |
22 | 70710120 | 1 |
33 | 69238131 | 1 |
44 | 31722065 | 1 |
55 | 43598043 | 1 |
Output Table 1
When I use eqt or = instead of contains, using my real dataset, the output is 0 for all myfind where it is unable to find any match (Output table2). Interestingly, when I use my mock dataset I get the same 8 letter truncated table as Output table 1
on a.CharCode eqt b.CheckCode;
on a.CharCode = b.CheckCode;
PersonID | CharCode | Myfind |
11 | 68000000 | 0 |
22 | 70710120 | 0 |
33 | 69238131 | 0 |
44 | 31722065 | 0 |
55 | 43598043 | 0 |
Output Table 2
Your help is much appreciated!
Hi @shyunk
I made two small changes. The first is to get the full check value by using a $20 informat. The other reverses the comparison so checkcode contains charcodee instead of the reverse. I Hope that helps.
data mydata;
PersonID = 11; CharCode = '68000000000'; output;
PersonID = 22; CharCode = '70710120901'; output;
PersonID = 33; CharCode = '69238131906'; output;
PersonID = 44; CharCode = '31722065832'; output;
PersonID = 55; CharCode = '43598043060'; output;
run;
data mylist;
INPUT CheckCode $20.;
/*length CheckCode $20;*/
Datalines;
68462030868
70710120407
69238131906
31722065832
43598043060
;;
run;
proc sql;
create table want as
select distinct
a.PersonID,
a.CharCode,
ifn(b.CheckCode ne '',1,0) as MyFind
from mydata as a
left join mylist as b
/*on a.CharCode contains b.CheckCode;*/
on b.CheckCode contains a.CharCode;
quit;
@ErikLund_Jensen Thank you, I realized I was setting the informat incorrectly for mydata (it was too long), which was why SAS was trying to truncate it. Now it's doing what I intended it to do. I learned a lot from your answers and cannot thank you enough!
You have these two statements backwards:
INPUT CheckCode $;
length CheckCode $20;
By referencing CHECKCODE in the INPUT statement before you have defined it SAS will have to use the default length of $8 for the variable.
Also these lines should NOT be indented. It will just confuse you and make it harder for you to notice how long the values actual are to intend those lines.
datalines;
68462030868
70710120407
69238131906
31722065832
43598043060
;
The extra RUN: statement between the end of the data step and the start of the PROC SQL is also not useful, but it does not really do much harm other than it might confuse some programmers into thinking they can include statements for the data step in between the end of the in-line data and that superfluous RUN statement.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.