I want to merge a couple columns from dataset B onto dataset A, but only if A.column contains the substring B.column1. If that condition is met, I want to add B.column2 to dataset A.
I feel like I should be able to do this in proc sql but I'm not sure how to implement the conditional substring search.
Welcome to the SAS communities 🙂
Sounds like a job that can be done in several ways. Can you be a bit more specific in your requirements?
And can you provide some small example data set and what you want the result to be? Makes it much easier for us to help you 🙂
The relevant column in dataset A contains a bunch of serial numbers that are delimited with "|", so something like "12345|23434|098543".
For dataset B, column 1 contains single serial numbers (e.g. "12345") and column 2 contains dates.
If A.column contains the serial number in B.column1, I want to add B.column2 to A.
Conceptually, it's something like: if find(A.column, B.column1) ~= 0 then A left join B.column2
Post test data in the form of a datastep, see:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Tip 1, never keep multiple data items in one variable, its not a good way to store data and will make your life harder.
Tip 2, Read the documentation, to find one string in another you could use index() SAS function,
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212242.htm
So long as its not pass through SQL that will work fine.
Regarding your first tip, I would normally agree. Unfortunately, the nature of the data I'm using requires me to put multiple items in 1 variable.
For tip 2, if I can't use index or find or contains within SQL, how could I do this conditional join? I've played around using DATA and MERGE but it doesn't seem to produce the desired result.
If you need example data, I suppose this could work
data set1;
input serials:$20.;
datalines;
12345|124|5498
4398|29893|3409|324
3498|0989
;;;
data set2;
input serial:$5. stuff;
datalines;
12345 metadata1
98 metadata2
0989 metadata3
;;;;
The desired output would look like:
serials stuff
12345|124|5498 metadata1
4398|29893|3409|324
3498|0989 metadata3
What if set2 also contains row 2 like below?
data set2;
input serial:$5. stuff;
datalines;
12345 metadata1
124 metadata2
98 metadata2
0989 metadata3
;;;;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.