Dear,
I have to merge two datasets based on one variable(test). The values in the variables mean same thing but the values are not exactly similar. EG; EXCL01=EX1.
Please help. Thanks
data1;
test VALUE
EXCL01 A
EXCL07A B
EXCL11 C
EXCL07 D
INCL01 E
INCL22 H
INCL03A J
INCL03B H
data2
ID TEST
1 EC1
2 EC11
3 EC7
4 IN01
5 IN03a
6 IN03b
7 IN22
OUTPUT NEED;
ID test VALUE
1 EXCL01 A
2 EXCL11 C
3 EXCL07 D
4 INCL01 E
5 INCL03A J
6 INCL03B H
7 INCL22 H
The term is fuzzy matching. But you have to have some rules. Programs basically implement rules 🙂
So how do you want to match? What's a close enough match? Does it matter how long the strings are?
Look at compged/spedis/sounds like operators for matching and distances between strings.
If you search fuzzy matching at lexjansen.com you'll find a lot of different approaches.
It looks relatively straightforward to add a third variable to DATA1, an abbreviated version of TEST that would match the values found in TEST within DATA2. Then sort and merge by that new variable.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.