Hi! Sorry if this has already been posted, but I cannot seem to find a previous post on this.

I have a variable in a large data set (~50,000 obs, so too many to correct manually) that can take 1 of 3 possible structures:

1. name (example: Hartsfield airport)
2. xxxx - name (example: 12345 - Hartsfield airport)
3. name - xxxx (example: Hartsfield airport - 12345)

Here is some code to create a sample in SAS.

data trial;
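/* length 50 is an arbitrary choice for this sample; DSD strips the quotes */
length have $50;
infile datalines dsd truncover;
input have $;
datalines;
"12345 - hartsfield airport"
"hartsfield airport - 12345"
"hartsfield airport"
;
run;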
The strings are of many differing lengths, and the numbers can be any combination and length, so my sample may be oversimplified. I want just the name out of the variable's string.

There is no rhyme or reason to whether the numbers and dash come before or after the name, so I cannot subset the data set based on another variable and use a simple strip(scan(have, 1, "-")) or strip(scan(have, -1, "-")) to extract just the name. I really wish it were that simple.

Is there a way to modify the strip and scan functions, or is there another function I should use to extract the name? I have thought about using something like the FIND, INDEX, or ANYDIGIT function to do a subset and then apply the strip/scan functions, but because the numbers can take on any value and the strings have varying lengths, I don't think those will work.

Any advice is appreciated. Thanks!
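
For illustration, here is a minimal sketch of one possible approach using the PRXCHANGE function rather than strip/scan. It assumes the xxxx part is always made up of digits only and is always separated from the name by a dash. The output data set want, the new variable name, and the length of 50 are hypothetical choices made for this sketch, not something taken from the real data.

```sas
/* Sample data; the length of 50 is an assumption for this sketch */
data trial;
    length have $50;
    infile datalines dsd truncover;   /* DSD strips the surrounding quotes */
    input have $;
    datalines;
"12345 - hartsfield airport"
"hartsfield airport - 12345"
"hartsfield airport"
;
run;

data want;
    set trial;
    length name $50;
    /* Remove a leading "digits -" or a trailing "- digits";        */
    /* values that match neither pattern pass through unchanged.     */
    /* The trailing \s* absorbs the blanks SAS pads the variable with. */
    name = strip(prxchange('s/^\s*\d+\s*-\s*|\s*-\s*\d+\s*$//', -1, have));
run;
```

Because the pattern only removes a digits-only piece at the very start or very end of the string, a dash inside the name itself (for example, a hyphenated airport name) would be left alone, which a plain scan on "-" would not guarantee. If the xxxx part can contain letters, the pattern would need to be adjusted.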