The address_1, address_2 and address_3 are the raw data collected in hand and they summed up together to be the complete whole address. And now I have to like sort the whole address into the corresponding data columns like flat, floor, building, street and district. I have tried to use the if then statement but it was too troublesome to do so and took me long time to execute like the whole data set contains like thousands of items. Can array and do loop help to do the task?
Input:
ADDRESS_1 | ADDRESS_2 | ADDRESS_3 |
ROOM 11 23/F | 61-63 YYY RD | District A |
ROOM 10 1/F ABC BLDG | 33-37 ZZZ STREET | District B |
UNIT A1 G/F | QWE SQUARE | 29 XYZ RD District A |
ROOM 16 3/F WATER HOUSE | 46 XXX ROAD | District C |
Output:
floor | flat | building | street | district |
23 | 11 | 61-63 YYY RD | A | |
1 | 10 | ABC BLDG | 33-37 ZZZ STREET | B |
G | A1 | QWE SQUARE | 29 XYZ RD | A |
3 | 16 | WATER HOUSE | 46 XXX ROAD | C |
Thanks!
This seems to be quite a messy task. Some if it is pretty straight forward. However, other parts are difficult to put into programming logic.
For example, how exactly is the street defined? Or building?
See if this example gives you enough clues to get the floor, flat and building information.
If this works the district should be easy enough using the patterns here. (Hint: index looking for 'District')
This relies on the Floor information ALWAYS as the 3rd element, always has a / in the middle, room is always second item, that building is the bit after the floor information and spaces always separate them.
data example; address_1='ROOM 10 1/F ABC BLDG'; length floor flat $ 5 ; floor= scan(scan(address_1,3,' '),1,'/'); flat = scan(address_1,2,' '); building = substr(address_1, index(address_1,scan(address_1,4,' '))); run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.