BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jeremy4
Quartz | Level 8

Hi,

 

I have the following example, which is only a partial extract of all of the observations for the variable "address":

 

Dataset:

Two variables are in the project.address dataset - account_number and Address. Note: the example below just contains a partial extract of observations in the Address variable.

 

                                                                     Address

FULLNAMEMackenzie M. ThompsonADDRESS60 Boroughbridge RoadTOWNBIRKHALLPOSTCODEAB35 3LW

FULLNAMEEvelyn F. HillADDRESS18 St James BoulevardTOWNHORSEBROOKPOSTCODEST19 5EQ

FULLNAMEElise R. HayADDRESS15 Sloe LaneTOWNCROSTHWAITEPOSTCODELA8 6TP

FULLNAMEAntonia M. HendersonADDRESS3 Exning RoadTOWNHARLEYHOLMPOSTCODEML12 2JZ

 

Can someone please help me extract the house number and post code, using substring/find/scan/index for the "address" variable, ie. house_number = substr(address, , ) where:

 

House number

Are the numbers from the third word, split by the space delimiter (this can be either 1 or 2 numbers).

 

Would the best approach be to find the number(s) coming after 'ADDRESS' in the third word based on the space delimiter?

 

Post code

Last three or four characters and numbers from the end of the second last word, plus (using the catx function?) the last three characters and numbers from the last word.

 

Would the best approach be to find what comes after 'POSTCODE' and use substr to find a length of 8 (as this would include 1 or 2 numbers for the start of the post code)?

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

No problem 🙂

 

housenum from the inside out:

 

scan(address, 3😞 The third word in the string

compress(scan(address, 3), '', 'kd'😞 keep only digits ('kd') from the third word of the string.

Input(..., 8.): Convert the above found string to a numeric value.

 

p: index(address,"POSTCODE") finds the position in the string where the substring POSTCODE first appears.

 

postcode: substr(address, p+9, length(address)-p) extracts a substring from address starting from p+9 (9 because POSTCODE has 8 characters and we want the next character from there). From that character we read forward length(address)-p

View solution in original post

17 REPLIES 17
Jagadishkatam
Amethyst | Level 16

please try the perl regular expressions as below

 

data want;
input text&:$500.;
housenumber=prxchange('s/(.*address)(\d{1,2})(.*)/$2/i',-1,strip(text));
postcode=prxchange('s/(.*postcode)(.*)/$2/i',-1,strip(text));
cards;
FULLNAMEMackenzie M. ThompsonADDRESS60 Boroughbridge RoadTOWNBIRKHALLPOSTCODEAB35 3LW
FULLNAMEEvelyn F. HillADDRESS18 St James BoulevardTOWNHORSEBROOKPOSTCODEST19 5EQ
FULLNAMEElise R. HayADDRESS15 Sloe LaneTOWNCROSTHWAITEPOSTCODELA8 6TP
FULLNAMEAntonia M. HendersonADDRESS3 Exning RoadTOWNHARLEYHOLMPOSTCODEML12 2JZ
;
Thanks,
Jag
PeterClemmensen
Tourmaline | Level 20

Assuming your data is representable and you insist on not using Regular Expressions

 

data have;
input string $200.;
datalines;
FULLNAMEMackenzie M. ThompsonADDRESS60 Boroughbridge RoadTOWNBIRKHALLPOSTCODEAB35 3LW
FULLNAMEEvelyn F. HillADDRESS18 St James BoulevardTOWNHORSEBROOKPOSTCODEST19 5EQ
FULLNAMEElise R. HayADDRESS15 Sloe LaneTOWNCROSTHWAITEPOSTCODELA8 6TP
FULLNAMEAntonia M. HendersonADDRESS3 Exning RoadTOWNHARLEYHOLMPOSTCODEML12 2JZ
;

data want(drop=p);
    set have;
    housenum=input(compress(scan(string, 3), '', 'kd'), 8.);
    p=index(string,"POSTCODE");
    postcode=substr(string, p+9, length(string)-p);
run;
jeremy4
Quartz | Level 8
Thanks for your reply, how would the code change if datalines were not used (there are thousands of observations in my dataset), as my example was just a partial extract of observations from the "address" variable, so could it be changed to something along the lines of house_number = substr(address, , )? Thanks!
Jagadishkatam
Amethyst | Level 16

Considering that the data is in a dataset then you can try set statement as below

 

data want;
set mydata;
text2=tranwrd(tranwrd(lowcase(text),'address','|'),'postcode','#');
housenumber=scan(scan(lowcase(strip(text2)),2,'|'),1,' ');
postcode=upcase(scan(lowcase(strip(text2)),2,'#'));
run;
Thanks,
Jag
jeremy4
Quartz | Level 8

Thanks for your reply, I've just run your code and it creates two new variables, address and postcode. However, there are just blank observations for the corresponding original examples - is there anything I could change in the code (my dataset just contains an "Account_number" and "address" variable (partial extract was shown in my original post))?

Jagadishkatam
Amethyst | Level 16

Could you please post the code you attempted, you need to replace the text with the variable which has address info per your data.

Thanks,
Jag
jeremy4
Quartz | Level 8

The variable containing all of the information is called variable.

 

data project;
set project.address;
text2=tranwrd(tranwrd(lowcase(text),'ADDRESS','|'),'POSTCODE','#');
housenumber=scan(scan(lowcase(strip(text2)),2,'|'),1,' ');
postcode=upcase(scan(lowcase(strip(text2)),2,'#'));
run;

PeterClemmensen
Tourmaline | Level 20

So what variable contains the string you post above? 

jeremy4
Quartz | Level 8
It's called Address - I've updated my original post to include extra information.
PeterClemmensen
Tourmaline | Level 20

Simply replace 'string' with 'address'

 

data have;
input address $200.;
datalines;
FULLNAMEMackenzie M. ThompsonADDRESS60 Boroughbridge RoadTOWNBIRKHALLPOSTCODEAB35 3LW
FULLNAMEEvelyn F. HillADDRESS18 St James BoulevardTOWNHORSEBROOKPOSTCODEST19 5EQ
FULLNAMEElise R. HayADDRESS15 Sloe LaneTOWNCROSTHWAITEPOSTCODELA8 6TP
FULLNAMEAntonia M. HendersonADDRESS3 Exning RoadTOWNHARLEYHOLMPOSTCODEML12 2JZ
;

data want(drop=p);
    set have;
    housenum=input(compress(scan(address, 3), '', 'kd'), 8.);
    p=index(address,"POSTCODE");
    postcode=substr(address, p+9, length(address)-p);
run;
jeremy4
Quartz | Level 8
Thanks for your reply - my example was just a partial extract of the variable, Address, which contains thousands of observations and means that using datalines may not be the best method for me. Is there a method that looks at an observation from the Address variable, instead of using datalines? Thanks!
PeterClemmensen
Tourmaline | Level 20

I do not suggest that you use datalines in your actual case. Simply use my code as a template and do:

 

- insert your actual data set name instead of have in the set statement

- Insert the appropriate variable name (if your actual variable is named Address, then you're good.

jeremy4
Quartz | Level 8

Thanks for your code, I've excluded your first part (with the datalines) and it works. Do you mean explaining what is happening for each line of code i.e. housenum (in particular what '' is doing from input(compress(scan(address, 3), '', 'kd'), 8.);), p (what is the purpose of the variable p, as it gives an output of 70 and 65 for the first two observations, for example) and postcode are doing in plain English, so I can understand what processes are happening? Thanks!

PeterClemmensen
Tourmaline | Level 20

No problem 🙂

 

housenum from the inside out:

 

scan(address, 3😞 The third word in the string

compress(scan(address, 3), '', 'kd'😞 keep only digits ('kd') from the third word of the string.

Input(..., 8.): Convert the above found string to a numeric value.

 

p: index(address,"POSTCODE") finds the position in the string where the substring POSTCODE first appears.

 

postcode: substr(address, p+9, length(address)-p) extracts a substring from address starting from p+9 (9 because POSTCODE has 8 characters and we want the next character from there). From that character we read forward length(address)-p

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 2914 views
  • 0 likes
  • 3 in conversation