## How to extract desired numericals in irregular cells?

Occasional Contributor
Posts: 8

# How to extract desired numericals in irregular cells?

Hi all,

I have an excel data file. In that file there are two columns I need to use and one column of it is irregular. As you guess I need to extract some numerical variables from that irregular contained column. For example I need to find the zipcodes which starts with "60". I have managed to write the formula of it in excel which is

"=IFERROR(IF(AND(LEN(TRIM(SUBSTITUTE(MID(K2,FIND(TEXT(\$L\$1,0),K2,1)-1,1)&MID(K2,FIND(TEXT(\$L\$1,0),K2,1)+5,1),CHAR(160),"")))=0,MID(K2,FIND(TEXT(\$L\$1,0),K2,1),2)<>RIGHT(K2,2)),\$L\$1,"X"),"X")"

*Zip codes are 5 digits.

What does this formula do? : It finds the starting position of desired 2 digit then

1- Check if the letter before that 2 zip codes is empty and
2- Check if the 3 letter after that 2 zip code is empty and
3- Check if there is any letter after 2 zip code just in case those digits are the last letter of that cell.

You can see some lines of the source excel file. If you are not able to download it I added some lines as well.

 Irregular Column Desired zip codes Starting with a-haUKxUt 60000 WawawxwK tIhwahwTw 60 60 KxIwaww 66000 yawtIh yInbaTwTw 66-60 60 wtwwwxhh 00666 WaIwxw twIawahwwTw 6 60 wUwt Vth 060006 00000 *** *** 60 awaKx 06060 IhtttwTxwT tawtttIhtaw wTw  666 60 KxIwaww 60066 yawtIh YtwnbKwTw 00 60 wUwt Vth 660666 00000 *** *** 60 wUwt Vth 600660 00000 *** *** 60 awaKx twUyta 00666 yxw atwTaw yxtawwTw 60 60 awaKx twtwwaw 06600 xUtwyUwt ytyIhtaw wTw  66 60 TatUT-hxy+tUT 00006 wUaytyxUwah/TyUawIh  txhtahwxtnbxaw tx 60 Iwwx-wxwKT 66660 ttwahyUwt yttyawtatwaw  wTw 600 60

Any help or idea is appreciated. Thanks in advance, have a nice day.

Super User
Posts: 9,599

## Re: How to extract desired numericals in irregular cells?

Hi,

You seem to be asking for advice on Excel issues, maybe post it on an Excel forum?  To do this in SAS then you could use several functions, or perl regular expressions.

Occasional Contributor
Posts: 8

## Re: How to extract desired numericals in irregular cells?

Hi,

Sorry if I caused any missunderstanding. I don't ask for excel advices. I need to use that excel source file in SAS as it is.

I have written the excel solution that i used in order to give some ideas about my questions, also I thought that some SAS codes that i don't know yet can be written by understanding the rules that i used in excel formulas.

So all i want to learn is those several functions, or perl regular expressions that you mentioned.

Super User
Posts: 10,279

## Re: How to extract desired numericals in irregular cells?

Use the scan() function to extract the "words" from the long string (do it iteratively from 1 to countw()).

Then check each "word" for a length of 5 and that it is numeric (use the notdigit() function).

Since you now know that you have 5 digits, you only need to compare the substr(string_var,1,2) to your reference value ('60').

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 6,785

## Re: How to extract desired numericals in irregular cells?

data want;

set have;

length nextword \$ 6;  /* don't need to extract any more than 6 characters */

length zip_code \$ 5;

if irregular_string > ' ' then do i=1 to countw(irregular_string, ' ');

nextword = scan(irregular_string, i, ' ');

if length(nextword)=5 and notdigit(next_word) = 6 then do;

if next_word =: '60' then zip_code = next_word;

end;

end;

drop next_word i;

run;

The logic might be a bit trickier than it looks.  Once the length of a word is established as 5, the 6th character must be a blank.  So the NOTDIGIT function must return 6 (since blanks are not digits) to identify a 5-digit number.  Also, by defining the length of zip_code as \$ 5, that trailing blank is automatically dropped from its value.  Also note that COUNTW uses a set of characters as default delimiters, so specify a blank as the only allowable delimiter.  Using =: is a faster way to examine the beginning of a character string (instead of substr).

Good luck.

Discussion stats
• 4 replies
• 365 views
• 0 likes
• 4 in conversation