Help using Base SAS procedures

How to extract desired numericals in irregular cells?

Reply
Occasional Contributor
Posts: 8

How to extract desired numericals in irregular cells?

Hi all,

I have an excel data file. In that file there are two columns I need to use and one column of it is irregular. As you guess I need to extract some numerical variables from that irregular contained column. For example I need to find the zipcodes which starts with "60". I have managed to write the formula of it in excel which is


"=IFERROR(IF(AND(LEN(TRIM(SUBSTITUTE(MID(K2,FIND(TEXT($L$1,0),K2,1)-1,1)&MID(K2,FIND(TEXT($L$1,0),K2,1)+5,1),CHAR(160),"")))=0,MID(K2,FIND(TEXT($L$1,0),K2,1),2)<>RIGHT(K2,2)),$L$1,"X"),"X")"

*Zip codes are 5 digits.

What does this formula do? : It finds the starting position of desired 2 digit then

1- Check if the letter before that 2 zip codes is empty and
2- Check if the 3 letter after that 2 zip code is empty and
3- Check if there is any letter after 2 zip code just in case those digits are the last letter of that cell.

You can see some lines of the source excel file. If you are not able to download it I added some lines as well.

Irregular ColumnDesired zip codes Starting with
a-haUKxUt 60000 WawawxwK tIhwahwTw 6060
KxIwaww 66000 yawtIh yInbaTwTw 66-6060
wtwwwxhh 00666 WaIwxw twIawahwwTw 660
wUwt Vth 060006 00000 *** ***60
awaKx 06060 IhtttwTxwT tawtttIhtaw wTw
  666
60
KxIwaww 60066 yawtIh YtwnbKwTw 0060
wUwt Vth 660666 00000 *** ***60
wUwt Vth 600660 00000 *** ***60
awaKx twUyta 00666 yxw atwTaw yxtawwTw 6060
awaKx twtwwaw 06600 xUtwyUwt ytyIhtaw wTw
  66
60
TatUT-hxy+tUT 00006 wUaytyxUwah/TyUawIh
  txhtahwxtnbxaw tx
60
Iwwx-wxwKT 66660 ttwahyUwt yttyawtatwaw
  wTw 600
60

Any help or idea is appreciated. Thanks in advance, have a nice day.

Super User
Super User
Posts: 7,942

Re: How to extract desired numericals in irregular cells?

Hi,

You seem to be asking for advice on Excel issues, maybe post it on an Excel forum?  To do this in SAS then you could use several functions, or perl regular expressions.

Occasional Contributor
Posts: 8

Re: How to extract desired numericals in irregular cells?

Hi,


Sorry if I caused any missunderstanding. I don't ask for excel advices. I need to use that excel source file in SAS as it is.

I have written the excel solution that i used in order to give some ideas about my questions, also I thought that some SAS codes that i don't know yet can be written by understanding the rules that i used in excel formulas.


So all i want to learn is those several functions, or perl regular expressions that you mentioned.

Super User
Posts: 7,771

Re: How to extract desired numericals in irregular cells?

Use the scan() function to extract the "words" from the long string (do it iteratively from 1 to countw()).

Then check each "word" for a length of 5 and that it is numeric (use the notdigit() function).

Since you now know that you have 5 digits, you only need to compare the substr(string_var,1,2) to your reference value ('60').

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,500

Re: How to extract desired numericals in irregular cells?

Posted in reply to KurtBremser

To add a bit of context to KurtBremser's answer:

data want;

   set have;

   length nextword $ 6;  /* don't need to extract any more than 6 characters */

   length zip_code $ 5;

   if irregular_string > ' ' then do i=1 to countw(irregular_string, ' ');

      nextword = scan(irregular_string, i, ' ');

      if length(nextword)=5 and notdigit(next_word) = 6 then do;

         if next_word =: '60' then zip_code = next_word;

      end;

   end;

   drop next_word i;

run;

The logic might be a bit trickier than it looks.  Once the length of a word is established as 5, the 6th character must be a blank.  So the NOTDIGIT function must return 6 (since blanks are not digits) to identify a 5-digit number.  Also, by defining the length of zip_code as $ 5, that trailing blank is automatically dropped from its value.  Also note that COUNTW uses a set of characters as default delimiters, so specify a blank as the only allowable delimiter.  Using =: is a faster way to examine the beginning of a character string (instead of substr).

Good luck.

Ask a Question
Discussion stats
  • 4 replies
  • 349 views
  • 0 likes
  • 4 in conversation