agodba
Calcite | Level 5

Hi all,

I have an Excel data file with two columns I need to use, one of which contains irregular text. I need to extract numeric values from that irregular column. For example, I need to find the zip codes that start with "60". I managed to write an Excel formula for this:


"=IFERROR(IF(AND(LEN(TRIM(SUBSTITUTE(MID(K2,FIND(TEXT($L$1,0),K2,1)-1,1)&MID(K2,FIND(TEXT($L$1,0),K2,1)+5,1),CHAR(160),"")))=0,MID(K2,FIND(TEXT($L$1,0),K2,1),2)<>RIGHT(K2,2)),$L$1,"X"),"X")"

*Zip codes are 5 digits.

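What does this formula do? It finds the starting position of the desired two-digit prefix and then:

1- Checks that the character before the prefix is blank (a space or a non-breaking space, CHAR(160)),
2- Checks that the character right after the five-digit zip code is also blank, and
3- Checks that the prefix is not simply the last two characters of the cell, in case the digits sit at the very end.

If all checks pass, the formula returns the prefix in $L$1; otherwise it returns "X".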
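The three checks above could be sketched in SAS roughly as follows. This is a minimal sketch, not a tested port of the formula: the data set HAVE, the variable IRREGULAR_STRING, the sample rows, and the result variables are hypothetical names. Only the first occurrence of the prefix is examined (like FIND), non-breaking spaces are not treated as blanks, and a prefix at the very start of the cell is treated as preceded by a blank, whereas the Excel formula returns "X" there because MID(K2,0,1) raises an error.

```sas
/* Hypothetical input: a few rows adapted from the sample data */
data have;
   infile datalines truncover;
   input irregular_string $char60.;
   datalines;
KxIwaww 60066 yawtIh YtwnbKwTw
wUwt Vth 600660 00000 *** ***
Iwwx-wxwKT 66660 ttwahyUwt yttyawtatwaw wTw 600
;

data want_excel_style;
   set have;
   length prefix $ 2 result $ 2 before after $ 1;
   prefix = '60';                          /* plays the role of cell $L$1 */
   result = 'X';                           /* default, like the "X" branches */
   len = length(irregular_string);         /* length without trailing blanks */
   p = find(irregular_string, prefix);     /* first occurrence, 0 if absent */
   if p > 0 then do;
      /* Check 1: character before the prefix (start of cell counts as blank) */
      if p > 1 then before = substr(irregular_string, p - 1, 1);
      else before = ' ';
      /* Check 2: character right after the five-digit zip code */
      if p + 5 <= len then after = substr(irregular_string, p + 5, 1);
      else after = ' ';
      /* Check 3: the prefix is not just the last two characters of the cell */
      if before = ' ' and after = ' ' and p + 1 < len then result = prefix;
   end;
   drop len;
run;
```

With these three rows, RESULT should be 60, X, and X: in the second row the character after "60066" is another digit, and in the third row the "60" found inside "66660" is preceded by a 6.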

Below are some lines from the source Excel file, in case you cannot download the attached file.

| Irregular column | Desired zip codes starting with |
|---|---|
| a-haUKxUt 60000 WawawxwK tIhwahwTw 60 | 60 |
| KxIwaww 66000 yawtIh yInbaTwTw 66-60 | 60 |
| wtwwwxhh 00666 WaIwxw twIawahwwTw 6 | 60 |
| wUwt Vth 060006 00000 *** *** | 60 |
| awaKx 06060 IhtttwTxwT tawtttIhtaw wTw 666 | 60 |
| KxIwaww 60066 yawtIh YtwnbKwTw 00 | 60 |
| wUwt Vth 660666 00000 *** *** | 60 |
| wUwt Vth 600660 00000 *** *** | 60 |
| awaKx twUyta 00666 yxw atwTaw yxtawwTw 60 | 60 |
| awaKx twtwwaw 06600 xUtwyUwt ytyIhtaw wTw 66 | 60 |
| TatUT-hxy+tUT 00006 wUaytyxUwah/TyUawIh txhtahwxtnbxaw tx | 60 |
| Iwwx-wxwKT 66660 ttwahyUwt yttyawtatwaw wTw 600 | 60 |

Any help or idea is appreciated. Thanks in advance, have a nice day.

RW9
Diamond | Level 26

Hi,

You seem to be asking for advice on Excel issues; maybe post it on an Excel forum? To do this in SAS, you could use several character functions, or Perl regular expressions.
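As an illustration of the Perl regular expression route, here is a minimal sketch using PRXPARSE, PRXMATCH, and PRXPOSN. The data set HAVE, the variable IRREGULAR_STRING, and the sample rows are assumptions adapted from the question. The pattern accepts five digits starting with 60 only when they are preceded by a blank or the start of the string and followed by a blank or the end of the string, which approximates the Excel rules; like FIND, it reports only the first match.

```sas
/* Hypothetical input: a few rows adapted from the sample data */
data have;
   infile datalines truncover;
   input irregular_string $char60.;
   datalines;
a-haUKxUt 60000 WawawxwK tIhwahwTw
wUwt Vth 600660 00000 *** ***
KxIwaww 60066 yawtIh YtwnbKwTw
;

data want_prx;
   set have;
   length zip_code $ 5;
   retain re;                                /* keep the compiled pattern ID */
   /* Group 2 = five digits starting with 60, with a blank (or the start
      of the string) before them and a blank (or the end) after them */
   if _n_ = 1 then re = prxparse('/(^|\s)(60\d{3})(\s|$)/');
   if prxmatch(re, irregular_string) > 0 then
      zip_code = prxposn(re, 2, irregular_string);   /* text of group 2 */
   drop re;
run;
```

With these rows, ZIP_CODE should be 60000, blank, and 60066: "600660" is rejected because the candidate "60066" is followed by another digit rather than a blank.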

agodba
Calcite | Level 5

Hi,


Sorry if I caused any misunderstanding. I'm not asking for Excel advice. I need to use that Excel source file in SAS as it is.

I included my Excel solution to illustrate the question. I thought that someone could write the equivalent SAS code, which I don't know yet, by following the rules in the Excel formula.


So what I want to learn is which functions, or which Perl regular expressions, you had in mind.

Kurt_Bremser
Super User

Use the scan() function to extract the "words" from the long string (do it iteratively from 1 to countw()).

Then check each "word" for a length of 5 and that it is numeric (use the notdigit() function).

Since you now know that you have 5 digits, you only need to compare the substr(string_var,1,2) to your reference value ('60').

Astounding
PROC Star

To add a bit of context to Kurt_Bremser's answer:

data want;
   set have;
   length nextword $ 6;  /* don't need to extract any more than 6 characters */
   length zip_code $ 5;
   /* scan each blank-delimited word of the long string */
   if irregular_string > ' ' then do i = 1 to countw(irregular_string, ' ');
      nextword = scan(irregular_string, i, ' ');
      /* exactly 5 characters, all digits (position 6 is the padding blank) */
      if length(nextword) = 5 and notdigit(nextword) = 6 then do;
         if nextword =: '60' then zip_code = nextword;
      end;
   end;
   drop nextword i;
run;

The logic might be a bit trickier than it looks. Because NEXTWORD is defined as $ 6, once the length of a word is established as 5, its 6th character must be a blank. So the NOTDIGIT function must return 6 (blanks are not digits) to identify a 5-digit number. Also, by defining the length of ZIP_CODE as $ 5, that trailing blank is automatically dropped from its value. Note too that COUNTW and SCAN use a set of characters as default delimiters, so specify a blank as the only allowable delimiter. Finally, =: is a faster way to examine the beginning of a character string than SUBSTR.
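A small DATA _NULL_ step can confirm the points above about NOTDIGIT and COUNTW. This is only a sketch: the variable W stands in for NEXTWORD (same $ 6 length), and the test strings are made up for illustration.

```sas
data _null_;
   length w $ 6;                             /* same length as NEXTWORD */
   w = '60000';  pos_zip  = notdigit(w);     /* padding blank at position 6 */
   w = '6000x';  pos_bad  = notdigit(w);     /* 'x' at position 5 */
   w = '600660'; pos_long = notdigit(w);     /* all six characters are digits */
   /* The default delimiters include '-', so '66-60123' is split in two and
      '60123' would look like a word even though no blank precedes it */
   n_default = countw('66-60123 tx');        /* words: 66, 60123, tx */
   n_blank   = countw('66-60123 tx', ' ');   /* words: 66-60123, tx */
   put pos_zip= pos_bad= pos_long= n_default= n_blank=;
run;
```

The PUT statement should write pos_zip=6 pos_bad=5 pos_long=0 n_default=3 n_blank=2 to the log.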

Good luck.
