Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR3)

Reply
Occasional Contributor
Posts: 5

Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR3)

Hi All,

 

I have two Character Fields (ADDR1,ADDR2) which i am reading from a PIPE DELIMITED file (using Infile) maximum length for these can be $60.

 

I need to load these two fields into one Table which has columns as below:

ADDRESS_LN1    (Can take upto 30 characters)

ADDRESS_LN2    (Can take upto 30 characters)

ADDRESS_LN3    (Can take upto 30 characters)

 

So if LENGTH of my variables (ADDR1,ADDR2)  is less than 30 than in that case they will be simply loaded to ADDRESS_LN1 ,ADDRESS_LN2 respectively.

 

But now as per my requirement if the LENGTH of any of the fields (ADDR1,ADDR2) is more than 30 Bytes then we need to move the extra characters after 30 byte to subsequent ADDRESS_LN i.e if AADR1 has 35 byte data then 30 will go to ADDRESS_LN1 and 5 byte will go to ADDRESS_LN2. And now if ADDRESS_LN2 exceeds 30 bytes its remaing address info will go to ADDRESS_LN3.

 

Now the problem is we cannot just SUBSTR the variable for first 30 bytes because there might be a scenarion that the 30th Byte might be in between a WORD i.e say ADDR1 = 'ABCDEFGHI JKLMNOP QRSTUV WXYZRTT' ...so the 30th byte falls at R, so we need to move the complete string to ADDRESS_LN2  i.e entire word WXYZRTT , not only TT.So we need to check for LAST SPACE in the field and truncate accordingly.

 

Can anyone help me guide through building a logic for this.Any help will be appreciated.

 

Thanks!!

Respected Advisor
Posts: 4,609

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Can be done with regular expressions

 

data test;
length ADDR1 ADDR2 $60 ADDR $121;
length line1-line3 $30;
ADDR1 = 'ABCDEFGHI JKLMNOP QRSTUV WXYZRTT';
ADDR2 = 'ABCDEF GHIJKLMNOP QRSTUVWX YZRTT';
if not prx2 then prx2 + prxParse("/\s*(.{1,30})\b\s*(.{1,30})\b/");
if not prx3 then prx3 + prxParse("/\s*(.{1,30})\b\s*(.{1,30})\b\s*(.{1,30})\b/");
if length(ADDR1) <= 30 then do;
    line1 = ADDR1;
    if length(ADDR2) <= 30 then line2 = ADDR2;
    else do;
        if prxmatch(prx2, ADDR2) then do;
            line1 = prxposn(prx2, 1, ADDR2); 
            line2 = prxposn(prx2, 2, ADDR2); 
            end;
        end;
    end;
else do;
    ADDR = catx(" ", ADDR1, ADDR2);
    if prxmatch(prx3, ADDR) then do;
        line1 = prxposn(prx3, 1, ADDR); 
        line2 = prxposn(prx3, 2, ADDR); 
        line3 = prxposn(prx3, 3, ADDR); 
        end;
    end;
keep line1-line3;
run;

proc print data=test noobs; run;
PG
Occasional Contributor
Posts: 5

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Thank you so much for your quck response. Guess you have used Pearl language in the code ...i am running my code on ZOS mainframe Platform using SAS 9.1 .I am not aware of pear though so am having dificulty in understanding the Pearl Portion.

 

Ran you code and surprisingly it ran good on ZOS platform as well.Below is the output:

 

           LINE1           
JKLMNOP QRSTUV WXYZRTT ABCDE

 

          LINE2         
GHIJKLMNOP QRSTUVWX YZRTT

 

         LINE3      

 

The truncation did not happened as expected. Desired output is:

 

  LINE1           

ABCDEFGHI JKLMNOP QRSTUV

 

     LINE2         

WXYZRTT ABCDEF GHIJKLMNOP

 

     LINE3      

QRSTUVWX YZRTT

 

Whatever is the character after 30 bytes if it is lying in middle of a WORD then that entire word should be moved to LINE2 and so on for LINE3. Like as per your variable:

ADDR1 = 'ABCDEFGHI JKLMNOP QRSTUV WXYZRTT';

 

30 byte lies at R (3rd byte from back)...and this is a complete WORD in itself...so we want the entire word i.e WXYZRTT to be moved to LINE2 then a SPACE in b/w and then actual 30 byte of ADDR2 and so on.

 

Appreciating your help on this.

 

Thanks!!

 


             

 

Respected Advisor
Posts: 4,609

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

The code as posted prints

 

               line1                        line2                  line3

      ABCDEFGHI JKLMNOP QRSTUV    WXYZRTT ABCDEF GHIJKLMNOP    QRSTUVWX YZRTT

With SAS 9.4 on Windows 7.

PG
Occasional Contributor
Posts: 5

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

May be a different SAS version and a different platform might be causing some issue. Thanks a lot for your help on this.

Grand Advisor
Posts: 9,593

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Give us an example . Data and Output .
Occasional Contributor
Posts: 5

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Hi,

 

We are running SAS 9.1 on ZOS platform.

 

PFB the details:

If ADDRESS exceeds 30 byte the remaining WORD will be moved to subsequent address lines.In Input we have only two fileds

ADDR1   

ADDR2

We need to load them into ADDR_LINE_1,ADDR_LINE_2,ADDR_LINE_3  (each having 30 byte length)

 

Below is the input:

 

ADDR1='L-9 PEARL APARTMENT BUDDHA NAGAR GHAZIABAD UTTAR PRADESHERS';
ADDR2='NEAR CROSSING';   

 

If LENGTH of any of the above field is GREATER than 30 bytes then we Need to move the WORD that lies after 30 bytes to different varaibale ... we will move the entire WORD not only the characters that are lying after 30 bytes....and so on for ADDR2

 

We need to split the Output in three parts 30 byte each.

 

Desired Output:

ADDR_LINE_1=  L-9 PEARL APARTMENT BUDDHA      

ADDR_LINE_2= NAGAR GHAZIABAD UTTAR

ADDR_LINE_3=  PRADESHERS NEAR CROSSING

Grand Advisor
Posts: 9,593

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

data have;
ADDR1='L-9 PEARL APARTMENT BUDDHA NAGAR GHAZIABAD UTTAR PRADESHERS          ';
ADDR2='NEAR CROSSING';output;
ADDR1='L-9 PEARL APARTMENT BUDDHASSABC NAGAR GHAZIABAD UTTAR PRADESHERS';
ADDR2='NEAR CROSSING';output;
run;

data temp;
 set have;
 length temp $ 30;
 n+1;
 do i=1 to countw(ADDR1,' ');   
  temp=scan(ADDR1,i,' ');
  output;
 end;
 do i=1 to countw(ADDR2,' ');   
  temp=scan(ADDR2,i,' ');
  output;
 end;
 drop i;
run;
data temp;
 set temp;
 by n;
 if first.n then do;len=0;g=0;end;
 len+(length(temp)+1);
 if len gt 31 then do;g+1;len=length(temp)+1;end;
run;
data x;
 length x $ 30;
 do until(last.g);
  set temp;
  by n g;
  x=catx(' ',x,temp);
 end;
run;
proc transpose data=x out=want(drop=_name_) prefix=line;
by n ADDR1 ADDR2;
var x;
run;
  



Respected Advisor
Posts: 4,609

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

My code produces 

 

              line1                          line2                           line3

    L-9 PEARL APARTMENT BUDDHA    NAGAR GHAZIABAD UTTAR          PRADESHERS NEAR CROSSING
    L-9 PEARL APARTMENT           BUDDHASSABC NAGAR GHAZIABAD    UTTAR PRADESHERS NEAR CROSSING

with @Ksharp's data. Is that the expected result?

PG
Occasional Contributor
Posts: 5

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

@PG Stats 

Yes this is the expected result. Thanks a lot for your help on this. Appreciate and Thanks!!

 

@KSharp -

You made my day...thanks for posting the perfect code. It worked perfectly just had to add few fields to it as it was to be merged with other dataset based on a unique key.Thank you so much !!!!

 

Thanks a lot guys...Smiley Happy

 

Just to tell you guys below was the code i was using, but the issue with it was it was splitting the ADDR1 and ADDR2 fields into 4 parts rarther than 3.

 

 

DATA                                                           
   TEST                                                        
   ;                                                           
                                                               
   LENGTH LINE1 LINE2 LINE3 $30.;                               
                                                                
   ADDR1='L-9 PEARL APARTMENT BUDDHA NAGAR';        
   ADDR2='GHAZIABAD UTTAR PRADESHERS NEAR CROSSING';            
   ADDRL3 =' ';                                                
                                                               
    CHK_AD1 = SUBSTR(ADDRL1,31,05);                            
    CHK_AD2 = SUBSTR(ADDRL2,31,05);                            
    CHK_AD3 = SUBSTR(ADDRL3,31,05);                            
                                                               
    IF CHK_AD1 > ' ' OR                                        
       CHK_AD2 > ' ' OR                                        
       CHK_AD3 > ' ' THEN DO;                                  
                                                               

 ARRAY ADD_IN(3) $60;                                     
 ARRAY ADD_OUT(5) $30;                                    
 ARRAY WORD(50) $60;                                      
                                                          
 ADD_IN(1) = ADDRL1;                                      
 ADD_IN(2) = ADDRL2;                                      
 ADD_IN(3) = ADDRL3;                                      
                                                          
 WORD_CUT = 1;                                            
 LINE = 0;                                                
                                                          
 DO N=1 TO 3;                                             
                                                          
    IF LENGTH(ADD_IN(N)) > 30 THEN DO;                    
                                                          
       DO I=1 TO 50 UNTIL(WORD(I) = '');                  
          WORD(I) = TRIM(LEFT(SCAN(ADD_IN(N), I, ' ')));  
       END;                                               
                                                          

                                                               
      IF INDEXC(WORD(I-1),'#0123456789') > 0 THEN WORD_CUT = 2;
                                                               
      ELSE IF (LENGTH(TRIM(LEFT(ADD_IN(N)))) -                 
         LENGTH(TRIM(LEFT(WORD(I-1))))) > 30 THEN WORD_CUT = 2;
                                                               
      IF WORD_CUT = 2 THEN DO;                                 
                                                               
         NXT1 = N + LINE;                                      
         NXT2 = N + 1 + LINE;                                  
                                                               
         ADD_OUT(NXT2) =                                       
          TRIM(LEFT(WORD(I-2))) || ' ' || TRIM(LEFT(WORD(I-1)));
         ADD_OUT(NXT1) =                                       
          SUBSTR(ADD_IN(N),1,LENGTH(ADD_IN(N)) -               
          LENGTH(ADD_OUT(NXT2)) - 1);                          
                                                               
      END;                                                     
      ELSE IF WORD_CUT = 1 THEN DO;              

                                                  
          NXT1 = N + LINE;                        
          NXT2 = N + 1 + LINE;                    
                                                  
          ADD_OUT(NXT2) =                         
           TRIM(LEFT(WORD(I-1)));                 
          ADD_OUT(NXT1) =                         
           SUBSTR(ADD_IN(N),1,LENGTH(ADD_IN(N)) - 
           LENGTH(ADD_OUT(NXT2)) - 1);            
                                                  
       END;                                       
                                                  
       LINE + 1;                                  
                                                  
    END;                                          
                                                  
    ELSE ADD_OUT(N + LINE) = ADD_IN(N);           

                             
       END;                  
                             
    END;                     
    ELSE DO;                 
       ADD_OUT(1) = ADDRL1;  
       ADD_OUT(2) = ADDRL2;  
       ADD_OUT(3) = ADDRL3;  
       ADD_OUT(4) = ' ';     
       ADD_OUT(5) = ' ';     
    END;                     
                             
    LINE1 = ADD_OUT(1);      
    LINE2 = ADD_OUT(2);      
    LINE3 = ADD_OUT(3);      
RUN;                                        

 

Ask a Question
Discussion stats
  • 9 replies
  • 416 views
  • 0 likes
  • 3 in conversation