Need help in Splitting two address Address Variables(ADDR1,ADDR2) into...

vaibhavpratap · Posted 06-09-2016 10:54 PM

Hi All,

I have two Character Fields (ADDR1,ADDR2) which i am reading from a PIPE DELIMITED file (using Infile) maximum length for these can be $60.

I need to load these two fields into one Table which has columns as below:

ADDRESS_LN1 (Can take upto 30 characters)

ADDRESS_LN2 (Can take upto 30 characters)

ADDRESS_LN3 (Can take upto 30 characters)

So if LENGTH of my variables (ADDR1,ADDR2) is less than 30 than in that case they will be simply loaded to ADDRESS_LN1 ,ADDRESS_LN2 respectively.

But now as per my requirement if the LENGTH of any of the fields (ADDR1,ADDR2) is more than 30 Bytes then we need to move the extra characters after 30 byte to subsequent ADDRESS_LN i.e if AADR1 has 35 byte data then 30 will go to ADDRESS_LN1 and 5 byte will go to ADDRESS_LN2. And now if ADDRESS_LN2 exceeds 30 bytes its remaing address info will go to ADDRESS_LN3.

Now the problem is we cannot just SUBSTR the variable for first 30 bytes because there might be a scenarion that the 30th Byte might be in between a WORD i.e say ADDR1 = 'ABCDEFGHI JKLMNOP QRSTUV WXYZRTT' ...so the 30th byte falls at R, so we need to move the complete string to ADDRESS_LN2 i.e entire word WXYZRTT , not only TT.So we need to check for LAST SPACE in the field and truncate accordingly.

Can anyone help me guide through building a logic for this.Any help will be appreciated.

Thanks!!

PGStats · Posted 06-09-2016 11:34 PM

Can be done with regular expressions

data test;
length ADDR1 ADDR2 $60 ADDR $121;
length line1-line3 $30;
ADDR1 = 'ABCDEFGHI JKLMNOP QRSTUV WXYZRTT';
ADDR2 = 'ABCDEF GHIJKLMNOP QRSTUVWX YZRTT';
if not prx2 then prx2 + prxParse("/\s*(.{1,30})\b\s*(.{1,30})\b/");
if not prx3 then prx3 + prxParse("/\s*(.{1,30})\b\s*(.{1,30})\b\s*(.{1,30})\b/");
if length(ADDR1) <= 30 then do;
    line1 = ADDR1;
    if length(ADDR2) <= 30 then line2 = ADDR2;
    else do;
        if prxmatch(prx2, ADDR2) then do;
            line1 = prxposn(prx2, 1, ADDR2); 
            line2 = prxposn(prx2, 2, ADDR2); 
            end;
        end;
    end;
else do;
    ADDR = catx(" ", ADDR1, ADDR2);
    if prxmatch(prx3, ADDR) then do;
        line1 = prxposn(prx3, 1, ADDR); 
        line2 = prxposn(prx3, 2, ADDR); 
        line3 = prxposn(prx3, 3, ADDR); 
        end;
    end;
keep line1-line3;
run;

proc print data=test noobs; run;

PG

vaibhavpratap · Posted 06-10-2016 08:15 AM

Thank you so much for your quck response. Guess you have used Pearl language in the code ...i am running my code on ZOS mainframe Platform using SAS 9.1 .I am not aware of pear though so am having dificulty in understanding the Pearl Portion.

Ran you code and surprisingly it ran good on ZOS platform as well.Below is the output:

LINE1
JKLMNOP QRSTUV WXYZRTT ABCDE

LINE2
GHIJKLMNOP QRSTUVWX YZRTT

LINE3

The truncation did not happened as expected. Desired output is:

LINE1

ABCDEFGHI JKLMNOP QRSTUV

LINE2

WXYZRTT ABCDEF GHIJKLMNOP

LINE3

QRSTUVWX YZRTT

Whatever is the character after 30 bytes if it is lying in middle of a WORD then that entire word should be moved to LINE2 and so on for LINE3. Like as per your variable:

ADDR1 = 'ABCDEFGHI JKLMNOP QRSTUV WXYZRTT';

30 byte lies at R (3rd byte from back)...and this is a complete WORD in itself...so we want the entire word i.e WXYZRTT to be moved to LINE2 then a SPACE in b/w and then actual 30 byte of ADDR2 and so on.

Appreciating your help on this.

Thanks!!

PGStats · Posted 06-10-2016 11:24 AM

The code as posted prints

               line1                        line2                  line3

      ABCDEFGHI JKLMNOP QRSTUV    WXYZRTT ABCDEF GHIJKLMNOP    QRSTUVWX YZRTT

With SAS 9.4 on Windows 7.

PG

vaibhavpratap · Posted 06-11-2016 05:41 AM

May be a different SAS version and a different platform might be causing some issue. Thanks a lot for your help on this.

Ksharp · Posted 06-10-2016 12:57 AM

Give us an example . Data and Output .

vaibhavpratap · Posted 06-10-2016 08:26 AM

Hi,

We are running SAS 9.1 on ZOS platform.

PFB the details:

If ADDRESS exceeds 30 byte the remaining WORD will be moved to subsequent address lines.In Input we have only two fileds

ADDR1

ADDR2

We need to load them into ADDR_LINE_1,ADDR_LINE_2,ADDR_LINE_3 (each having 30 byte length)

Below is the input:

ADDR1='L-9 PEARL APARTMENT BUDDHA NAGAR GHAZIABAD UTTAR PRADESHERS';
ADDR2='NEAR CROSSING';

If LENGTH of any of the above field is GREATER than 30 bytes then we Need to move the WORD that lies after 30 bytes to different varaibale ... we will move the entire WORD not only the characters that are lying after 30 bytes....and so on for ADDR2

We need to split the Output in three parts 30 byte each.

Desired Output:

ADDR_LINE_1= L-9 PEARL APARTMENT BUDDHA

ADDR_LINE_2= NAGAR GHAZIABAD UTTAR

ADDR_LINE_3= PRADESHERS NEAR CROSSING

Ksharp · Posted 06-10-2016 11:03 PM

data have;
ADDR1='L-9 PEARL APARTMENT BUDDHA NAGAR GHAZIABAD UTTAR PRADESHERS          ';
ADDR2='NEAR CROSSING';output;
ADDR1='L-9 PEARL APARTMENT BUDDHASSABC NAGAR GHAZIABAD UTTAR PRADESHERS';
ADDR2='NEAR CROSSING';output;
run;

data temp;
 set have;
 length temp $ 30;
 n+1;
 do i=1 to countw(ADDR1,' ');   
  temp=scan(ADDR1,i,' ');
  output;
 end;
 do i=1 to countw(ADDR2,' ');   
  temp=scan(ADDR2,i,' ');
  output;
 end;
 drop i;
run;
data temp;
 set temp;
 by n;
 if first.n then do;len=0;g=0;end;
 len+(length(temp)+1);
 if len gt 31 then do;g+1;len=length(temp)+1;end;
run;
data x;
 length x $ 30;
 do until(last.g);
  set temp;
  by n g;
  x=catx(' ',x,temp);
 end;
run;
proc transpose data=x out=want(drop=_name_) prefix=line;
by n ADDR1 ADDR2;
var x;
run;

PGStats · Posted 06-11-2016 12:11 AM

My code produces

              line1                          line2                           line3

    L-9 PEARL APARTMENT BUDDHA    NAGAR GHAZIABAD UTTAR          PRADESHERS NEAR CROSSING
    L-9 PEARL APARTMENT           BUDDHASSABC NAGAR GHAZIABAD    UTTAR PRADESHERS NEAR CROSSING

with @Ksharp's data. Is that the expected result?

PG

vaibhavpratap · Posted 06-11-2016 05:39 AM

@PG Stats -

Yes this is the expected result. Thanks a lot for your help on this. Appreciate and Thanks!!

@KSharp -

You made my day...thanks for posting the perfect code. It worked perfectly just had to add few fields to it as it was to be merged with other dataset based on a unique key.Thank you so much !!!!

Thanks a lot guys...:)

Just to tell you guys below was the code i was using, but the issue with it was it was splitting the ADDR1 and ADDR2 fields into 4 parts rarther than 3.

DATA
   TEST
   ;

   LENGTH LINE1 LINE2 LINE3 $30.;

   ADDR1='L-9 PEARL APARTMENT BUDDHA NAGAR';
   ADDR2='GHAZIABAD UTTAR PRADESHERS NEAR CROSSING';
   ADDRL3 =' ';

    CHK_AD1 = SUBSTR(ADDRL1,31,05);
    CHK_AD2 = SUBSTR(ADDRL2,31,05);
    CHK_AD3 = SUBSTR(ADDRL3,31,05);

    IF CHK_AD1 > ' ' OR
       CHK_AD2 > ' ' OR
       CHK_AD3 > ' ' THEN DO;

ARRAY ADD_IN(3) $60;
ARRAY ADD_OUT(5) $30;
ARRAY WORD(50) $60;

ADD_IN(1) = ADDRL1;
ADD_IN(2) = ADDRL2;
ADD_IN(3) = ADDRL3;

WORD_CUT = 1;
LINE = 0;

DO N=1 TO 3;

    IF LENGTH(ADD_IN(N)) > 30 THEN DO;

       DO I=1 TO 50 UNTIL(WORD(I) = '');
          WORD(I) = TRIM(LEFT(SCAN(ADD_IN(N), I, ' ')));
       END;

      IF INDEXC(WORD(I-1),'#0123456789') > 0 THEN WORD_CUT = 2;

      ELSE IF (LENGTH(TRIM(LEFT(ADD_IN(N)))) -
         LENGTH(TRIM(LEFT(WORD(I-1))))) > 30 THEN WORD_CUT = 2;

      IF WORD_CUT = 2 THEN DO;

         NXT1 = N + LINE;
         NXT2 = N + 1 + LINE;

         ADD_OUT(NXT2) =
          TRIM(LEFT(WORD(I-2))) || ' ' || TRIM(LEFT(WORD(I-1)));
         ADD_OUT(NXT1) =
          SUBSTR(ADD_IN(N),1,LENGTH(ADD_IN(N)) -
          LENGTH(ADD_OUT(NXT2)) - 1);

      END;
      ELSE IF WORD_CUT = 1 THEN DO;

          NXT1 = N + LINE;
          NXT2 = N + 1 + LINE;

          ADD_OUT(NXT2) =
           TRIM(LEFT(WORD(I-1)));
          ADD_OUT(NXT1) =
           SUBSTR(ADD_IN(N),1,LENGTH(ADD_IN(N)) -
           LENGTH(ADD_OUT(NXT2)) - 1);

       END;

       LINE + 1;

    END;

    ELSE ADD_OUT(N + LINE) = ADD_IN(N);

       END;

    END;
    ELSE DO;
       ADD_OUT(1) = ADDRL1;
       ADD_OUT(2) = ADDRL2;
       ADD_OUT(3) = ADDRL3;
       ADD_OUT(4) = ' ';
       ADD_OUT(5) = ' ';
    END;

    LINE1 = ADD_OUT(1);
    LINE2 = ADD_OUT(2);
    LINE3 = ADD_OUT(3);
RUN;

Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR3)

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Re: Need help in Splitting two address Address Variables(ADDR1,ADDR2) into 3 parts (ADDR1,ADDR2,ADDR

Classroom Training Available!