Divide a string variable into sub-components

Accepted Solution Solved
Reply
Contributor SM1
Contributor
Posts: 51
Accepted Solution

Divide a string variable into sub-components

Hello,

 

I need to divide a string variable into several component parts. All new variables will be string variables.

 

Sample data provided below:

 

OriginalVar     Component1    Component2    Component3

1702                 17                      02

1627M              16                      27                      M

1628SD            16                      28                      SD

18003               18                      003

1730S3             17                      30                      S3 

1704/1              17                      04                      1

 

I used the following syntax:

 

data want;

     data have;

           Component1 = substr(originalvar,1,2); /*this variable is good*/

           Rest=substr(originalvar,3); /*this variable contains everything from position 3 to end of OriginalVar*/

           restAlpha=anyalpha(rest); /*identify position of 1st alpha in "REST"*/

           restPunct=anypunct(rest); /*identify position of 1st punctuation in "REST"*/

           restDigit=anydigit(rest); /*identify position of 1st digit in "REST"*/

 

if restAlpha = 0 and restPunct = 0 then Component2 = Rest;/*this syntax correctly creates Component2*/
        else if restAlpha gt 0 and restPunct eq 0 then Component2 = substr(Rest,restDigit,restAlpha-restDigit);
        else if restAlpha eq 0 and restPunct gt 0 then Component2 = substr(Rest,restDigit,restPunct-restDigit);

    if restAlpha = 0 and restPunct = 0 thenComponent3 = ' '; /*Component3 is not properly generated*/
        else if restAlpha gt 0 and restPunct eq 0 then Component3 = substr(Rest,restAlpha,3);
        else if restAlpha eq 0 and restPunct gt 0 then Component3 = substr(Rest,restPunct,3);

 

end;

 

This syntax generates the following:

OriginalVar     Component1    Component2    Component3

1702                 17                      02

1627M              16                      27                      M

1628SD            16                      28                      S         /*this is wrong - Component3 should = 'SD'*/

18003               18                      003

1730S3             17                      30                      S        /*this is wrong - Component3 should = 'S3'*/

1704/1              17                      04                      /         /*this is wrong - Component3 should = '1'*/

 

 

Any suggestions for how to fix my syntax?

 

Thanks so much!

 

 

        

 

 

 

 

 

 

 


Accepted Solutions
Solution
‎12-13-2017 01:22 PM
Super User
Posts: 6,637

Re: Divide a string variable into sub-components

Given that you insert the space where Reeza suggested, there are a couple of additional issues to attend to.

 

First, COMPONENT3 is defined as one character long.  There are a couple of easy ways to fix this, one being to add a second space inside the quotes:

 

 if restAlpha = 0 and restPunct = 0 thenComponent3 = '  ';

 

Second, when you find the punctuation, you actually want to begin locating COMPONENT3 with the next character, not where the punctuation is found.  So:

 

else if restAlpha eq 0 and restPunct gt 0 then Component3 = substr(Rest,restPunct+1,3);

 

 

View solution in original post


All Replies
Super User
Posts: 23,323

Re: Divide a string variable into sub-components

The log generates a lot of errors. Did you fix those first?

Or is the code shown not your actual code?

 

Mistakes:

1. No SET but two DATA statements (line 2)

2. missing space I used an underscore to illustrate where the issue is. 

    if restAlpha = 0 and restPunct = 0 then_Component3 = ' '; /*Component3 is not properly generated*/

 

Results are correct after these fixes. 

They were all noted in the log, so please read your log in the future and include it your original post.

 


SM1 wrote:

Hello,

 

I need to divide a string variable into several component parts. All new variables will be string variables.

 

Sample data provided below:

 

OriginalVar     Component1    Component2    Component3

1702                 17                      02

1627M              16                      27                      M

1628SD            16                      28                      SD

18003               18                      003

1730S3             17                      30                      S3 

1704/1              17                      04                      1

 

I used the following syntax:

 

data want;

     data have;

           Component1 = substr(originalvar,1,2); /*this variable is good*/

           Rest=substr(originalvar,3); /*this variable contains everything from position 3 to end of OriginalVar*/

           restAlpha=anyalpha(rest); /*identify position of 1st alpha in "REST"*/

           restPunct=anypunct(rest); /*identify position of 1st punctuation in "REST"*/

           restDigit=anydigit(rest); /*identify position of 1st digit in "REST"*/

 

if restAlpha = 0 and restPunct = 0 then Component2 = Rest;/*this syntax correctly creates Component2*/
        else if restAlpha gt 0 and restPunct eq 0 then Component2 = substr(Rest,restDigit,restAlpha-restDigit);
        else if restAlpha eq 0 and restPunct gt 0 then Component2 = substr(Rest,restDigit,restPunct-restDigit);

    if restAlpha = 0 and restPunct = 0 thenComponent3 = ' '; /*Component3 is not properly generated*/
        else if restAlpha gt 0 and restPunct eq 0 then Component3 = substr(Rest,restAlpha,3);
        else if restAlpha eq 0 and restPunct gt 0 then Component3 = substr(Rest,restPunct,3);

 

end;

 

This syntax generates the following:

OriginalVar     Component1    Component2    Component3

1702                 17                      02

1627M              16                      27                      M

1628SD            16                      28                      S         /*this is wrong - Component3 should = 'SD'*/

18003               18                      003

1730S3             17                      30                      S        /*this is wrong - Component3 should = 'S3'*/

1704/1              17                      04                      /         /*this is wrong - Component3 should = '1'*/

 

 

Any suggestions for how to fix my syntax?

 

Thanks so much!

 



 

Solution
‎12-13-2017 01:22 PM
Super User
Posts: 6,637

Re: Divide a string variable into sub-components

Given that you insert the space where Reeza suggested, there are a couple of additional issues to attend to.

 

First, COMPONENT3 is defined as one character long.  There are a couple of easy ways to fix this, one being to add a second space inside the quotes:

 

 if restAlpha = 0 and restPunct = 0 thenComponent3 = '  ';

 

Second, when you find the punctuation, you actually want to begin locating COMPONENT3 with the next character, not where the punctuation is found.  So:

 

else if restAlpha eq 0 and restPunct gt 0 then Component3 = substr(Rest,restPunct+1,3);

 

 

Super User
Posts: 23,323

Re: Divide a string variable into sub-components

Posted in reply to Astounding

Component3 exists in the original data with the original length, it's not a newly created variable. 

Super User
Posts: 6,637

Re: Divide a string variable into sub-components

If @Reeza is correct about this, and COMPONENT3 already exists before the DATA step begins, you might still have the same problem.  COMPONENT3 might be already defined as having a length of $1.  You can easily change that by inserting this statement BEFORE the SET statement:

 

length component3 $ 2;

Contributor SM1
Contributor
Posts: 51

Re: Divide a string variable into sub-components

Posted in reply to Astounding

@Reezaand @Astounding

 

Thanks for your comments.

 

Yes, I tried to simplify my syntax and typed it (rather than cutting and pasting). The variable names are not descriptive and quickly become confusing.

 

My actual syntax (with additional comments) follows:

 

data rr_rptsernum;
    set rr_srutype;
    length rptsernumA rptsernum1 $15.;  /*added length statements as suggested*/
    length rptsernum1a rptsernum1b rptsernum2 $10.;
    length rptsernum3a rptsernum3b RRnum3 $5.;
    rptsernum = compress(rptsernum,'-)','C');
    rptsernumA = tranwrd(rptsernum,' (','/');/*replace ' (' with '/'*/
    rptsernum1 = tranwrd(rptsernumA,'XX','999');
    rptsernum1a = substr(rptsernum1,1,2); /*retains only initial 2 digits = year*/
    rptsernum1b = substr(rptsernum1,3); /*retains everything else (rpt number and extra text)*/    
    rptsernum2 = scan(rptsernum1b,1,' '); /*deletes everything that follows a space*/
    rptsernum2a = anyalpha(rptsernum2);
    rptsernum2b = anypunct(rptsernum2);
    rptsernum2c = anydigit(rptsernum2);

    if rptsernum2a = 0 and rptsernum2b = 0 then rptsernum3a = rptsernum2;
        else if rptsernum2a gt 0 and rptsernum2b eq 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2a-rptsernum2c);
        else if rptsernum2a eq 0 and rptsernum2b gt 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2b-rptsernum2c);
        else if rptsernum2a gt 0 and rptsernum2b gt 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2b-rptsernum2c);

    if rptsernum2a = 0 and rptsernum2b = 0 then rptsernum3b = ' ';
        else if rptsernum2a gt 0 and rptsernum2b eq 0 then rptsernum3b = substr(rptsernum2,rptsernum2a,3);
        else if rptsernum2a eq 0 and rptsernum2b gt 0 then rptsernum3b = substr(rptsernum2,rptsernum2b+1,3); /*added '+1' as suggested*/
        else if rptsernum2a gt 0 and rptsernum2b gt 0 then rptsernum3b = substr(rptsernum2,rptsernum2b+1,3); /*added '+1' as suggested*/
    RRnum1 = input(rptsernum1a, 4.);
    RRnum2 = input(rptsernum3a, 4.);
    RRnum3 = rptsernum3b;
run;

 

Including the length statements and adding '+1' to account for the punctuation were both required.

 

The syntax now properly divides the original variable into the required component parts.

 

Thanks so much for your assistance!

 

SM1

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 159 views
  • 2 likes
  • 3 in conversation