BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

Hello,

 

I need to divide a string variable into several component parts. All new variables will be string variables.

 

Sample data provided below:

 

OriginalVar     Component1    Component2    Component3

1702                 17                      02

1627M              16                      27                      M

1628SD            16                      28                      SD

18003               18                      003

1730S3             17                      30                      S3 

1704/1              17                      04                      1

 

I used the following syntax:

 

data want;

     data have;

           Component1 = substr(originalvar,1,2); /*this variable is good*/

           Rest=substr(originalvar,3); /*this variable contains everything from position 3 to end of OriginalVar*/

           restAlpha=anyalpha(rest); /*identify position of 1st alpha in "REST"*/

           restPunct=anypunct(rest); /*identify position of 1st punctuation in "REST"*/

           restDigit=anydigit(rest); /*identify position of 1st digit in "REST"*/

 

if restAlpha = 0 and restPunct = 0 then Component2 = Rest;/*this syntax correctly creates Component2*/
        else if restAlpha gt 0 and restPunct eq 0 then Component2 = substr(Rest,restDigit,restAlpha-restDigit);
        else if restAlpha eq 0 and restPunct gt 0 then Component2 = substr(Rest,restDigit,restPunct-restDigit);

    if restAlpha = 0 and restPunct = 0 thenComponent3 = ' '; /*Component3 is not properly generated*/
        else if restAlpha gt 0 and restPunct eq 0 then Component3 = substr(Rest,restAlpha,3);
        else if restAlpha eq 0 and restPunct gt 0 then Component3 = substr(Rest,restPunct,3);

 

end;

 

This syntax generates the following:

OriginalVar     Component1    Component2    Component3

1702                 17                      02

1627M              16                      27                      M

1628SD            16                      28                      S         /*this is wrong - Component3 should = 'SD'*/

18003               18                      003

1730S3             17                      30                      S        /*this is wrong - Component3 should = 'S3'*/

1704/1              17                      04                      /         /*this is wrong - Component3 should = '1'*/

 

 

Any suggestions for how to fix my syntax?

 

Thanks so much!

 

 

        

 

 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Given that you insert the space where Reeza suggested, there are a couple of additional issues to attend to.

 

First, COMPONENT3 is defined as one character long.  There are a couple of easy ways to fix this, one being to add a second space inside the quotes:

 

 if restAlpha = 0 and restPunct = 0 thenComponent3 = '  ';

 

Second, when you find the punctuation, you actually want to begin locating COMPONENT3 with the next character, not where the punctuation is found.  So:

 

else if restAlpha eq 0 and restPunct gt 0 then Component3 = substr(Rest,restPunct+1,3);

 

 

View solution in original post

5 REPLIES 5
Reeza
Super User

The log generates a lot of errors. Did you fix those first?

Or is the code shown not your actual code?

 

Mistakes:

1. No SET but two DATA statements (line 2)

2. missing space I used an underscore to illustrate where the issue is. 

    if restAlpha = 0 and restPunct = 0 then_Component3 = ' '; /*Component3 is not properly generated*/

 

Results are correct after these fixes. 

They were all noted in the log, so please read your log in the future and include it your original post.

 


@SM1 wrote:

Hello,

 

I need to divide a string variable into several component parts. All new variables will be string variables.

 

Sample data provided below:

 

OriginalVar     Component1    Component2    Component3

1702                 17                      02

1627M              16                      27                      M

1628SD            16                      28                      SD

18003               18                      003

1730S3             17                      30                      S3 

1704/1              17                      04                      1

 

I used the following syntax:

 

data want;

     data have;

           Component1 = substr(originalvar,1,2); /*this variable is good*/

           Rest=substr(originalvar,3); /*this variable contains everything from position 3 to end of OriginalVar*/

           restAlpha=anyalpha(rest); /*identify position of 1st alpha in "REST"*/

           restPunct=anypunct(rest); /*identify position of 1st punctuation in "REST"*/

           restDigit=anydigit(rest); /*identify position of 1st digit in "REST"*/

 

if restAlpha = 0 and restPunct = 0 then Component2 = Rest;/*this syntax correctly creates Component2*/
        else if restAlpha gt 0 and restPunct eq 0 then Component2 = substr(Rest,restDigit,restAlpha-restDigit);
        else if restAlpha eq 0 and restPunct gt 0 then Component2 = substr(Rest,restDigit,restPunct-restDigit);

    if restAlpha = 0 and restPunct = 0 thenComponent3 = ' '; /*Component3 is not properly generated*/
        else if restAlpha gt 0 and restPunct eq 0 then Component3 = substr(Rest,restAlpha,3);
        else if restAlpha eq 0 and restPunct gt 0 then Component3 = substr(Rest,restPunct,3);

 

end;

 

This syntax generates the following:

OriginalVar     Component1    Component2    Component3

1702                 17                      02

1627M              16                      27                      M

1628SD            16                      28                      S         /*this is wrong - Component3 should = 'SD'*/

18003               18                      003

1730S3             17                      30                      S        /*this is wrong - Component3 should = 'S3'*/

1704/1              17                      04                      /         /*this is wrong - Component3 should = '1'*/

 

 

Any suggestions for how to fix my syntax?

 

Thanks so much!

 



 

Astounding
PROC Star

Given that you insert the space where Reeza suggested, there are a couple of additional issues to attend to.

 

First, COMPONENT3 is defined as one character long.  There are a couple of easy ways to fix this, one being to add a second space inside the quotes:

 

 if restAlpha = 0 and restPunct = 0 thenComponent3 = '  ';

 

Second, when you find the punctuation, you actually want to begin locating COMPONENT3 with the next character, not where the punctuation is found.  So:

 

else if restAlpha eq 0 and restPunct gt 0 then Component3 = substr(Rest,restPunct+1,3);

 

 

Reeza
Super User

Component3 exists in the original data with the original length, it's not a newly created variable. 

Astounding
PROC Star

If @Reeza is correct about this, and COMPONENT3 already exists before the DATA step begins, you might still have the same problem.  COMPONENT3 might be already defined as having a length of $1.  You can easily change that by inserting this statement BEFORE the SET statement:

 

length component3 $ 2;

SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

@Reezaand @Astounding

 

Thanks for your comments.

 

Yes, I tried to simplify my syntax and typed it (rather than cutting and pasting). The variable names are not descriptive and quickly become confusing.

 

My actual syntax (with additional comments) follows:

 

data rr_rptsernum;
    set rr_srutype;
    length rptsernumA rptsernum1 $15.;  /*added length statements as suggested*/
    length rptsernum1a rptsernum1b rptsernum2 $10.;
    length rptsernum3a rptsernum3b RRnum3 $5.;
    rptsernum = compress(rptsernum,'-)','C');
    rptsernumA = tranwrd(rptsernum,' (','/');/*replace ' (' with '/'*/
    rptsernum1 = tranwrd(rptsernumA,'XX','999');
    rptsernum1a = substr(rptsernum1,1,2); /*retains only initial 2 digits = year*/
    rptsernum1b = substr(rptsernum1,3); /*retains everything else (rpt number and extra text)*/    
    rptsernum2 = scan(rptsernum1b,1,' '); /*deletes everything that follows a space*/
    rptsernum2a = anyalpha(rptsernum2);
    rptsernum2b = anypunct(rptsernum2);
    rptsernum2c = anydigit(rptsernum2);

    if rptsernum2a = 0 and rptsernum2b = 0 then rptsernum3a = rptsernum2;
        else if rptsernum2a gt 0 and rptsernum2b eq 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2a-rptsernum2c);
        else if rptsernum2a eq 0 and rptsernum2b gt 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2b-rptsernum2c);
        else if rptsernum2a gt 0 and rptsernum2b gt 0 then rptsernum3a = substr(rptsernum2,rptsernum2c,rptsernum2b-rptsernum2c);

    if rptsernum2a = 0 and rptsernum2b = 0 then rptsernum3b = ' ';
        else if rptsernum2a gt 0 and rptsernum2b eq 0 then rptsernum3b = substr(rptsernum2,rptsernum2a,3);
        else if rptsernum2a eq 0 and rptsernum2b gt 0 then rptsernum3b = substr(rptsernum2,rptsernum2b+1,3); /*added '+1' as suggested*/
        else if rptsernum2a gt 0 and rptsernum2b gt 0 then rptsernum3b = substr(rptsernum2,rptsernum2b+1,3); /*added '+1' as suggested*/
    RRnum1 = input(rptsernum1a, 4.);
    RRnum2 = input(rptsernum3a, 4.);
    RRnum3 = rptsernum3b;
run;

 

Including the length statements and adding '+1' to account for the punctuation were both required.

 

The syntax now properly divides the original variable into the required component parts.

 

Thanks so much for your assistance!

 

SM1

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 744 views
  • 2 likes
  • 3 in conversation