Help using Base SAS procedures

Splitting a text using substr

Accepted Solution Solved
Reply
Contributor
Posts: 52
Accepted Solution

Splitting a text using substr

Hi

I have a column of codes containing alphanumeric characters:

Code

1A2B3C4D

2B3C3D

1A1G2C2B3A

and so on....

I need to split the code into 2-character variables thus:

Code                    Type_a     Type_b     Type_c     Type_d

1A2B3C4D          1A               2B          3C              4D

2B3C3D               2B               3C          3D

This is the code I am using:

data f12;

set f12;

type_a=substr(code,1,2);    

type_b=substr(code,3,2);

type_c=substr(code,5,2);

type_d=substr(code,7,2);

type_e=substr(code,9,2);

run;

The code runs but returns this error: NOTE: Invalid second argument to function SUBSTR at line 2007 column 8.

Each time, I run the code, the error pertains to a different line/row.

I thought I could use substrn to avoid the problem but I was testing the difference in output with both substr and substrn but the results come off the same.

Should I use the Strip or Trim function as in: type_a=substr(trim(code),1,2)?

Thanks!


Accepted Solutions
Solution
‎12-07-2013 09:26 AM
Respected Advisor
Posts: 3,799

Re: Splitting a text using substr

SUBPAD

18         data _null_;
19            a='abc';
20            x = subpad(a,3,2);
21            put (_all_)(=);
22            run;

a=abc x=c

View solution in original post


All Replies
Super User
Posts: 19,862

Re: Splitting a text using substr

On your second observation, when you run type_d, the values to substr are invalid because they're out of bounds, the string is too short.

I suggest a loop instead.

data have;

length code $10.;

input code;

cards;

1A2B3C4D

2B3C3D

1A1G2C2B3A

;

run;

data want;

set have;

array type(5) $ type_a type_b type_c type_d type_e;

i=1; j=1;

do while(j-1<length(code));

type(i)=substr(code, j, 2);

i+1;

j+2;

end;

run;

Super User
Super User
Posts: 7,076

Re: Splitting a text using substr

Your problem is that the variable CODE must have been defined as length $9.   So you cannot use SUBSTR(CODE,9,2) because 2 is an impossible length when starting on the last character.

One way to avoid having to specify the length is to let SAS do the truncation by defining the target variables as length $2.

data f12;

  set f12;

   array type $2 type1-type5;

   do i=1 to 5 while (2*i-1 <= length(code));

      type(i) = substr(code,2*i-1) ;

  end;

run;

Contributor
Posts: 52

Re: Splitting a text using substr

Thanks a lot,Reeze and Tom.

I have a quick question: I understand that the string in my variable' is too short for type_d for some of the observations.

However, why does the code run in some cases even when the string is too short- the error is returned for different rows every time I run the code?

Another question is I am trying to import a csv dataset that contains variables for address and dates too. I use Proc Import and the log shows the number of observations,rows/variables created in SAS. The dataset is therefore, created.The log however, also gives a message that the import is unsuccessful and when I scroll up, I find errors regarding invalid data for date. Is the message 'import unsuccessful' just because of those 'invalid data' errors? In that case, I can actually ignore the message since I shall anyway drop the variables for address and date.

Thanks a lot!

Super User
Posts: 19,862

Re: Splitting a text using substr

The row reference that the error return are from the line number of the code submitted, not with your observations.  Since each code is submitted with new line numbers the location of the error changes.

Please post your new question as a different post.

Super User
Super User
Posts: 7,076

Re: Splitting a text using substr

If you use PROC IMPORT to convert a CSV file to a SAS dataset it will have to GUESS what type of variable to define for each column. CSV files have no place store variable definitions.  So some CSV files the maximum length of the value in the "CODE" column might be 8 and in others it might be 10.  So sometimes PROC IMPORT will define it with length $8 and other times it will be defined as length $10.  (Not to mention when it only sees digits in the field so it defines CODE as number instead of character)

If your files have a regular structure then just write a data step to read the CSV file and then you have complete control over how the variables are defined.

Solution
‎12-07-2013 09:26 AM
Respected Advisor
Posts: 3,799

Re: Splitting a text using substr

SUBPAD

18         data _null_;
19            a='abc';
20            x = subpad(a,3,2);
21            put (_all_)(=);
22            run;

a=abc x=c
Contributor
Posts: 52

Re: Splitting a text using substr

Posted in reply to data_null__

Thanks, everyone! I really appreciate the feedback.

@Tom: You had explained the datastep/proc import in another post(below)-specify the length of the variables. Is that what you are suggesting that I should do?

https://communities.sas.com/thread/40451

In this case, I shall anyway drop the variables(with respect to which the errors are mentioned in SAS log).  Should I still be concerned if there is a message 'Import unsuccessful'?

Thanks!

Super User
Super User
Posts: 7,076

Re: Splitting a text using substr

In general if you want to do a quick ad hoc usage of an unknown CSV file then PROC IMPORT is pretty good, but know its limitations.  However if you are going to be doing this frequently and the format of the CSV is supposed to be standardized then you should read it in with a data step.  Especially if you will be combine the data with other data where the type and length of the variables could caused trouble.

For your original issue changing the program to use SUBPAD() function instead of SUBSTR() function should eliminate the errors about invalid argurments for SUBSTR().

Contributor
Posts: 52

Re: Splitting a text using substr

Well,SUBPAD() works!Thank you,data_null_.  Smiley Happy

Had a quick question while I change my codes. What is the purpose of the statement 'put (_all_)(=);' above in the SUBPAD() command?

Namrata

PROC Star
Posts: 7,492

Re: Splitting a text using substr

: I think data_null_ only used that to show you what the results looked like.

data_null_: Thanks!  I can't recall ever even seeing that function (i.e., subpad)

Contributor
Posts: 52

Re: Splitting a text using substr

Thanks Arthur! I never used the put statement and so was unsure Smiley Happy

Contributor
Posts: 52

Re: Splitting a text using substr

Tom

You are right. Since I was using Proc Import for a couple of datasets and now, am trying to vertically combine some of these processed datafiles, the error pops up: a couple of variables are numeric in some places while in others it is character.

Well, I see that I shall need to use a data step to import my data files Smiley Wink

Thanks everyone!!

You all are a great help Smiley Happy

Namrata

Frequent Contributor
Posts: 87

Re: Splitting a text using substr

Could also do it this way:

data want (keep = codes type_a type_b type_c type_d type_e) ;

length type_a type_b type_c type_d type_e $2 ;

set have ;

ExpressionID = prxparse('/\d\w/');

start = 1;

stop = length(codes);

array tcodes{5} $ type_a type_b type_c type_d type_e ;

call prxnext(ExpressionID, start, stop, codes, position, length);

do i = 1 to 5 ;

if position > 0 then do ;

tcodes{i} = substr(codes, position, length);

call prxnext(ExpressionID, start, stop, codes, position, length);

end ;

end;

run;

Super Contributor
Posts: 276

Re: Splitting a text using substr

Well,

Why don't we include Macros in this topic.Smiley Wink

Here is the way by using Macros,

Data A;

input Code $;

cards;

1A2B3C4D

2A3B4C

1A2B

;

run;

%MACRO t;

Data Aa;

Set A;

%do i = 1 %to 8 %by 2;

Var&i=Subpad(Code,&i,2);

%end;

run;

%mend;

%t;

Regards,

Sanjeev.K

Message was edited by: sajeev kuridi Substr function replae with Subpad.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 601 views
  • 8 likes
  • 8 in conversation