BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
namrata
Fluorite | Level 6

Hi

I have a column of codes containing alphanumeric characters:

Code

1A2B3C4D

2B3C3D

1A1G2C2B3A

and so on....

I need to split the code into 2-character variables thus:

Code                    Type_a     Type_b     Type_c     Type_d

1A2B3C4D          1A               2B          3C              4D

2B3C3D               2B               3C          3D

This is the code I am using:

data f12;

set f12;

type_a=substr(code,1,2);    

type_b=substr(code,3,2);

type_c=substr(code,5,2);

type_d=substr(code,7,2);

type_e=substr(code,9,2);

run;

The code runs but returns this error: NOTE: Invalid second argument to function SUBSTR at line 2007 column 8.

Each time, I run the code, the error pertains to a different line/row.

I thought I could use substrn to avoid the problem but I was testing the difference in output with both substr and substrn but the results come off the same.

Should I use the Strip or Trim function as in: type_a=substr(trim(code),1,2)?

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

SUBPAD

18         data _null_;
19            a='abc';
20            x = subpad(a,3,2);
21            put (_all_)(=);
22            run;

a=abc x=c

View solution in original post

16 REPLIES 16
Reeza
Super User

On your second observation, when you run type_d, the values to substr are invalid because they're out of bounds, the string is too short.

I suggest a loop instead.

data have;

length code $10.;

input code;

cards;

1A2B3C4D

2B3C3D

1A1G2C2B3A

;

run;

data want;

set have;

array type(5) $ type_a type_b type_c type_d type_e;

i=1; j=1;

do while(j-1<length(code));

type(i)=substr(code, j, 2);

i+1;

j+2;

end;

run;

Tom
Super User Tom
Super User

Your problem is that the variable CODE must have been defined as length $9.   So you cannot use SUBSTR(CODE,9,2) because 2 is an impossible length when starting on the last character.

One way to avoid having to specify the length is to let SAS do the truncation by defining the target variables as length $2.

data f12;

  set f12;

   array type $2 type1-type5;

   do i=1 to 5 while (2*i-1 <= length(code));

      type(i) = substr(code,2*i-1) ;

  end;

run;

namrata
Fluorite | Level 6

Thanks a lot,Reeze and Tom.

I have a quick question: I understand that the string in my variable' is too short for type_d for some of the observations.

However, why does the code run in some cases even when the string is too short- the error is returned for different rows every time I run the code?

Another question is I am trying to import a csv dataset that contains variables for address and dates too. I use Proc Import and the log shows the number of observations,rows/variables created in SAS. The dataset is therefore, created.The log however, also gives a message that the import is unsuccessful and when I scroll up, I find errors regarding invalid data for date. Is the message 'import unsuccessful' just because of those 'invalid data' errors? In that case, I can actually ignore the message since I shall anyway drop the variables for address and date.

Thanks a lot!

Reeza
Super User

The row reference that the error return are from the line number of the code submitted, not with your observations.  Since each code is submitted with new line numbers the location of the error changes.

Please post your new question as a different post.

Tom
Super User Tom
Super User

If you use PROC IMPORT to convert a CSV file to a SAS dataset it will have to GUESS what type of variable to define for each column. CSV files have no place store variable definitions.  So some CSV files the maximum length of the value in the "CODE" column might be 8 and in others it might be 10.  So sometimes PROC IMPORT will define it with length $8 and other times it will be defined as length $10.  (Not to mention when it only sees digits in the field so it defines CODE as number instead of character)

If your files have a regular structure then just write a data step to read the CSV file and then you have complete control over how the variables are defined.

data_null__
Jade | Level 19

SUBPAD

18         data _null_;
19            a='abc';
20            x = subpad(a,3,2);
21            put (_all_)(=);
22            run;

a=abc x=c
namrata
Fluorite | Level 6

Thanks, everyone! I really appreciate the feedback.

@Tom: You had explained the datastep/proc import in another post(below)-specify the length of the variables. Is that what you are suggesting that I should do?

https://communities.sas.com/thread/40451

In this case, I shall anyway drop the variables(with respect to which the errors are mentioned in SAS log).  Should I still be concerned if there is a message 'Import unsuccessful'?

Thanks!

Tom
Super User Tom
Super User

In general if you want to do a quick ad hoc usage of an unknown CSV file then PROC IMPORT is pretty good, but know its limitations.  However if you are going to be doing this frequently and the format of the CSV is supposed to be standardized then you should read it in with a data step.  Especially if you will be combine the data with other data where the type and length of the variables could caused trouble.

For your original issue changing the program to use SUBPAD() function instead of SUBSTR() function should eliminate the errors about invalid argurments for SUBSTR().

namrata
Fluorite | Level 6

Well,SUBPAD() works!Thank you,data_null_.  Smiley Happy

Had a quick question while I change my codes. What is the purpose of the statement 'put (_all_)(=);' above in the SUBPAD() command?

Namrata

art297
Opal | Level 21

: I think data_null_ only used that to show you what the results looked like.

data_null_: Thanks!  I can't recall ever even seeing that function (i.e., subpad)

namrata
Fluorite | Level 6

Thanks Arthur! I never used the put statement and so was unsure Smiley Happy

namrata
Fluorite | Level 6

Tom

You are right. Since I was using Proc Import for a couple of datasets and now, am trying to vertically combine some of these processed datafiles, the error pops up: a couple of variables are numeric in some places while in others it is character.

Well, I see that I shall need to use a data step to import my data files Smiley Wink

Thanks everyone!!

You all are a great help Smiley Happy

Namrata

SteveNZ
Obsidian | Level 7

Could also do it this way:

data want (keep = codes type_a type_b type_c type_d type_e) ;

length type_a type_b type_c type_d type_e $2 ;

set have ;

ExpressionID = prxparse('/\d\w/');

start = 1;

stop = length(codes);

array tcodes{5} $ type_a type_b type_c type_d type_e ;

call prxnext(ExpressionID, start, stop, codes, position, length);

do i = 1 to 5 ;

if position > 0 then do ;

tcodes{i} = substr(codes, position, length);

call prxnext(ExpressionID, start, stop, codes, position, length);

end ;

end;

run;

kuridisanjeev
Quartz | Level 8

Well,

Why don't we include Macros in this topic.Smiley Wink

Here is the way by using Macros,

Data A;

input Code $;

cards;

1A2B3C4D

2A3B4C

1A2B

;

run;

%MACRO t;

Data Aa;

Set A;

%do i = 1 %to 8 %by 2;

Var&i=Subpad(Code,&i,2);

%end;

run;

%mend;

%t;

Regards,

Sanjeev.K

Message was edited by: sajeev kuridi Substr function replae with Subpad.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 4962 views
  • 8 likes
  • 8 in conversation