Hi
I have a column of codes containing alphanumeric characters:
Code
1A2B3C4D
2B3C3D
1A1G2C2B3A
and so on....
I need to split the code into 2-character variables thus:
Code Type_a Type_b Type_c Type_d
1A2B3C4D 1A 2B 3C 4D
2B3C3D 2B 3C 3D
This is the code I am using:
data f12;
set f12;
type_a=substr(code,1,2);
type_b=substr(code,3,2);
type_c=substr(code,5,2);
type_d=substr(code,7,2);
type_e=substr(code,9,2);
run;
The code runs but returns this error: NOTE: Invalid second argument to function SUBSTR at line 2007 column 8.
Each time, I run the code, the error pertains to a different line/row.
I thought I could use substrn to avoid the problem but I was testing the difference in output with both substr and substrn but the results come off the same.
Should I use the Strip or Trim function as in: type_a=substr(trim(code),1,2)?
Thanks!
SUBPAD
On your second observation, when you run type_d, the values to substr are invalid because they're out of bounds, the string is too short.
I suggest a loop instead.
data have;
length code $10.;
input code;
cards;
1A2B3C4D
2B3C3D
1A1G2C2B3A
;
run;
data want;
set have;
array type(5) $ type_a type_b type_c type_d type_e;
i=1; j=1;
do while(j-1<length(code));
type(i)=substr(code, j, 2);
i+1;
j+2;
end;
run;
Your problem is that the variable CODE must have been defined as length $9. So you cannot use SUBSTR(CODE,9,2) because 2 is an impossible length when starting on the last character.
One way to avoid having to specify the length is to let SAS do the truncation by defining the target variables as length $2.
data f12;
set f12;
array type $2 type1-type5;
do i=1 to 5 while (2*i-1 <= length(code));
type(i) = substr(code,2*i-1) ;
end;
run;
Thanks a lot,Reeze and Tom.
I have a quick question: I understand that the string in my variable' is too short for type_d for some of the observations.
However, why does the code run in some cases even when the string is too short- the error is returned for different rows every time I run the code?
Another question is I am trying to import a csv dataset that contains variables for address and dates too. I use Proc Import and the log shows the number of observations,rows/variables created in SAS. The dataset is therefore, created.The log however, also gives a message that the import is unsuccessful and when I scroll up, I find errors regarding invalid data for date. Is the message 'import unsuccessful' just because of those 'invalid data' errors? In that case, I can actually ignore the message since I shall anyway drop the variables for address and date.
Thanks a lot!
The row reference that the error return are from the line number of the code submitted, not with your observations. Since each code is submitted with new line numbers the location of the error changes.
Please post your new question as a different post.
If you use PROC IMPORT to convert a CSV file to a SAS dataset it will have to GUESS what type of variable to define for each column. CSV files have no place store variable definitions. So some CSV files the maximum length of the value in the "CODE" column might be 8 and in others it might be 10. So sometimes PROC IMPORT will define it with length $8 and other times it will be defined as length $10. (Not to mention when it only sees digits in the field so it defines CODE as number instead of character)
If your files have a regular structure then just write a data step to read the CSV file and then you have complete control over how the variables are defined.
SUBPAD
Thanks, everyone! I really appreciate the feedback.
@Tom: You had explained the datastep/proc import in another post(below)-specify the length of the variables. Is that what you are suggesting that I should do?
https://communities.sas.com/thread/40451
In this case, I shall anyway drop the variables(with respect to which the errors are mentioned in SAS log). Should I still be concerned if there is a message 'Import unsuccessful'?
Thanks!
In general if you want to do a quick ad hoc usage of an unknown CSV file then PROC IMPORT is pretty good, but know its limitations. However if you are going to be doing this frequently and the format of the CSV is supposed to be standardized then you should read it in with a data step. Especially if you will be combine the data with other data where the type and length of the variables could caused trouble.
For your original issue changing the program to use SUBPAD() function instead of SUBSTR() function should eliminate the errors about invalid argurments for SUBSTR().
Well,SUBPAD() works!Thank you,data_null_.
Had a quick question while I change my codes. What is the purpose of the statement 'put (_all_)(=);' above in the SUBPAD() command?
Namrata
Thanks Arthur! I never used the put statement and so was unsure
Tom
You are right. Since I was using Proc Import for a couple of datasets and now, am trying to vertically combine some of these processed datafiles, the error pops up: a couple of variables are numeric in some places while in others it is character.
Well, I see that I shall need to use a data step to import my data files
Thanks everyone!!
You all are a great help
Namrata
Could also do it this way:
data want (keep = codes type_a type_b type_c type_d type_e) ;
length type_a type_b type_c type_d type_e $2 ;
set have ;
ExpressionID = prxparse('/\d\w/');
start = 1;
stop = length(codes);
array tcodes{5} $ type_a type_b type_c type_d type_e ;
call prxnext(ExpressionID, start, stop, codes, position, length);
do i = 1 to 5 ;
if position > 0 then do ;
tcodes{i} = substr(codes, position, length);
call prxnext(ExpressionID, start, stop, codes, position, length);
end ;
end;
run;
Well,
Why don't we include Macros in this topic.
Here is the way by using Macros,
Data A;
input Code $;
cards;
1A2B3C4D
2A3B4C
1A2B
;
run;
%MACRO t;
Data Aa;
Set A;
%do i = 1 %to 8 %by 2;
Var&i=Subpad(Code,&i,2);
%end;
run;
%mend;
%t;
Regards,
Sanjeev.K
Message was edited by: sajeev kuridi Substr function replae with Subpad.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.