I am trying to use the SUBSTR function to parse out values in a string. There are 44 variables in my dataset that contain such strings. The strings are a maximum of 6 characters. Each value is 2 characters long. In other words, a value of 010101 has 3 distinct values (01, 01, 01).
I have this:
_400203 _400204 etc.
010001 000100
000001 0100
I want this:
VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11
01 00 01 00 00 01 00 01 00 01 00
Ultimately, I want to create 3 new variables for each string:
IF VAR1="01" THEN NEW_VAR1=1; ELSE NEW_VAR1=0;
IF VAR2="01" THEN NEW_VAR2=1; ELSE NEW_VAR2=0;
IF VAR3="01" THEN NEW_VAR3=1; ELSE NEW_VAR3=0;
I can run the code successfully for 1 variable, but not for more than one. I receive the following error: "The SUBSTR function call has too many arguments".
I am no expert in arrays, or in SAS in general...
This is the code I'm using just to run a test using 2 variables (6 total values):
DATA want; SET have
ARRAY array_name (*) $ 2 VAR1-VAR6; DO i = 1 to 5 by 2; array_name(i) = substr(_400203-_400204, i, 2); END;
DROP _400203 _400204 i; RUN;
This code works fine for one variable:
DATA want; SET have; ARRAY array_name (*) $ 2 VAR1-VAR3; DO i = 1 to 5 by 2; array_name(i) = substr(_400203, i, 2); END; DROP _400203 i; RUN;
I'd change the names to make them manageable later. Arrays is still a viable option:
data want;
set have;
array original {44} _400203 _400204 ....;
array new {44,3} $ 2 _400203_part1 - _400203_part3
_400204_part1 - _400204_part3
...;
do i=1 to 44;
do j=1 to 3;
new{i,j} = substr(original{i}, 2*j-1);
end;
end;
run;
It's still not clear if you want to copy two characters from the original variable, or if you want to set flags based on what you find there. But there are (as usual) plenty of ways to do it in SAS.
Technically, the SUBSTR function with no third parameter copies all remaining characters. But since the new variables are only two characters long, the program saves just two characters in each.
Good luck.
Hi @_maldini_,
Have you checked the documentation of the SUBSTR function? The first argument to the function "specifies a character constant, variable, or expression." So, there's no room for a variable list, unfortunately.
Also, an array named array_name cannot be referenced later by a different name such as answers.
I'm skeptical already about your code for one variable: If _400203='123456', VAR1 - VAR3 will receive the values '12', '23', '34', respectively, and not '12', '34', '56', as you might have intended. You may want to adapt the DO statement as follows:
do i=1 to 5 by 2;
Sorry @FreelanceReinh My mistake on the array name. I corrected it in the original post.
<do i=1 to 5 by 2;>
What does the 5 represent here? Should it be 6 since there are 6 digits/characters in your example?
Thanks very much for your help.
My suggestion with the DO statement focused on the SUBSTR function, whose second argument specifies the beginning character position of the substrings to be extracted. So, if you want to extract '12', '34' and '56' from the string '123456', those beginning character positions should be 1, 3 and 5. These are the values of the index variable of a DO loop from 1 to 5 with a step size of 2.
However, to assign these three substrings to array elements answers(1), answers(2), answers(3), the index of the array elements must be specified as follows:
answers((i+1)/2) = substr(_400203, i, 2);
First, a quick tip. If you have defined VAR1 as $2, you don't need the SUBSTR function. You could code:
var1 = _400203;
The big problem you have ias that you haven't created nearly enough variables. If VAR1 is to hold the first two characters of _400203, you can't reuse it to hold the first two characters of some other variables at the same time. You will have to devise a set of 3 names for each incoming variable. Once you have that approach in mind, arrays will be the key tool that processes many variables in exactly the same way.
<You will have to devise a set of 3 names for each incoming variable. >
That is what I was trying to do w/ VAR1, VAR2, VAR3 for _400203 and then VAR4, VAR5, VAR6 for _400204.
Is there a better approach?
Thanks for the help!
Can you post a sample of what your data looks like and what you want the output to look like?
Several different cases would be ideal.
@Reeza Added to original post. Thanks!
Are you expecting each of those var on their own line? Or all on one line?
@Reeza I'm not sure I understand your question. Could you please clarify?
Would the output be a single row with 11 different variables
or multiple rows with 2 variables?
You've listed your output as Var01-Var11, but structured them as rows...
@Reeza Sorry. Yes, a single row with 11 different variables.
As a brief aside, this code
IF VAR1=01 THEN NEW_VAR1=1; ELSE NEW_VAR1=0;
since mentioning substr probably was intended to be
IF VAR1='01' THEN NEW_VAR1=1; ELSE NEW_VAR1=0;
You can get equivalent behavior with
New_Var1 = (Var1 = '01');
Since SAS assigns a numeric value of 1 to a true comparison and 0 to a false comparison.
data have;
input _400203 $ _400204 $;
cards;
010001 000100
000001 0100
;
run;
data temp;
set have;
array x{*} $ _: ;
do group=1 to dim(x);
do i=1 to length(x{group}) by 2;
temp=substr(x{group},i,2);output;
end;
end;
keep group temp;
run;
proc sort data=temp; by group;run;
proc transpose data=temp out=want(drop=_:) prefix=var;
var temp;
run;
I'd change the names to make them manageable later. Arrays is still a viable option:
data want;
set have;
array original {44} _400203 _400204 ....;
array new {44,3} $ 2 _400203_part1 - _400203_part3
_400204_part1 - _400204_part3
...;
do i=1 to 44;
do j=1 to 3;
new{i,j} = substr(original{i}, 2*j-1);
end;
end;
run;
It's still not clear if you want to copy two characters from the original variable, or if you want to set flags based on what you find there. But there are (as usual) plenty of ways to do it in SAS.
Technically, the SUBSTR function with no third parameter copies all remaining characters. But since the new variables are only two characters long, the program saves just two characters in each.
Good luck.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.