BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
_maldini_
Barite | Level 11

I am trying to use the SUBSTR function to parse out values in a string. There are 44 variables in my dataset that contain such strings. The strings are a maximum of 6 characters. Each value is 2 characters long. In other words, a value of 010101 has 3 distinct values (01, 01, 01).

 

I have this:

_400203                _400204      etc.

010001                    000100

000001                    0100

 

I want this: 

VAR1   VAR2    VAR3  VAR4   VAR5   VAR6   VAR7    VAR8  VAR9   VAR10  VAR11

   01       00         01     00        00          01        00         01       00        01         00

 

Ultimately, I want to create 3 new variables for each string:

IF VAR1="01" THEN NEW_VAR1=1; ELSE NEW_VAR1=0;
IF VAR2="01" THEN NEW_VAR2=1; ELSE NEW_VAR2=0;
IF VAR3="01" THEN NEW_VAR3=1; ELSE NEW_VAR3=0; 

 

I can run the code successfully for 1 variable, but not for more than one. I receive the following error: "The SUBSTR function call has too many arguments".

 

I am no expert in arrays, or in SAS in general...

 

This is the code I'm using just to run a test using 2 variables (6 total values):

 

DATA want;
    SET have
ARRAY array_name (*) $ 2 VAR1-VAR6; DO i = 1 to 5 by 2; array_name(i) = substr(_400203-_400204, i, 2); END;
DROP _400203 _400204 i; RUN;

 

This code works fine for one variable:

 

DATA want;
   SET have;

ARRAY array_name (*) $ 2 VAR1-VAR3;
DO i = 1 to 5 by 2;
array_name(i) = substr(_400203, i, 2);
END;

DROP 
_400203
i;
RUN; 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

I'd change the names to make them manageable later.  Arrays is still a viable option:

 

data want;

set have;

array original {44}  _400203 _400204 ....;

array new {44,3} $ 2 _400203_part1 - _400203_part3

                     _400204_part1 - _400204_part3

              ...;

do i=1 to 44;

   do j=1 to 3;

      new{i,j} = substr(original{i}, 2*j-1);

   end;

end;

run;

 

It's still not clear if you want to copy two characters from the original variable, or if you want to set flags based on what you find there.  But there are (as usual) plenty of ways to do it in SAS.

 

Technically, the SUBSTR function with no third parameter copies all remaining characters.  But since the new variables are only two characters long, the program saves just two characters in each.

 

Good luck.

View solution in original post

31 REPLIES 31
FreelanceReinh
Jade | Level 19

Hi @_maldini_,

 

Have you checked the documentation of the SUBSTR function? The first argument to the function "specifies a character constant, variable, or expression." So, there's no room for a variable list, unfortunately.

 

Also, an array named array_name cannot be referenced later by a different name such as answers.

 

I'm skeptical already about your code for one variable: If _400203='123456', VAR1 - VAR3 will receive the values '12', '23', '34', respectively, and not '12', '34', '56', as you might have intended. You may want to adapt the DO statement as follows:

do i=1 to 5 by 2;
_maldini_
Barite | Level 11

Sorry @FreelanceReinh My mistake on the array name. I corrected it in the original post. 

 

<do i=1 to 5 by 2;>

 

What does the 5 represent here? Should it be 6 since there are 6 digits/characters in your example?

 

Thanks very much for your help.

FreelanceReinh
Jade | Level 19

My suggestion with the DO statement focused on the SUBSTR function, whose second argument specifies the beginning character position of the substrings to be extracted. So, if you want to extract '12', '34' and '56' from the string '123456', those beginning character positions should be 1, 3 and 5. These are the values of the index variable of a DO loop from 1 to 5 with a step size of 2.

 

However, to assign these three substrings to array elements answers(1), answers(2), answers(3), the index of the array elements must be specified as follows:

answers((i+1)/2) = substr(_400203, i, 2);
Astounding
PROC Star

First, a quick tip.  If you have defined VAR1 as $2, you don't need the SUBSTR function.  You could code:

 

var1 = _400203;

 

The big problem you have ias that you haven't created nearly enough variables.  If VAR1 is to hold the first two characters of _400203, you can't reuse it to hold the first two characters of some other variables at the same time.  You will have to devise a set of 3 names for each incoming variable.  Once you have that approach in mind, arrays will be the key tool that processes many variables in exactly the same way.

_maldini_
Barite | Level 11

@Astounding

 

<You will have to devise a set of 3 names for each incoming variable. >

 

That is what I was trying to do w/ VAR1, VAR2, VAR3 for _400203 and then VAR4, VAR5, VAR6 for _400204.

 

Is there a better approach?

 

Thanks for the help!

Reeza
Super User

Can you post a sample of what your data looks like and what you want the output to look like? 

Several different cases would be ideal.

_maldini_
Barite | Level 11

@Reeza Added to original post. Thanks!

Reeza
Super User

Are you expecting each of those var on their own line? Or all on one line?

_maldini_
Barite | Level 11

@Reeza I'm not sure I understand your question. Could you please clarify?

 

Reeza
Super User

Would the output be a single row with 11 different variables

or multiple rows with 2 variables?

 

You've listed your output as Var01-Var11, but structured them as rows...

_maldini_
Barite | Level 11

@Reeza Sorry. Yes, a single row with 11 different variables.

ballardw
Super User

As a brief aside, this code

IF VAR1=01 THEN NEW_VAR1=1; ELSE NEW_VAR1=0; 

since mentioning substr probably was intended to be

IF VAR1='01' THEN NEW_VAR1=1; ELSE NEW_VAR1=0;

 

You can get equivalent behavior with

 

New_Var1 = (Var1 = '01');

 

Since SAS assigns a numeric value of 1 to a true comparison and 0 to a false comparison.

 

 

 

 

Ksharp
Super User
data have;
input _400203 $    _400204 $;
cards;
010001                    000100
000001                    0100
;
run;
data temp;
 set have;
 array x{*} $ _: ;
 do group=1 to dim(x);
  do i=1 to length(x{group}) by 2;
   temp=substr(x{group},i,2);output;
  end;
 end;
 keep group temp;
run;
proc sort data=temp; by group;run;
proc transpose data=temp out=want(drop=_:) prefix=var;
 var temp;
run;
Astounding
PROC Star

I'd change the names to make them manageable later.  Arrays is still a viable option:

 

data want;

set have;

array original {44}  _400203 _400204 ....;

array new {44,3} $ 2 _400203_part1 - _400203_part3

                     _400204_part1 - _400204_part3

              ...;

do i=1 to 44;

   do j=1 to 3;

      new{i,j} = substr(original{i}, 2*j-1);

   end;

end;

run;

 

It's still not clear if you want to copy two characters from the original variable, or if you want to set flags based on what you find there.  But there are (as usual) plenty of ways to do it in SAS.

 

Technically, the SUBSTR function with no third parameter copies all remaining characters.  But since the new variables are only two characters long, the program saves just two characters in each.

 

Good luck.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 31 replies
  • 4406 views
  • 17 likes
  • 6 in conversation