Hi All,
I need help with splitting a string without a delimiter into multiples variables. So here is an example of the string that I have
ABC12345678910qwertyuiop
I want to split it into multiple variables, these variables have various predefined length themselves. Here is an example of the desired output:
Var1 Var2 Var3 Var4 .....
ABC 1234 5678910 qwert ....
I have the start position and length of each variable. So in the output above, Var2 starts from the 4th character and has 4 characters, Var3 starts from the 8th character and has 7 characters itself.
If all of your data look like that then use the information to create a data step to read the data from the columns. The numbers below are the start and end columns each variable occupies. This could well be a tad easier than pulling apart a string and adding additional bits to create variables of the proper type. The $ tells SAS that the data read is character.
data want; infile "filename"; input var1 $ 1-3 var2 4-7 var3 8-14 var4 $ 15-24 ; run;
If you have a document that tells you what columns the data occupy this is a nearly trivial exercise.
If you have dates or times then you likely want to specify an appropriate INFORMAT to tell SAS to read the data correctly. If you have values that are currency values like $12,345 then specify the comma or commax informat to read values correctly.
You can specify the format after the columns or in place of the column
It's possible that you already have an ideal solution in @ballardw's post.
It depends on what form your various pieces of information are in:
The original strings to be splitted are stored in a text file. I do have all the information about the data, in paper though.
It is something like this:
Item Field Size Position Format
1 Type 3 1-3 CHAR
2 ID 4 4-8 CHAR
3 Date 8 9-17 ZD
My assignment is to break down the string into these fields, I also have to assigned names to the variables as well as the original data does not have a header.
Thanks,
Given that the information is on paper, you will need to work with the paper-based information to create an INPUT statement. (Once you have the INPUT statement, building the rest of the DATA step is straightforward, but let me know if that is something you need help doing.)
Example based on the information you posted:
input @1 type $char3.
@4 ID $char4.
@5 Date ZD8.
;
You need to a bit about what goes into an INPUT statement. For example, if you were just reading 3 characters from columns 25 to 27 to assign to a numeric variable, you might be using:
input @25 numvar 3.;
You may need to understand what ZD means in your paper-based-instructions, and check the SAS instructions that would match. Here, I assumed that SAS would use its ZD informat, but you would need to check the details on that. (In the SAS documentation, look up INFORMATS.)
Assuming that the data you have is already in a SAS dataset or a DBMS table, the way to split such a string is the SUBSTR function, e.g.
Length
Var1 $3
Var2 $4
Var3 $7
;
Var1=String;
Var2=substr(String,4);
Var3=substr(String,8);
Note that when SAS copies strings, it just copies enough characters to fill up the receiving string, if that is shorter than the source. Which is why I just wrote "Var1=String" - when the length of Var1 is already set to 3 characters, only the first 3 characters of String will be written there. And this is also why I did not specify the length in the SUBSTR invocations (as in "Var2=substr(String,4,4)") - the length of the receiving variables is defined in the LENGTH statement. If you do not use the LENGTH statement first, all the variables declared by the SUBSTR function call will be 8. So it is better to declare the length first, and then call SUBSTR.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.