Splitting String into Variables without delimiter

Reply
New Contributor
Posts: 2

Splitting String into Variables without delimiter

Hi All,

 

I need help with splitting a string without a delimiter into multiples variables. So here is an example of the string that I have

ABC12345678910qwertyuiop

I want to split it into multiple variables, these variables have various predefined length themselves. Here is an example of the desired output:

Var1  Var2        Var3           Var4      .....

ABC 1234       5678910     qwert    ....

 

I have the start position and length of each variable. So in the output above, Var2 starts from the 4th character and has 4 characters, Var3 starts from the 8th character and has 7 characters itself.

 

 

Super User
Posts: 13,584

Re: Splitting String into Variables without delimiter

If all of your data look like that then use the information to create a data step to read the data from the columns. The numbers below are the start and end columns each variable occupies. This could well be a tad easier than pulling apart a string and adding additional bits to create variables of the proper type. The $ tells SAS that the data read is character.

data want;
   infile "filename";
   input 
      var1 $ 1-3
      var2 4-7
      var3 8-14
      var4 $ 15-24
   ;
run;

If you have a document that tells you what columns the data occupy this is a nearly trivial exercise.

 

If you have dates or times then you likely want to specify an appropriate INFORMAT to tell SAS to read the data correctly. If you have values that are currency values like $12,345 then specify the comma or commax informat to read values correctly.

You can specify the format after the columns or in place of the column

Super User
Posts: 6,788

Re: Splitting String into Variables without delimiter

It's possible that you already have an ideal solution in @ballardw's post.

 

It depends on what form your various pieces of information are in:

 

  • Your original string that needs to be split ... where is that string stored?  In what format (SAS data set? text?)
  • Your information about where each field starts and how long it is ... is that stored in a file or is that on paper?
  • Do you also have information about which fields are numeric and which are character? 
  • Similarly, do you have information about the proper variable name to use when reading in the data?
New Contributor
Posts: 2

Re: Splitting String into Variables without delimiter

Posted in reply to Astounding

The original strings to be splitted are stored in a text file. I do have all the information about the data, in paper though.

It is something like this:

Item    Field            Size          Position     Format

1         Type              3               1-3            CHAR

2          ID                  4               4-8            CHAR

3          Date             8                9-17          ZD

 

My assignment is to break down the string into these fields, I also have to assigned names to the variables as well as the original data does not have a header.

 

Thanks,

 

Super User
Posts: 6,788

Re: Splitting String into Variables without delimiter

Given that the information is on paper, you will need to work with the paper-based information to create an INPUT statement.  (Once you have the INPUT statement, building the rest of the DATA step is straightforward, but let me know if that is something you need help doing.)

 

Example based on the information you posted:

 

input @1 type   $char3.

      @4 ID     $char4.

      @5 Date   ZD8.

      ;

 

You need to a bit about what goes into an INPUT statement.  For example, if you were just reading 3 characters from columns 25 to 27 to assign to a numeric variable, you might be using:

 

input @25 numvar   3.;

 

You may need to understand what ZD means in your paper-based-instructions, and check the SAS instructions that would match.  Here, I assumed that SAS would use its ZD informat, but you would need to check the details on that.  (In the SAS documentation, look up INFORMATS.)

PROC Star
Posts: 266

Re: Splitting String into Variables without delimiter

Assuming that the data you have is already in a SAS dataset or a DBMS table, the way to split such a string is the SUBSTR function, e.g.

Length
  Var1 $3
  Var2 $4
  Var3 $7
  ;
Var1=String;
Var2=substr(String,4);
Var3=substr(String,8);

Note that when SAS copies strings, it just copies enough characters to fill up the receiving string, if that is shorter than the source. Which is why I just wrote "Var1=String" - when the length of Var1 is already set to 3 characters, only the first 3 characters of String will be written there. And this is also why I did not specify the length in the SUBSTR invocations (as in "Var2=substr(String,4,4)") - the length of the receiving variables is defined in the LENGTH statement. If you do not use the LENGTH statement first, all the variables declared by the SUBSTR function call will be 8. So it is better to declare the length first, and then call SUBSTR.

Ask a Question
Discussion stats
  • 5 replies
  • 177 views
  • 2 likes
  • 4 in conversation