BookmarkSubscribeRSS Feed
TPhung
Calcite | Level 5

Hi All,

 

I need help with splitting a string without a delimiter into multiples variables. So here is an example of the string that I have

ABC12345678910qwertyuiop

I want to split it into multiple variables, these variables have various predefined length themselves. Here is an example of the desired output:

Var1  Var2        Var3           Var4      .....

ABC 1234       5678910     qwert    ....

 

I have the start position and length of each variable. So in the output above, Var2 starts from the 4th character and has 4 characters, Var3 starts from the 8th character and has 7 characters itself.

 

 

5 REPLIES 5
ballardw
Super User

If all of your data look like that then use the information to create a data step to read the data from the columns. The numbers below are the start and end columns each variable occupies. This could well be a tad easier than pulling apart a string and adding additional bits to create variables of the proper type. The $ tells SAS that the data read is character.

data want;
   infile "filename";
   input 
      var1 $ 1-3
      var2 4-7
      var3 8-14
      var4 $ 15-24
   ;
run;

If you have a document that tells you what columns the data occupy this is a nearly trivial exercise.

 

If you have dates or times then you likely want to specify an appropriate INFORMAT to tell SAS to read the data correctly. If you have values that are currency values like $12,345 then specify the comma or commax informat to read values correctly.

You can specify the format after the columns or in place of the column

Astounding
PROC Star

It's possible that you already have an ideal solution in @ballardw's post.

 

It depends on what form your various pieces of information are in:

 

  • Your original string that needs to be split ... where is that string stored?  In what format (SAS data set? text?)
  • Your information about where each field starts and how long it is ... is that stored in a file or is that on paper?
  • Do you also have information about which fields are numeric and which are character? 
  • Similarly, do you have information about the proper variable name to use when reading in the data?
TPhung
Calcite | Level 5

The original strings to be splitted are stored in a text file. I do have all the information about the data, in paper though.

It is something like this:

Item    Field            Size          Position     Format

1         Type              3               1-3            CHAR

2          ID                  4               4-8            CHAR

3          Date             8                9-17          ZD

 

My assignment is to break down the string into these fields, I also have to assigned names to the variables as well as the original data does not have a header.

 

Thanks,

 

Astounding
PROC Star

Given that the information is on paper, you will need to work with the paper-based information to create an INPUT statement.  (Once you have the INPUT statement, building the rest of the DATA step is straightforward, but let me know if that is something you need help doing.)

 

Example based on the information you posted:

 

input @1 type   $char3.

      @4 ID     $char4.

      @5 Date   ZD8.

      ;

 

You need to a bit about what goes into an INPUT statement.  For example, if you were just reading 3 characters from columns 25 to 27 to assign to a numeric variable, you might be using:

 

input @25 numvar   3.;

 

You may need to understand what ZD means in your paper-based-instructions, and check the SAS instructions that would match.  Here, I assumed that SAS would use its ZD informat, but you would need to check the details on that.  (In the SAS documentation, look up INFORMATS.)

s_lassen
Meteorite | Level 14

Assuming that the data you have is already in a SAS dataset or a DBMS table, the way to split such a string is the SUBSTR function, e.g.

Length
  Var1 $3
  Var2 $4
  Var3 $7
  ;
Var1=String;
Var2=substr(String,4);
Var3=substr(String,8);

Note that when SAS copies strings, it just copies enough characters to fill up the receiving string, if that is shorter than the source. Which is why I just wrote "Var1=String" - when the length of Var1 is already set to 3 characters, only the first 3 characters of String will be written there. And this is also why I did not specify the length in the SUBSTR invocations (as in "Var2=substr(String,4,4)") - the length of the receiving variables is defined in the LENGTH statement. If you do not use the LENGTH statement first, all the variables declared by the SUBSTR function call will be 8. So it is better to declare the length first, and then call SUBSTR.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 4877 views
  • 2 likes
  • 4 in conversation