BookmarkSubscribeRSS Feed
cantonais
Calcite | Level 5

Hello everyone, 

Was hoping you can help me out, I am a bit stuck with SAS informats, I've tried a lot of different combinations but not quite getting the right output. I am pretty much trying to get the below data into 4 columns. However, Company has blanks and some special characters, which is throwing all the inputs out. 

Below is my code and only 17 observations:

DATA example;
INFORMAT id 5. Company $Comma5. Employees 4. Profit Dollar15.; 
INPUT id Company Employees Profit;
FORMAT id 5. Company $Comma5. Employees 4. Profit Dollar15.; 
DATALINES;
1 Kling Inc 240 $6641679.60
2 Thompson-Okuneva 348 $8235180.28
3 Ratke, Littel and Lind 496 $1426858.40
4 Crist, Bashirian and O'Connell 217 $2444880.75
5 Hermann, Simonis and Lockman 233 $6732864.52
6 Greenfelder and Sons 318 $3955691.84
7 Rempel-Herzog 210 $1356491.19
8 Simonis Group 216 $97393.42
9 Davis-Ziemann 45 $2078465.33
10 Konopelski, Wehner and Marquardt 444 $3095364.16
11 Halvorson-O'Hara 379 $9515982.36
12 Schimmel, Nitzsche and Sawayn 311 $8109720.46
13 Hahn Inc 498 $3683395.00
14 Connelly, Bashirian and Blanda 432 $2037320.33
15 Littel, Reynolds and Lueilwitz 12 $1585741.13
16 Wolf, Cruickshank and Kshlerin 42 $9032698.68
17 Kassulke, Runolfsdottir and Padberg 70 $9774043.93;
RUN;

 

My code gets the below, the spacing is throwing the inputs off.
Wrong Screen Grab.PNG

 

Using the first observation as an example, if I drop Inc, everything will load correctly.

Kling.PNG

 

Any help greatly appreciated! 

Thanks in advance 
Cantonais

2 REPLIES 2
Kurt_Bremser
Super User

When items contain the delimiter, they should be enclosed in quotes, and the DSD option used in the DATALINES statement:

data example;
infile datalines dlm=" " dsd;
input id :5. Company :$30. Employees Profit :dollar15.;
format id 5. Company $Comma5. Employees 4. Profit Dollar15.; 
datalines;
1 "Kling Inc" 240 $6641679.60
2 "Thompson-Okuneva" 348 $8235180.28
3 "Ratke, Littel and Lind" 496 $1426858.40
;

Note that a DATALINES block has to end with a single semicolon on its own line, and no RUN is needed, because the DATALINES is the last statement of a DATA step per definition.

Tom
Super User Tom
Super User

You need to change how the data is on the line so that it can be parsed.  You can 

  • Put the data in fixed columns

 

data example;
  input id 1-4 Company $ 5-24 Employees 35-39 Profit comma15.;
  format Profit dollar15.; 
*---5---10----5---20----5---30----5---40----5---40 ;
datalines;
1   Kling Inc           240 $6641679.60
2   Thompson-Okuneva    348 $8235180.28
;

 

  • Use a delimiter between the fields that does not appear in any of the field values

 

data example;
  infile datalines dsd dlm='|' truncover;
  input id Company :$40. Employees Profit :comma.;
  format Profit dollar15.; 
datalines;
1|Kling Inc|240|$6641679.60
2|Thompson-Okuneva|348|$8235180.28
;
  • Use quotes around the values that include delimiter or quotes (the default delimiter is a space, but when use DSD option default delimiter is a comma)
data example;
  infile datalines dsd dlm=' ' truncover;
  input id Company :$40. Employees Profit :comma.;
  format Profit dollar15.; 
datalines;
1 "Kling Inc" 240 $6641679.60
2 Thompson-Okuneva 348 $8235180.28
;
  • Put at least two delimiters after values that include the delimiter and use the & modifier in the INPUT statement. Note that the values cannot have two or more adjacent delimiters.
data example;
  input id Company &:$40. Employees Profit :comma.;
  format Profit dollar15.; 
datalines;
1 Kling Inc  240 $6641679.60
2 Thompson-Okuneva  348 $8235180.28
;

A couple of other notes.

Place the semi-colon that ends the in-line data after the last line of data.  

image.png

If you put it on one of the lines of data it means that line will be ignored. If you are using a SAS aware editor the color highlighting should help you see that mistake:

image.png

 

You appear to be trying to use the INFORMAT statement as a way to define the variables. SAS will define the variables in the order it first sees them (at the least the first place it needs to know how they are defined).  So using INFORMAT in this way is just using that side effect to define the variables. It is clearer to define the variables in advance using a LENGTH statement (or an ATTRIB statement with the LENGTH= attribute).  In code above I have just let SAS define the variables as it sees them in the INPUT statement, but for the delimited examples I have use an in-line INFORMAT in the INPUT statement to give SAS a hint for how long to create the character variables  because otherwise the default length would be $8. Remember to use : modifier before any informat specifications when reading delimited data (list mode input) so that SAS only reads the number of characters that they values uses on that line. Otherwise the informat might cause it to read too few or too many characters and then be out of position for when it tries to read the next field on the line.

 

Informats are instructions for how to convert text to values. SAS only needs you to tell it to use a specific informat in rare cases.  The main place where you need informats is for reading DATE, TIME or DATETIME values.  Another is, as in your example, when there are commas and dollar signs in a numeric field you will need to use the COMMA informat (you can also call the comma informat by its alias of dollar). 

 

Formats are instructions for how to convert values to text. Again you rarely need to attach formats to your variables. The main exception is DATE, TIME and DATETIME values.  Or if you want the values to print in a particular style, like the DOLLAR format in your example.  But be careful with attaching formats to variables as it can cause unexpected behavior if the values are not compatible with the format attached to them.  For example if any of the count of EMPLOYEES in your dataset was larger then 9,999 and you had attached the 4. format as in your example it would switch to use scientific notation.

264   data _null_;
265     Employees = 12345 ;
266     put Employees= 4.;
267   run;

Employees=12E3

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 526 views
  • 0 likes
  • 3 in conversation