Hello everyone,
Was hoping you can help me out, I am a bit stuck with SAS informats, I've tried a lot of different combinations but not quite getting the right output. I am pretty much trying to get the below data into 4 columns. However, Company has blanks and some special characters, which is throwing all the inputs out.
Below is my code and only 17 observations:
DATA example;
INFORMAT id 5. Company $Comma5. Employees 4. Profit Dollar15.;
INPUT id Company Employees Profit;
FORMAT id 5. Company $Comma5. Employees 4. Profit Dollar15.;
DATALINES;
1 Kling Inc 240 $6641679.60
2 Thompson-Okuneva 348 $8235180.28
3 Ratke, Littel and Lind 496 $1426858.40
4 Crist, Bashirian and O'Connell 217 $2444880.75
5 Hermann, Simonis and Lockman 233 $6732864.52
6 Greenfelder and Sons 318 $3955691.84
7 Rempel-Herzog 210 $1356491.19
8 Simonis Group 216 $97393.42
9 Davis-Ziemann 45 $2078465.33
10 Konopelski, Wehner and Marquardt 444 $3095364.16
11 Halvorson-O'Hara 379 $9515982.36
12 Schimmel, Nitzsche and Sawayn 311 $8109720.46
13 Hahn Inc 498 $3683395.00
14 Connelly, Bashirian and Blanda 432 $2037320.33
15 Littel, Reynolds and Lueilwitz 12 $1585741.13
16 Wolf, Cruickshank and Kshlerin 42 $9032698.68
17 Kassulke, Runolfsdottir and Padberg 70 $9774043.93;
RUN;
My code gets the below, the spacing is throwing the inputs off.
Using the first observation as an example, if I drop Inc, everything will load correctly.
Any help greatly appreciated!
Thanks in advance
Cantonais
When items contain the delimiter, they should be enclosed in quotes, and the DSD option used in the DATALINES statement:
data example;
infile datalines dlm=" " dsd;
input id :5. Company :$30. Employees Profit :dollar15.;
format id 5. Company $Comma5. Employees 4. Profit Dollar15.;
datalines;
1 "Kling Inc" 240 $6641679.60
2 "Thompson-Okuneva" 348 $8235180.28
3 "Ratke, Littel and Lind" 496 $1426858.40
;
Note that a DATALINES block has to end with a single semicolon on its own line, and no RUN is needed, because the DATALINES is the last statement of a DATA step per definition.
You need to change how the data is on the line so that it can be parsed. You can
data example;
input id 1-4 Company $ 5-24 Employees 35-39 Profit comma15.;
format Profit dollar15.;
*---5---10----5---20----5---30----5---40----5---40 ;
datalines;
1 Kling Inc 240 $6641679.60
2 Thompson-Okuneva 348 $8235180.28
;
data example;
infile datalines dsd dlm='|' truncover;
input id Company :$40. Employees Profit :comma.;
format Profit dollar15.;
datalines;
1|Kling Inc|240|$6641679.60
2|Thompson-Okuneva|348|$8235180.28
;
data example;
infile datalines dsd dlm=' ' truncover;
input id Company :$40. Employees Profit :comma.;
format Profit dollar15.;
datalines;
1 "Kling Inc" 240 $6641679.60
2 Thompson-Okuneva 348 $8235180.28
;
data example;
input id Company &:$40. Employees Profit :comma.;
format Profit dollar15.;
datalines;
1 Kling Inc 240 $6641679.60
2 Thompson-Okuneva 348 $8235180.28
;
A couple of other notes.
Place the semi-colon that ends the in-line data after the last line of data.
If you put it on one of the lines of data it means that line will be ignored. If you are using a SAS aware editor the color highlighting should help you see that mistake:
You appear to be trying to use the INFORMAT statement as a way to define the variables. SAS will define the variables in the order it first sees them (at the least the first place it needs to know how they are defined). So using INFORMAT in this way is just using that side effect to define the variables. It is clearer to define the variables in advance using a LENGTH statement (or an ATTRIB statement with the LENGTH= attribute). In code above I have just let SAS define the variables as it sees them in the INPUT statement, but for the delimited examples I have use an in-line INFORMAT in the INPUT statement to give SAS a hint for how long to create the character variables because otherwise the default length would be $8. Remember to use : modifier before any informat specifications when reading delimited data (list mode input) so that SAS only reads the number of characters that they values uses on that line. Otherwise the informat might cause it to read too few or too many characters and then be out of position for when it tries to read the next field on the line.
Informats are instructions for how to convert text to values. SAS only needs you to tell it to use a specific informat in rare cases. The main place where you need informats is for reading DATE, TIME or DATETIME values. Another is, as in your example, when there are commas and dollar signs in a numeric field you will need to use the COMMA informat (you can also call the comma informat by its alias of dollar).
Formats are instructions for how to convert values to text. Again you rarely need to attach formats to your variables. The main exception is DATE, TIME and DATETIME values. Or if you want the values to print in a particular style, like the DOLLAR format in your example. But be careful with attaching formats to variables as it can cause unexpected behavior if the values are not compatible with the format attached to them. For example if any of the count of EMPLOYEES in your dataset was larger then 9,999 and you had attached the 4. format as in your example it would switch to use scientific notation.
264 data _null_; 265 Employees = 12345 ; 266 put Employees= 4.; 267 run; Employees=12E3
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.