Got confused here while learning how informat works in reading raw data files. Could any expert give a hint?
Raw data in a txt file:
Here is what I tried first.
input city $7. visit;
This landed me with only the first row plus an error message:
SAS went to a new line when INPUT statement reached past the end of a line.
Could someone help me understand why SAS would reach past the end of the first line at all? I know that if "LA" were "Seattle" (a longer value), my code would work fine, so I don't think the first observation is the problem. To stop SAS from reaching past the end, I tried MISSOVER. This did bring in a second row, but with missing values. In the end, I had to use a colon modifier.
My SAS book says "The informat in modified list input determines only the length of the variable, not the number of columns that are read." But in this case, if I don't use an informat for city, the original code works fine, so I'm puzzled.
The problem is that you are NOT using the method the statement describes. For modified list input you would have to include a colon in your input statement. e.g.:
input city : $7. visit;
Otherwise, yes, SAS will read all seven characters for city, regardless of whether they include embedded spaces or not. If your values themselves contain embedded spaces, then you would also have to include the ampersand modifier.
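To make the two modifiers concrete, here is a minimal sketch. The data rows are made up for illustration (the original raw file is not shown in this thread), and the data set names visits and visits2 are hypothetical.

```sas
/* Hypothetical sample rows; not the poster's actual file. */
data visits;
   /* Colon modifier: read CITY up to the next blank, */
   /* and give the variable a length of 7.            */
   input city : $7. visit;
   datalines;
LA 10
Seattle 20
;
run;

data visits2;
   /* Ampersand modifier: a single embedded blank is kept; */
   /* two or more consecutive blanks end the value.        */
   input city & $12. visit;
   datalines;
New York  30
LA  10
;
run;
```

Note that with the ampersand modifier the values in the data have to be separated by at least two blanks, which is why the second set of rows is spaced that way.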
Thanks to you both, Art and Ksharp. The recommended file is very easy to follow.
Just to confirm:
My SAS Certification book says 'The colon (:) modifier is used to read nonstandard data values and character values that are longer than eight characters, but which contain no embedded blanks.' It almost makes me think that if character values are shorter than 8 bytes and do not have embedded blanks, the modifiers are not necessary.
Now after our earlier discussion, it seems that I have to use : or & so SAS understands I'm not trying to read data with formatted input? I guess it's more of an issue when the variable starts from column 1 otherwise by looking at the pointer portion, it's not hard to tell input method.
>if character values are shorter than 8 bytes and do not have embedded blanks, the modifiers are not necessary.
That applies only to list input. That is, the list input ' input a $ ' is the same as the modified list input ' input a : $8. '. Once you specify an informat with a different width, you need the colon, as Art mentioned: ' input a : $7. '.
Thanks for the additional explanation! I noticed that whenever I use informat for character variable in the input statement, SAS will treat it as formatted input, unless I use colon. It makes a real difference when my values have different length.
Here's my quick test:
Andy Lee 150
Adam Jack 200
Mary Jacob 300
When I test Input Firstname$ Lastname $5. Hours; the lastname for the first row shows Lee 1 (because SAS grabbed 5 characters) and the value for hours = 50, even though I "think" my code is using modified list input and I was hoping SAS stops reading for my lastname field when it encounters the space after Lee. In short, have to use the colon to get the data in right. Hope this helps the next puzzled student.
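For the next reader, the two versions of that test can be put side by side. This is only a sketch: the data set names formatted_try and colon_try are invented here, and in-stream DATALINES stands in for whatever file was actually read.

```sas
/* Formatted input for LASTNAME: $5. always consumes 5 columns. */
/* Per the post above, row 1 yields lastname='Lee 1', hours=50. */
data formatted_try;
   input firstname $ lastname $5. hours;
   datalines;
Andy Lee 150
Adam Jack 200
Mary Jacob 300
;
run;

/* Modified list input: the colon makes SAS stop at the blank */
/* after each last name; the variable length is still 5.      */
data colon_try;
   input firstname $ lastname : $5. hours;
   datalines;
Andy Lee 150
Adam Jack 200
Mary Jacob 300
;
run;

proc print data=formatted_try; run;
proc print data=colon_try; run;
```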
Note that, as it says at the end of the topic, SAS reads formatted input until it has read the number of positions specified by the INFORMAT. In your case, $5. was considered to be the INFORMAT for the LASTNAME field.
So when you had
input firstname $ lastname $5. ... ;
You started with simple list input for FIRSTNAME and then switched to formatted input for LASTNAME. I'm not sure how you read HOURS: with formatted or list input.
You can use a LENGTH statement with simple list input to specify the maximum length for a character variable that you are going to read with list input. If you specify the length, list input will still read until it hits a delimiter (the default delimiter for list input is a space or blank, although you can change it with the DLM option).
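As a small illustration of the DLM option mentioned above, here is a sketch that reads comma-separated values with the same LENGTH approach. The comma-delimited rows and the data set name hourdata_csv are assumptions made up for this example.

```sas
data hourdata_csv;
   /* DLM=',' replaces the default blank delimiter with a comma. */
   infile datalines dlm=',';
   /* LENGTH sets the storage length; list input still stops at  */
   /* each delimiter rather than reading a fixed number of columns. */
   length firstname $8 lastname $15;
   input firstname $ lastname $ hours;
   datalines;
Andy,Lee,150
John,Jingleheimer,400
;
run;
```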
The program below reads your data without using the colon modifier.
data hourdata;
  length firstname $8 lastname $15;
  input firstname $ lastname $ hours;
  datalines;
Andy Lee 150
Adam Jack 200
Mary Jacob 300
John Jingleheimer 400
;
run;

proc contents data=hourdata;
  title 'PROC CONTENTS';
run;

proc print data=hourdata;
  title 'PROC PRINT';
run;
>whenever I use informat for character variable in the input statement, SAS will treat it as formatted input, unless I use colon. It makes a real difference when my values have different length.
Yes, you are right. But the length of the variable is the length you defined in the informat; it does not vary from value to value when you use the colon in your INPUT statement. There is one important thing to remember: when you use a LENGTH statement and an INFORMAT statement before the INPUT statement, the input method behaves the same as the colon method, just as Cynthia mentioned. And once a character variable enters the PDV, its length cannot be changed afterward.
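A minimal sketch of that LENGTH / INFORMAT approach, reusing the sample names from earlier in the thread (the data set name hourdata2 is made up). Because LASTNAME already has an informat attached, it can be listed in the INPUT statement without any informat, and it is then read with list input.

```sas
data hourdata2;
   length firstname $8 lastname $5;
   /* Attaching the informat here, instead of in the INPUT       */
   /* statement, keeps the INPUT statement in plain list style.  */
   informat lastname $5.;
   input firstname $ lastname hours;
   datalines;
Andy Lee 150
Adam Jack 200
Mary Jacob 300
;
run;

/* Shows the stored length of each variable. */
proc contents data=hourdata2; run;
```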
Now let's take a look at your example.
In your code, 'Hours' is list input and 'Firstname' is list input (with the SAS default length of eight). 'Lastname' is formatted input, which ignores delimiters such as blanks and reads through the fifth character, so you get 'Lee 1', not 'Lee'. You should add a colon before $5., i.e. ' : $5. '. The colon makes SAS stop reading when it encounters a delimiter (a blank and so on), but the length of the variable is still 5.
Hope this will help you a little bit.
Cynthia gives a valuable reference about it.
In SAS, there are four input methods: list input, formatted input, column input, and named input.
The difference you refer to is between list input and formatted input. Art has given some details on it.
So if you understand these four input methods, you will be able to process complicated data perfectly.
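For reference, here is a rough sketch that touches all four styles. The column-aligned rows, the name=value rows, and the data set names styles and named are invented for illustration only.

```sas
/* Column, formatted, and list input mixed in one INPUT statement. */
/* The rows are padded so the last name starts in column 6.        */
data styles;
   input firstname $ 1-4     /* column input: columns 1-4          */
         @6 lastname $5.     /* formatted input: 5 columns from 6  */
         hours;              /* list input: next nonblank value    */
   datalines;
Andy Lee   150
Adam Jack  200
Mary Jacob 300
;
run;

/* Named input: each value in the data is written as name=value. */
data named;
   input firstname=$ hours=;
   datalines;
firstname=Andy hours=150
firstname=Adam hours=200
;
run;
```

Formatted input works in the first step only because the last names are aligned in fixed columns; with free-form data like the earlier test rows, the colon modifier is still the safer choice.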