Hi, I am currently reading the SAS Essentials book by Elliott.
In one example (page 77) the FORMAT statement comes after the INPUT statement.
DATA MYDATA;
INPUT @1 FNAME $11. @12 LNAME $12. @24 BDATE DATE9.;
FORMAT BDATE WORDDATE12.;
LABEL FNAME="First Name" LNAME="Last Name" BDATE="Birth Date";
DATALINES;
Bill       Smith       08JAN1952
Jane       Jones       02FEB1953
Clyde      York        23MAR1949
;
PROC PRINT LABEL;
RUN;
If I move the FORMAT statement before the INPUT statement, the BDATE variable moves to column 1 of the output instead of column 3.
Then in another example (page 79), the FORMAT statement comes before the INPUT statement.
data roomsize;
format room $10.;
input room $ W L;
area = L*W;
label L = "Length" W= "Width" Area = "Sq. Feet";
datalines;
LIVING 14 22
DINING 14 12
BREAKFAST 10 12
KITCHEN 12 16
BEDROOM1 18 12
BEDROOM2 12 14
BEDROOM3 13 16
BATH1 8 12
BATH2 7 10
BATH3 6 8
GARAGE 23 24
;
run;
proc print label;
sum area;
run;
If I change the FORMAT to after the INPUT statement, the room variable names will not be formatted.
So my question is: what is the rule regarding the location of the FORMAT statement in SAS? The book does not really explain it, and the order of the FORMAT and INPUT statements obviously has a visible effect on the result.
Accepted Solutions
In your first code sample, you note that putting the FORMAT statement prior to the INPUT statement makes the formatted variable column 1. That's because, no matter what order the source data is physically read in, the SAS compiler (which prepares the subsequent actual executable computer instructions) will make provision for variables in the order they are mentioned in your SAS program code. Moving the FORMAT statement prior to the INPUT statement means variable BDATE is mentioned prior to all the other variables, and is seen first by the compiler.
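To see the ordering effect directly, here is a minimal sketch based on the MYDATA step from the question, with the FORMAT statement moved ahead of INPUT. The VARNUM option on PROC CONTENTS lists variables by their position in the data set rather than alphabetically, so BDATE should show up as variable number 1.

```sas
data mydata;
   /* FORMAT is the first statement to mention BDATE, */
   /* so the compiler adds BDATE to the data set first */
   format bdate worddate12.;
   input @1 fname $11. @12 lname $12. @24 bdate date9.;
   datalines;
Bill       Smith       08JAN1952
Jane       Jones       02FEB1953
Clyde      York        23MAR1949
;
run;

/* VARNUM prints variables in creation order instead of alphabetical order */
proc contents data=mydata varnum;
run;
```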
In the second sample, what do you mean by "If I change the FORMAT to after the INPUT statement, the room variable names will not be formatted."? I suspect you mean that the variable room is truncated from what you expected, so BREAKFAST is missing its last character. But if you run PROC CONTENTS on dataset ROOMSIZE, you will see that room actually has the format $10., but its storage length is only 8. That's because when the INPUT statement was encountered, the SAS compiler assigned a default length of 8 bytes, since it had not yet seen the subsequent FORMAT statement with $10., which would have implied a length of 10 bytes. Of course, you could leave the FORMAT statement after the INPUT statement if you inserted a LENGTH ROOM $10; statement prior to INPUT.
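As a rough sketch of that last suggestion (using a shortened copy of the ROOMSIZE data, so treat it as an illustration rather than the book's program), the LENGTH statement fixes the storage length before INPUT is compiled, and the FORMAT statement can then go anywhere:

```sas
data roomsize;
   length room $10;     /* storage length is set here, before INPUT */
   input room $ W L;
   format room $10.;    /* now affects display only, not storage */
   area = L*W;
   datalines;
LIVING 14 22
BREAKFAST 10 12
KITCHEN 12 16
;
run;

/* Check the Len and Format columns reported for ROOM */
proc contents data=roomsize;
run;
```

PROC CONTENTS should report a length of 10 for ROOM, and BREAKFAST should no longer be truncated.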
Hi:
I am not familiar with the book you're referencing. However, I fear that you are misunderstanding the purpose of the FORMAT statement. SAS builds the descriptor portion of the data set based on the references it finds in the program, working from top to bottom in the code during a "pre-processing" phase called compile time. The placement of the FORMAT statement in this program is ONLY impacting the order of variables as they are internally stored in the SAS data set that you are creating. Typically, the "housekeeping" statements like the FORMAT statement are placed either at the very top of the DATA step or at the very bottom (in this case, just before the DATALINES statement). But the internally stored order of the variables really doesn't matter until you run a PROC PRINT taking all the defaults. The default order in which variables are displayed in PROC PRINT is the internally stored order of the variables. The reason you need the LABEL option in PROC PRINT is that the default for PROC PRINT is NOT to use the labels that you specify. So you are already overriding one of the PROC PRINT defaults. Why not learn about the VAR statement, so that you don't have to worry about the placement of the FORMAT statement?
When the FORMAT statement for BDATE is before the INPUT statement in the program, then the BDATE variable would be listed and defined first in the descriptor portion of the data set. So if you have a PROC PRINT such as you show, when the FORMAT statement comes first, then you would probably see BDATE listed first in any PROC PRINT output. This is not something to stress about. You can display the variables in any order you want using a VAR statement in PROC PRINT. The internal storage order of the variables really is not something to worry about since it is easy to control. Here's the default ordering when you have the FORMAT statement after the INPUT statement:
And here's the ordering when you have the FORMAT statement first, but as you can see, the VAR statement allows you to control the order of the variables:
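As a rough sketch of this second case (it uses the data from the question, not the modified data behind the screenshots), the FORMAT statement comes first, so BDATE is stored first, but the VAR statement still puts the columns in the requested order:

```sas
data mydata;
   format bdate worddate12.;   /* BDATE is stored as the first variable */
   input @1 fname $11. @12 lname $12. @24 bdate date9.;
   label fname="First Name" lname="Last Name" bdate="Birth Date";
   datalines;
Bill       Smith       08JAN1952
Jane       Jones       02FEB1953
Clyde      York        23MAR1949
;
run;

proc print data=mydata label;
   var fname lname bdate;      /* display order is set here, not by storage order */
run;
```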
I hope this explains what I think you're seeing. And, yes, I changed the data in your program for my screen shots because I was curious to see whether there was a first name that was 11 characters long.
Cynthia
thank you Cynthia for the detailed reply
thank you mketintz for the reply, I think your answer is the best in answering what I am seeking.
The first place you reference the variable is when it is added to the dataset.
A FORMAT statement has NOTHING to do with defining the variable. The purpose of a FORMAT statement is to tell SAS how you want the variable to DISPLAY. Note that most variables do not need to have special formats attached to them. DATE, TIME and DATETIME variables are examples of variables where attaching a format is important, as the raw numbers will be hard for humans to understand.
You can use the LENGTH statement, or ATTRIB statement with the LENGTH= option, to explicitly tell SAS how you want the variables defined.
Without that SAS will have to GUESS how you wanted to define the variable. It will use clues from how you referenced the variable to guide its guess. Like did you use it with a format, perhaps in a FORMAT statement or an INPUT statement? If so it will guess that you wanted the type to match the type of the format. For character variables it will guess that you wanted the length to match the display width of the format (which is not always a good idea).
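A small sketch of that guessing, using a made-up data set name (GUESS) and one row of the room data: here the FORMAT statement is the first place ROOM appears, so SAS should define it as character with a length taken from the format width.

```sas
data guess;
   format room $10.;    /* first reference: type and length are inferred from $10. */
   input room $ W L;
   datalines;
BREAKFAST 10 12
;
run;

/* ROOM should be listed as Char with Len 10 */
proc contents data=guess;
run;
```

The same mechanism is why this can backfire: a narrow display format placed first would also give the variable a narrow storage length.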
If you don't give it any information about how long the character variable should be it will default it to a length of 8 bytes. Like in this INPUT statement:
input room $ W L;
If that is the first place it sees those three variables, then ROOM will be defined as character with a length of 8, and W and L will be defined as numeric.
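To illustrate the default, here is a minimal sketch with a hypothetical data set name (DEFAULT8) and two rows of the room data. Nothing defines ROOM before the INPUT statement, so it gets a length of 8 and the nine-letter value is cut short.

```sas
data default8;
   input room $ W L;    /* ROOM first seen here: character, default length 8 */
   datalines;
BREAKFAST 10 12
KITCHEN 12 16
;
run;

/* BREAKFAST is stored as BREAKFAS; W and L are read as 10 and 12 */
proc print data=default8;
run;
```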
If you have properly defined the variables first, then it does not matter whether you place the FORMAT statement or the INPUT statement first.
So your programs should look something like this:
data mydata;
length fname $11 lname $12 bdate 8 ;
input fname 1-11 lname 12-23 bdate date9.;
format bdate worddate12.;
label
fname="First Name"
lname="Last Name"
bdate="Birth Date"
;
datalines;
Bill       Smith       08JAN1952
Jane       Jones       02FEB1953
Clyde      York        23MAR1949
;
data roomsize;
length room $10 w l area 8;
input room W L;
area = L*W;
label L = "Length" W= "Width" Area = "Sq. Feet";
datalines;
LIVING 14 22
DINING 14 12
BREAKFAST 10 12
KITCHEN 12 16
BEDROOM1 18 12
BEDROOM2 12 14
BEDROOM3 13 16
BATH1 8 12
BATH2 7 10
BATH3 6 8
GARAGE 23 24
;
thank you Tom for the detailed reply, I will learn about the LENGTH statement.