Hi, I am reading page 244 of the official Specialist exam prep guide.
The book says the example demonstrates reading dates with the MMDDYY10. informat.
But MMDDYY10 does not appear anywhere in the code.
Yet when I ran the code as shown in the book, the result does look like what is in the book. So what is the book trying to say? Is MMDDYY10. the default informat?
How should I change the code if I want to use the DATE11. informat?
@Nietzsche wrote:
Here is a copy of new_hires.csv.
Can you just tell me the code if I want to use DATE11.? I cannot find anywhere in the book that shows code using an informat in PROC IMPORT.
Or do I have to use PROC IMPORT first, then change it within the DATA step?
You do not use informats with PROC IMPORT. PROC IMPORT generates code that uses informats. It GUESSES what informats to use based on what it sees in the text file.
If you want to control how the file is read just write your own data step to read the file. Since a CSV is a delimited text file you should read it with LIST MODE input style. In that style the width on any informat is IGNORED. So there is no difference between reading it using DATE. or DATE9. or DATE11. as the informat specification. SAS will match the width to the width of the next word it sees based on the delimiters.
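To check the claim that the informat width does not matter in list mode, here is a small sketch that reads the same DD-MON-YYYY string with three widths of the DATE informat. The inline data line and the data set name WIDTH_TEST are made up for illustration; they are not from new_hires.csv.

```sas
/* Hypothetical test data: one record with the same date text three times. */
data width_test;
  infile datalines dsd truncover;
  /* Colon modifier = modified list input: each value is read up to the
     next delimiter, so DATE., DATE9. and DATE11. should all see the
     full 21-AUG-1971 token. */
  input a :date. b :date9. c :date11.;
  format a b c date11.;
datalines;
21-AUG-1971,21-AUG-1971,21-AUG-1971
;

proc print data=width_test;
run;
```

If the widths really are ignored, all three variables should hold the same date value.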
But note that the text in that particular file is in the style MDY, so you must read it using the MMDDYY informat. The DATE informat would not understand what those characters mean.
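A quick way to see the difference is to feed one m/d/yyyy string, like the Date of Birth values in the file, to both informats. This is a sketch; the data set name INFORMAT_TEST is hypothetical, and the ?? modifier is used only to suppress the invalid-data note for the read that is expected to fail.

```sas
data informat_test;
  text = '8/21/1971';                 /* same style as the CSV values        */
  good = input(text, mmddyy10.);      /* MMDDYY understands month/day/year   */
  bad  = input(text, ?? date11.);     /* DATE expects day, month name, year,
                                         so this should come back missing    */
  format good bad date11.;
run;
```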
Of course you can attach the DATE11. format to the variable to have the values displayed in the style DD-MON-YYYY if you want. You could attach any of the many formats that know how to display date values once the variable has date values in it.
You will have to attach the format in a separate step from the PROC IMPORT step: either another DATA step that copies the data, or PROC DATASETS to modify the format attribute of the variable.
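A minimal sketch of the PROC IMPORT plus PROC DATASETS route. The file path and the output name WORK.NEW_HIRES are assumptions, and the variable names Hire_Date and Date_of_Birth assume VALIDVARNAME=V7 (which turns the header "Hire Date" into Hire_Date); check the log for the names PROC IMPORT actually created.

```sas
/* Hypothetical path and output data set; adjust to your environment. */
proc import datafile='new_hires.csv' out=work.new_hires dbms=csv replace;
  getnames=yes;
run;

/* Change only the display format; the stored date values are not touched
   and the data set is not rewritten. */
proc datasets library=work nolist;
  modify new_hires;
    format Hire_Date Date_of_Birth date11.;
  run;
quit;
```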
Note that if you write the data step yourself you can tell it to read the strings from the file using the MMDDYY. informat and also attach the DATE11. format to the variable.
Not sure what they mean. Did the book also show the actual text of the CSV file?
Since the code used PROC IMPORT to read the file, it was PROC IMPORT that decided MMDDYY was the best informat for the example values of DATE_OF_BIRTH and HIRE_DATE that it saw in that particular CSV file.
I suspect the point they are trying to make is that the MMDDYY informat can be used to read strings in the style mm/dd/yyyy, as opposed to strings in some other style, which would require a different informat. They are clearly NOT showing how YOU can use the MMDDYY informat, since it is not in the code. I have no idea how advanced the book you are using is intended to be, or how nuanced it wants to be in what it is trying to teach. It might have another section designed to teach about informats or about dates. If you want to learn more about informats, experiment with writing your own code.
You should also read about the DATESTYLE option to see how you can control what SAS does when the strings are ambiguous as to what date style they are using. It definitely impacts how the ANYDT... series of informats work. I am not sure whether it impacts the decisions that PROC IMPORT makes.
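Here is a sketch of how DATESTYLE changes what the ANYDTDTE informat does with an ambiguous string. The value 01/02/2017 is made up for illustration (it could mean January 2 or February 1); the data set names are hypothetical. As noted above, whether PROC IMPORT itself consults DATESTYLE is not shown here.

```sas
/* Read the same ambiguous text under two DATESTYLE settings. */
options datestyle=mdy;
data style_mdy;
  input d :anydtdte32.;
  format d date11.;
datalines;
01/02/2017
;

options datestyle=dmy;
data style_dmy;
  input d :anydtdte32.;
  format d date11.;
datalines;
01/02/2017
;

/* Restore the default setting. */
options datestyle=locale;

/* Expect January 2 in STYLE_MDY and February 1 in STYLE_DMY. */
proc print data=style_mdy; run;
proc print data=style_dmy; run;
```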
Hello Nietzsche!
Well... if you use PROC IMPORT, you should see the equivalent DATA step code written to the log, with explicit INFORMAT statements.
You could use that code instead and set the informats explicitly (i.e. your DATE11.). Please have a look at the (probably) helpful examples in the online documentation.
As far as I know, there is no way to control the informats directly in PROC IMPORT.
--Fja
PS: I am suffering a bit from a hangover... apologies for my English...
There is no real reason to use PROC IMPORT to read a text file, especially one as simple as that.
First, look at the file.
Name,Hire Date,Company,Country,Date of Birth
Gisela S. Santos,8/12/17,Pede Nunc Sed Limited,Micronesia,8/21/1971
Maxwell L. Cooley,9/4/17,A LLP,Somalia,4/30/1975
Thane P. Obrien,10/28/17,Consectetuer Limited,Jamaica,4/23/1988
Minerva C. Conley,1/5/18,Feugiat Tellus Lorem Institute,Fiji,2/18/1975
So there are only 5 variables.
So just start writing the code to read the file.
data want ;
infile 'new_hires.csv' dsd truncover firstobs=2;
You can then copy the first line and use it to generate names for the variables. You can use a LENGTH statement to define the variables.
length
Name $30
Hire_Date 8
Company $40
Country $30
Date_of_Birth 8
;
Now read in the values. You can add in-line informats to the INPUT statement if you want; just remember to prefix them with the colon modifier so that SAS still reads the line in LIST MODE.
input Name Hire_Date :mmddyy. Company Country Date_of_Birth :mmddyy. ;
And finally attach any formats to variables that NEED them (SAS does not need to be given special instructions for how to display most variables).
format Hire_Date Date_of_Birth date11. ;
You could also attach any labels you want to the variables.
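Putting the fragments above together, the complete step might look like the sketch below. The relative path is an assumption (the log that follows used a full Windows path), and the LABEL text is illustrative only.

```sas
data want;
  /* DSD: comma-delimited, quotes honored; FIRSTOBS=2 skips the header row. */
  infile 'new_hires.csv' dsd truncover firstobs=2;
  length
    Name          $30
    Hire_Date       8
    Company       $40
    Country       $30
    Date_of_Birth   8
  ;
  /* Colon modifier keeps LIST MODE while applying the MMDDYY informat. */
  input Name Hire_Date :mmddyy. Company Country Date_of_Birth :mmddyy. ;
  /* Display the date values in DD-MON-YYYY style. */
  format Hire_Date Date_of_Birth date11. ;
  /* Hypothetical labels, taken from the header line. */
  label Hire_Date='Hire Date' Date_of_Birth='Date of Birth';
run;
```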
1292  data want ;
1293  infile 'c:\downloads\new_hires.csv' dsd truncover firstobs=2;
1294  length
1295  Name $30
1296  Hire_Date 8
1297  Company $40
1298  Country $30
1299  Date_of_Birth 8
1300  ;
1301  input Name Hire_Date :mmddyy. Company Country Date_of_Birth :mmddyy. ;
1302  format Hire_Date Date_of_Birth date11. ;
1303  run;

NOTE: The infile 'c:\downloads\new_hires.csv' is:
      Filename=c:\downloads\new_hires.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=15701,
      Last Modified=19Nov2022:12:41:59,
      Create Time=19Nov2022:12:41:59

NOTE: 100 records were read from the infile 'c:\downloads\new_hires.csv'.
      The minimum record length was 80.
      The maximum record length was 160.
NOTE: The data set WORK.WANT has 100 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds

1304
1305  proc print;
1306  run;

NOTE: There were 100 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
I think the book has a typo, and means to say "... reading a CSV with dates in MMDDYY10. format."
They are saying the CSV has dates in mmddyy10 format in it, and the PROC IMPORT will recognize the values as dates and read them in as dates.
That makes sense. I prefer to describe the way the text looks as the STYLE (or pattern) of date used in the CSV file.
I avoid using the words FORMAT or INFORMAT for that meaning since those two words have a very specific meaning in SAS code.
A format is used to convert values to text. The format determines the style in which the value is displayed.
An informat is used to convert text to values. An informat supports reading text that match the styles or patterns that it understands.
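A tiny sketch of that distinction, using one Date of Birth string from the file: the informat turns text into a number, and formats turn that same number back into text in different styles. The data set name FMT_DEMO is hypothetical.

```sas
data fmt_demo;
  d = input('08/21/1971', mmddyy10.);  /* informat: text -> numeric date value  */
  as_mdy    = put(d, mmddyy10.);       /* format: value -> text '08/21/1971'    */
  as_date11 = put(d, date11.);         /* format: value -> text '21-AUG-1971'   */
  put d= as_mdy= as_date11=;           /* unformatted d prints as a plain number */
run;
```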
I will update that in the errata thread.