Hello, I've just started SAS Programming and I've run into major problems.
The data I am trying to bring in is a simple .csv file, named "aaaa.csv" (I changed it to make the name simpler). It looks like this:
Took office,Left office 30/04/1789,04/03/1797 04/03/1797,04/03/1801 04/03/1801,04/03/1809 04/03/1809,04/03/1817 04/03/1817,04/03/1825
There is more, the data is presidential entry and exit dates.
The code I am trying to use looks like this:
DATA presidents; INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; INPUT TookOffice DDMMYY10. LeftOffice DDMMYY10.; RUN; TITLE "presidents.csv"; PROC PRINT Data=presidents; RUN;
When I run the code, it seems to be unable to read the second date in each line. The errors I get are:
NOTE: Invalid data for LeftOffice in line 2 11-20. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 2 30/04/1789:04/03/1797 21 TookOffice=-62336 LeftOffice=. _ERROR_=1 _N_=1 NOTE: Invalid data for LeftOffice in line 3 11-20. 3 04/03/1797:04/03/1801 21 TookOffice=-59471 LeftOffice=. _ERROR_=1 _N_=2 NOTE: Invalid data for LeftOffice in line 4 11-20. 4 04/03/1801:04/03/1809 21 TookOffice=-58011 LeftOffice=. _ERROR_=1 _N_=3
I am worried about getting the dates to look like proper dates but my biggest concern at hand is getting the data to read at all.
All assistance is greatly appreciated.
Add a colon to your input statement. i.e.:
DATA presidents; INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; INPUT TookOffice DDMMYY10. LeftOffice : DDMMYY10.; RUN;
Art, CEO, AnalystFinder.com
Your data is shown with a colon : in it, is it possible it's using that as a delimiter as well as using a comma?
2 30/04/1789:04/03/1797 21 TookOffice=-62336 LeftOffice=. _ERROR_=1 _N_=1
Maybe try the following:
INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLM=',;' DSD FIRSTOBS=2;
If it doesn't work, please post your full log and/or post a sample text file so we can verify the contents. Your approach/code is correct for the data you've shown, but the log indicates the data doesn't align with what you've shown.
@actionjmanx wrote:
Hello, I've just started SAS Programming and I've run into major problems.
The data I am trying to bring in is a simple .csv file, named "aaaa.csv" (I changed it to make the name simpler). It looks like this:
Took office,Left office 30/04/1789,04/03/1797 04/03/1797,04/03/1801 04/03/1801,04/03/1809 04/03/1809,04/03/1817 04/03/1817,04/03/1825
There is more, the data is presidential entry and exit dates.
The code I am trying to use looks like this:
DATA presidents; INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; INPUT TookOffice DDMMYY10. LeftOffice DDMMYY10.; RUN; TITLE "presidents.csv"; PROC PRINT Data=presidents; RUN;
When I run the code, it seems to be unable to read the second date in each line. The errors I get are:
NOTE: Invalid data for LeftOffice in line 2 11-20. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 2 30/04/1789:04/03/1797 21 TookOffice=-62336 LeftOffice=. _ERROR_=1 _N_=1 NOTE: Invalid data for LeftOffice in line 3 11-20. 3 04/03/1797:04/03/1801 21 TookOffice=-59471 LeftOffice=. _ERROR_=1 _N_=2 NOTE: Invalid data for LeftOffice in line 4 11-20. 4 04/03/1801:04/03/1809 21 TookOffice=-58011 LeftOffice=. _ERROR_=1 _N_=3
I am worried about getting the dates to look like proper dates but my biggest concern at hand is getting the data to read at all.
All assistance is greatly appreciated.
I was switching back and forth between a comma and a colon because I thought the delimiter was causing problems.
Turns out changing it had no effect at all.
I also tried changing the delimiter to ",;" while using a comma and a colon in the data. Neither method worked.
Full log is as follows:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 76 77 DATA presidents; 78 INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; 79 INPUT TookOffice DDMMYY10. LeftOffice DDMMYY10.; 80 RUN; NOTE: The infile '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' is: Filename=/folders/myshortcuts/SASScripts/RawData/aaaa.csv, Owner Name=root,Group Name=vboxsf, Access Permission=-rwxrwx---, Last Modified=14Jan2018:12:57:06, File Size (bytes)=1041 NOTE: Invalid data for LeftOffice in line 2 11-20. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 2 30/04/1789,04/03/1797 21 TookOffice=-62336 LeftOffice=. _ERROR_=1 _N_=1 NOTE: Invalid data for LeftOffice in line 3 11-20. 3 04/03/1797,04/03/1801 21 TookOffice=-59471 LeftOffice=. _ERROR_=1 _N_=2 NOTE: Invalid data for LeftOffice in line 4 11-20. 4 04/03/1801,04/03/1809 21 TookOffice=-58011 LeftOffice=. _ERROR_=1 _N_=3 NOTE: Invalid data for LeftOffice in line 5 11-20. 5 04/03/1809,04/03/1817 21 TookOffice=-55089 LeftOffice=. _ERROR_=1 _N_=4 NOTE: Invalid data for LeftOffice in line 6 11-20. 6 04/03/1817,04/03/1825 21 TookOffice=-52167 LeftOffice=. _ERROR_=1 _N_=5 NOTE: Invalid data for LeftOffice in line 7 11-20. 7 04/03/1825,04/03/1829 21 TookOffice=-49245 LeftOffice=. _ERROR_=1 _N_=6 NOTE: Invalid data for LeftOffice in line 8 11-20. 8 04/03/1829,04/03/1837 21 TookOffice=-47784 LeftOffice=. _ERROR_=1 _N_=7 NOTE: Invalid data for LeftOffice in line 9 11-20. 9 04/03/1837,04/03/1841 21 TookOffice=-44862 LeftOffice=. _ERROR_=1 _N_=8 NOTE: Invalid data for LeftOffice in line 10 11-20. 10 04/03/1841,04/04/1841 21 TookOffice=-43401 LeftOffice=. _ERROR_=1 _N_=9 NOTE: Invalid data for LeftOffice in line 11 11-20. 11 04/04/1841,04/03/1845 21 TookOffice=-43370 LeftOffice=. _ERROR_=1 _N_=10 NOTE: Invalid data for LeftOffice in line 12 11-20. 12 04/03/1845,04/03/1849 21 TookOffice=-41940 LeftOffice=. _ERROR_=1 _N_=11 NOTE: Invalid data for LeftOffice in line 13 11-20. 13 04/03/1849,09/07/1850 21 TookOffice=-40479 LeftOffice=. _ERROR_=1 _N_=12 NOTE: Invalid data for LeftOffice in line 14 11-20. 14 09/07/1850,04/03/1853 21 TookOffice=-39987 LeftOffice=. _ERROR_=1 _N_=13 NOTE: Invalid data for LeftOffice in line 15 11-20. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 15 04/03/1853,04/03/1857 21 TookOffice=-39018 LeftOffice=. _ERROR_=1 _N_=14 NOTE: Invalid data for LeftOffice in line 16 11-20. 16 04/03/1857,04/03/1861 21 TookOffice=-37557 LeftOffice=. _ERROR_=1 _N_=15 NOTE: Invalid data for LeftOffice in line 17 11-20. 17 04/03/1861,15/04/1865 21 TookOffice=-36096 LeftOffice=. _ERROR_=1 _N_=16 NOTE: Invalid data for LeftOffice in line 18 11-20. 18 15/04/1865,04/03/1869 21 TookOffice=-34593 LeftOffice=. _ERROR_=1 _N_=17 NOTE: Invalid data for LeftOffice in line 19 11-20. 19 04/03/1869,04/03/1877 21 TookOffice=-33174 LeftOffice=. _ERROR_=1 _N_=18 NOTE: Invalid data for LeftOffice in line 20 11-20. 20 04/03/1877,04/03/1881 21 TookOffice=-30252 LeftOffice=. _ERROR_=1 _N_=19 NOTE: Invalid data for LeftOffice in line 21 11-20. WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed. 21 04/03/1881,19/09/1881 21 TookOffice=-28791 LeftOffice=. _ERROR_=1 _N_=20 NOTE: 44 records were read from the infile '/folders/myshortcuts/SASScripts/RawData/aaaa.csv'. The minimum record length was 21. The maximum record length was 21. NOTE: The data set WORK.PRESIDENTS has 44 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.01 seconds 81 TITLE "presidents.csv"; 82 PROC PRINT Data=presidents; 83 RUN; NOTE: There were 44 observations read from the data set WORK.PRESIDENTS. NOTE: PROCEDURE PRINT used (Total process time): real time 0.10 seconds cpu time 0.09 seconds 84 85 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 98
Did you use DLM or DLMSTR?
DLMSTR would have expected the delimiter to be the string of both comma and colon.
DLM would recognize either as the delimiter.
Another option to fix your issue is to use an INFORMAT statement instead of using the formats in the INPUT statement.
This works fine.
data have;
informat took_office left_office ddmmyy10.;
format took_office left_office ddmmyy10.;
infile cards dlm=',';
input Took_office Left_office;
cards;
30/04/1789,04/03/1797
04/03/1797,04/03/1801
04/03/1801,04/03/1809
04/03/1809,04/03/1817
04/03/1817,04/03/1825
;
run;
proc print data=have;
run;
@actionjmanx wrote:
I was switching back and forth between a comma and a colon because I thought the delimiter was causing problems.
Turns out changing it had no effect at all.
I also tried changing the delimiter to ",;" while using a comma and a colon in the data. Neither method worked.
Add a colon to your input statement. i.e.:
DATA presidents; INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; INPUT TookOffice DDMMYY10. LeftOffice : DDMMYY10.; RUN;
Art, CEO, AnalystFinder.com
I just tried this and it worked and was able to read the second date.
Thank you!
I'm curious as to why I need a colon on the second date but not the first.
In a non-answer to your question .. I don't know! It will work by using the colon modifier on the first field, the second field, or both fields. I've never read documentation that explains exactly how it works, thus I usually use the method that @Reeza suggested, namely using an informat statement. Hopefully, someone more knowledgeable than me will chime in with an answer.
That said, all four of the following will work:
DATA presidents; /* INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; */ INFILE '/folders/myfolders/presidents.csv' DLMstr=',' FIRSTOBS=2; INPUT TookOffice : DDMMYY10. LeftOffice DDMMYY10.; RUN; DATA presidents; /* INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; */ INFILE '/folders/myfolders/presidents.csv' DLMstr=',' FIRSTOBS=2; INPUT TookOffice : DDMMYY10. LeftOffice : DDMMYY10.; RUN; DATA presidents; /* INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; */ INFILE '/folders/myfolders/presidents.csv' DLMstr=',' FIRSTOBS=2; INPUT TookOffice : DDMMYY10. LeftOffice : DDMMYY10.; RUN; DATA presidents; /* INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; */ INFILE '/folders/myfolders/presidents.csv' DLMstr=',' FIRSTOBS=2; informat TookOffice LeftOffice DDMMYY10.; INPUT TookOffice LeftOffice; RUN;
Art, CEO, AnalystFinder.com
Good afternoon @art297 and @actionjmanx,
I picked up this excerpt from Ron cody's book. Would this be of some help to understand better?
3.12 Using Informats with List Input
Suppose you have a blank- or comma-delimited file containing dates and character values longer than 8 bytes (or other values that require an informat). One way to provide informats with list input is to follow each variable name in your INPUT statement with a
44 Learning SAS by Example: A Programmer’s Guide
colon, followed by the appropriate informat. To see how this works, suppose you want to read this CSV file:
File: c:\books\learning\list.csv
"001","Christopher Mullens",11/12/1955,"$45,200" "002","Michelle Kwo",9/12/1955,"$78,123" "003","Roger W. McDonald",1/1/1960,"$107,200"
Variables in this file represent a subject number (Subj), Name, date of birth (DOB), and yearly salary (Salary). You need to supply informats for Name (length is greater than 8 bytes), DOB (you need a date informat here), and Salary (this is a nonstandard numeric value—with a dollar sign and commas). Program 3-11 shows one way to supply the appropriate informats for these variables.
Program 3-11 Using informats with list input
data list_example; infile 'c:\books\learning\list.csv' dsd; input Subj : $3. Name : $20. DOB : mmddyy10. Salary : dollar8.; format DOB date9. Salary dollar8.; run;
You see here that there is a colon preceding each informat. This colon (called an informat modifier) tells SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is encountered. Do not forget the colons because without them SAS may read past a delimiter to satisfy the width specified in the informat.
This program would also work if the informat for Subj were omitted and the variable name was followed by a dollar sign (to signify that Subj is a character variable). However, the Subj variable would then be stored in 8 bytes (the default length for character variables with list input). By providing the $3. informat, you tell SAS to use 3 bytes to store this variable.
@novinosrin: Thanks, but I'd already read that, as well as the documentation. Neither provides (for me at least) enough info to let me know exactly why and how the colon modifier works. Without testing it, presume that the input file actually had three comma separated dates. Without testing it, would you expect the following to correctly input all three dates?:
DATA presidents; INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; INPUT TookOffice DDMMYY10. LeftOffice : DDMMYY10. thirddate ddmmyy10.; RUN;
Art, CEO, AnalystFinder.com
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.