BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
actionjmanx
Calcite | Level 5

Hello, I've just started SAS Programming and I've run into major problems.

 

The data I am trying to bring in is a simple .csv file, named "aaaa.csv" (I changed it to make the name simpler).  It looks like this:

Took office,Left office 
30/04/1789,04/03/1797
04/03/1797,04/03/1801
04/03/1801,04/03/1809
04/03/1809,04/03/1817
04/03/1817,04/03/1825

 

 

There is more, the data is presidential entry and exit dates.

 

The code I am trying to use looks like this:

 

DATA presidents;
INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2;
INPUT TookOffice DDMMYY10. LeftOffice DDMMYY10.;
RUN;
TITLE "presidents.csv";
PROC PRINT Data=presidents;
RUN;

 

 

When I run the code, it seems to be unable to read the second date in each line.  The errors I get are:

 

 

NOTE: Invalid data for LeftOffice in line 2 11-20.
 RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                      
 2         30/04/1789:04/03/1797 21
 TookOffice=-62336 LeftOffice=. _ERROR_=1 _N_=1
 NOTE: Invalid data for LeftOffice in line 3 11-20.
 3         04/03/1797:04/03/1801 21
 TookOffice=-59471 LeftOffice=. _ERROR_=1 _N_=2
 NOTE: Invalid data for LeftOffice in line 4 11-20.
 4         04/03/1801:04/03/1809 21
 TookOffice=-58011 LeftOffice=. _ERROR_=1 _N_=3

 

 

 

I am worried about getting the dates to look like proper dates but my biggest concern at hand is getting the data to read at all.

 

All assistance is greatly appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Add a colon to your input statement. i.e.:

DATA presidents;
  INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2;
  INPUT TookOffice DDMMYY10. LeftOffice : DDMMYY10.;
RUN;

Art, CEO, AnalystFinder.com

 

View solution in original post

8 REPLIES 8
Reeza
Super User

Your data is shown with a colon : in it, is it possible it's using that as a delimiter as well as using a comma?

 

 2         30/04/1789:04/03/1797 21
 TookOffice=-62336 LeftOffice=. _ERROR_=1 _N_=1 

 

Maybe try the following:

 

INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLM=',;' DSD FIRSTOBS=2;

 

If it doesn't work, please post your full log and/or post a sample text file so we can verify the contents. Your approach/code is correct for the data you've shown, but the log indicates the data doesn't align with what you've shown. 

 


@actionjmanx wrote:

Hello, I've just started SAS Programming and I've run into major problems.

 

The data I am trying to bring in is a simple .csv file, named "aaaa.csv" (I changed it to make the name simpler).  It looks like this:

Took office,Left office 
30/04/1789,04/03/1797
04/03/1797,04/03/1801
04/03/1801,04/03/1809
04/03/1809,04/03/1817
04/03/1817,04/03/1825

 

 

There is more, the data is presidential entry and exit dates.

 

The code I am trying to use looks like this:

 

DATA presidents;
INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2;
INPUT TookOffice DDMMYY10. LeftOffice DDMMYY10.;
RUN;
TITLE "presidents.csv";
PROC PRINT Data=presidents;
RUN;

 

 

When I run the code, it seems to be unable to read the second date in each line.  The errors I get are:

 

 

NOTE: Invalid data for LeftOffice in line 2 11-20.
 RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                      
 2         30/04/1789:04/03/1797 21
 TookOffice=-62336 LeftOffice=. _ERROR_=1 _N_=1
 NOTE: Invalid data for LeftOffice in line 3 11-20.
 3         04/03/1797:04/03/1801 21
 TookOffice=-59471 LeftOffice=. _ERROR_=1 _N_=2
 NOTE: Invalid data for LeftOffice in line 4 11-20.
 4         04/03/1801:04/03/1809 21
 TookOffice=-58011 LeftOffice=. _ERROR_=1 _N_=3

 

 

 

I am worried about getting the dates to look like proper dates but my biggest concern at hand is getting the data to read at all.

 

All assistance is greatly appreciated.


 

actionjmanx
Calcite | Level 5

I was switching back and forth between a comma and a colon because I thought the delimiter was causing problems.

 

Turns out changing it had no effect at all.

 

I also tried changing the delimiter to ",;" while using a comma and a colon in the data.  Neither method worked.

 

Full log is as follows:

 

 
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 76         
 77         DATA presidents;
 78         INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2;
 79         INPUT TookOffice DDMMYY10. LeftOffice DDMMYY10.;
 80         RUN;
 
 NOTE: The infile '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' is:
       Filename=/folders/myshortcuts/SASScripts/RawData/aaaa.csv,
       Owner Name=root,Group Name=vboxsf,
       Access Permission=-rwxrwx---,
       Last Modified=14Jan2018:12:57:06,
       File Size (bytes)=1041
 
 NOTE: Invalid data for LeftOffice in line 2 11-20.
 RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                      
 2         30/04/1789,04/03/1797 21
 TookOffice=-62336 LeftOffice=. _ERROR_=1 _N_=1
 NOTE: Invalid data for LeftOffice in line 3 11-20.
 3         04/03/1797,04/03/1801 21
 TookOffice=-59471 LeftOffice=. _ERROR_=1 _N_=2
 NOTE: Invalid data for LeftOffice in line 4 11-20.
 4         04/03/1801,04/03/1809 21
 TookOffice=-58011 LeftOffice=. _ERROR_=1 _N_=3
 NOTE: Invalid data for LeftOffice in line 5 11-20.
 5         04/03/1809,04/03/1817 21
 TookOffice=-55089 LeftOffice=. _ERROR_=1 _N_=4
 NOTE: Invalid data for LeftOffice in line 6 11-20.
 6         04/03/1817,04/03/1825 21
 TookOffice=-52167 LeftOffice=. _ERROR_=1 _N_=5
 NOTE: Invalid data for LeftOffice in line 7 11-20.
 7         04/03/1825,04/03/1829 21
 TookOffice=-49245 LeftOffice=. _ERROR_=1 _N_=6
 NOTE: Invalid data for LeftOffice in line 8 11-20.
 8         04/03/1829,04/03/1837 21
 TookOffice=-47784 LeftOffice=. _ERROR_=1 _N_=7
 NOTE: Invalid data for LeftOffice in line 9 11-20.
 9         04/03/1837,04/03/1841 21
 TookOffice=-44862 LeftOffice=. _ERROR_=1 _N_=8
 NOTE: Invalid data for LeftOffice in line 10 11-20.
 10        04/03/1841,04/04/1841 21
 TookOffice=-43401 LeftOffice=. _ERROR_=1 _N_=9
 NOTE: Invalid data for LeftOffice in line 11 11-20.
 11        04/04/1841,04/03/1845 21
 TookOffice=-43370 LeftOffice=. _ERROR_=1 _N_=10
 NOTE: Invalid data for LeftOffice in line 12 11-20.
 12        04/03/1845,04/03/1849 21
 TookOffice=-41940 LeftOffice=. _ERROR_=1 _N_=11
 NOTE: Invalid data for LeftOffice in line 13 11-20.
 13        04/03/1849,09/07/1850 21
 TookOffice=-40479 LeftOffice=. _ERROR_=1 _N_=12
 NOTE: Invalid data for LeftOffice in line 14 11-20.
 14        09/07/1850,04/03/1853 21
 TookOffice=-39987 LeftOffice=. _ERROR_=1 _N_=13
 NOTE: Invalid data for LeftOffice in line 15 11-20.
 RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                      
 15        04/03/1853,04/03/1857 21
 TookOffice=-39018 LeftOffice=. _ERROR_=1 _N_=14
 NOTE: Invalid data for LeftOffice in line 16 11-20.
 16        04/03/1857,04/03/1861 21
 TookOffice=-37557 LeftOffice=. _ERROR_=1 _N_=15
 NOTE: Invalid data for LeftOffice in line 17 11-20.
 17        04/03/1861,15/04/1865 21
 TookOffice=-36096 LeftOffice=. _ERROR_=1 _N_=16
 NOTE: Invalid data for LeftOffice in line 18 11-20.
 18        15/04/1865,04/03/1869 21
 TookOffice=-34593 LeftOffice=. _ERROR_=1 _N_=17
 NOTE: Invalid data for LeftOffice in line 19 11-20.
 19        04/03/1869,04/03/1877 21
 TookOffice=-33174 LeftOffice=. _ERROR_=1 _N_=18
 NOTE: Invalid data for LeftOffice in line 20 11-20.
 20        04/03/1877,04/03/1881 21
 TookOffice=-30252 LeftOffice=. _ERROR_=1 _N_=19
 NOTE: Invalid data for LeftOffice in line 21 11-20.
 WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be printed.
 21        04/03/1881,19/09/1881 21
 TookOffice=-28791 LeftOffice=. _ERROR_=1 _N_=20
 NOTE: 44 records were read from the infile '/folders/myshortcuts/SASScripts/RawData/aaaa.csv'.
       The minimum record length was 21.
       The maximum record length was 21.
 NOTE: The data set WORK.PRESIDENTS has 44 observations and 2 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       cpu time            0.01 seconds
       
 
 81         TITLE "presidents.csv";
 82         PROC PRINT Data=presidents;
 83         RUN;
 
 NOTE: There were 44 observations read from the data set WORK.PRESIDENTS.
 NOTE: PROCEDURE PRINT used (Total process time):
       real time           0.10 seconds
       cpu time            0.09 seconds
       
 
 84         
 85         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 98         
Reeza
Super User

Did you use DLM or DLMSTR?

DLMSTR would have expected the delimiter to be the string of both comma and colon. 

DLM would recognize either as the delimiter. 

 

Another option to fix your issue is to use an INFORMAT statement instead of using the formats in the INPUT statement. 

This works fine.

 

data have;
    informat took_office left_office ddmmyy10.;
    format took_office left_office ddmmyy10.;
    infile cards dlm=',';
    input Took_office Left_office;
    cards;
30/04/1789,04/03/1797
04/03/1797,04/03/1801
04/03/1801,04/03/1809
04/03/1809,04/03/1817
04/03/1817,04/03/1825
;
run;

proc print data=have;
run;

 


@actionjmanx wrote:

I was switching back and forth between a comma and a colon because I thought the delimiter was causing problems.

 

Turns out changing it had no effect at all.

 

I also tried changing the delimiter to ",;" while using a comma and a colon in the data.  Neither method worked.

 

 

art297
Opal | Level 21

Add a colon to your input statement. i.e.:

DATA presidents;
  INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2;
  INPUT TookOffice DDMMYY10. LeftOffice : DDMMYY10.;
RUN;

Art, CEO, AnalystFinder.com

 

actionjmanx
Calcite | Level 5

I just tried this and it worked and was able to read the second date.

 

Thank you!

 

I'm curious as to why I need a colon on the second date but not the first.

art297
Opal | Level 21

In a non-answer to your question .. I don't know! It will work by using the colon modifier on the first field, the second field, or both fields. I've never read documentation that explains exactly how it works, thus I usually use the method that @Reeza suggested, namely using an informat statement. Hopefully, someone more knowledgeable than me will chime in with an answer.

 

That said, all four of the following will work:

DATA presidents;
/*   INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; */
  INFILE '/folders/myfolders/presidents.csv' DLMstr=',' FIRSTOBS=2;
  INPUT TookOffice : DDMMYY10. LeftOffice DDMMYY10.;
RUN;

DATA presidents;
/*   INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; */
  INFILE '/folders/myfolders/presidents.csv' DLMstr=',' FIRSTOBS=2;
  INPUT TookOffice : DDMMYY10. LeftOffice : DDMMYY10.;
RUN;

DATA presidents;
/*   INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; */
  INFILE '/folders/myfolders/presidents.csv' DLMstr=',' FIRSTOBS=2;
  INPUT TookOffice : DDMMYY10. LeftOffice : DDMMYY10.;
RUN;

DATA presidents;
/*   INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; */
  INFILE '/folders/myfolders/presidents.csv' DLMstr=',' FIRSTOBS=2;
  informat TookOffice LeftOffice DDMMYY10.;
  INPUT TookOffice LeftOffice;
RUN;

Art, CEO, AnalystFinder.com

 

novinosrin
Tourmaline | Level 20

Good afternoon @art297  and @actionjmanx,

 

I picked up this excerpt from Ron cody's book. Would this be of some help to understand better?

 

3.12 Using Informats with List Input
Suppose you have a blank- or comma-delimited file containing dates and character values longer than 8 bytes (or other values that require an informat). One way to provide informats with list input is to follow each variable name in your INPUT statement with a
44 Learning SAS by Example: A Programmer’s Guide


colon, followed by the appropriate informat. To see how this works, suppose you want to read this CSV file:
File: c:\books\learning\list.csv
"001","Christopher Mullens",11/12/1955,"$45,200" "002","Michelle Kwo",9/12/1955,"$78,123" "003","Roger W. McDonald",1/1/1960,"$107,200"

Variables in this file represent a subject number (Subj), Name, date of birth (DOB), and yearly salary (Salary). You need to supply informats for Name (length is greater than 8 bytes), DOB (you need a date informat here), and Salary (this is a nonstandard numeric value—with a dollar sign and commas). Program 3-11 shows one way to supply the appropriate informats for these variables.
Program 3-11 Using informats with list input

data list_example; infile 'c:\books\learning\list.csv' dsd; input Subj : $3. Name : $20. DOB : mmddyy10. Salary : dollar8.; format DOB date9. Salary dollar8.; run;

You see here that there is a colon preceding each informat. This colon (called an informat modifier) tells SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is encountered. Do not forget the colons because without them SAS may read past a delimiter to satisfy the width specified in the informat.
This program would also work if the informat for Subj were omitted and the variable name was followed by a dollar sign (to signify that Subj is a character variable). However, the Subj variable would then be stored in 8 bytes (the default length for character variables with list input). By providing the $3. informat, you tell SAS to use 3 bytes to store this variable.

 

art297
Opal | Level 21

@novinosrin: Thanks, but I'd already read that, as well as the documentation. Neither provides (for me at least) enough info to let me know exactly why and how the colon modifier works. Without testing it, presume that the input file actually had three comma separated dates. Without testing it, would you expect the following to correctly input all three dates?:

DATA presidents;
   INFILE '/folders/myshortcuts/SASScripts/RawData/aaaa.csv' DLMSTR=',' FIRSTOBS=2; 
  INPUT TookOffice DDMMYY10. LeftOffice : DDMMYY10. thirddate ddmmyy10.;
RUN;

Art, CEO, AnalystFinder.com

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 737 views
  • 0 likes
  • 4 in conversation