Hello,
I'm trying to import a CSV file. This is the first 5 rows with the header:
ACCTN,PURPOSE,VISIT_DATE,ENTITY,ID
12282,3,7/17/2012,5,API
20083,8,6/12/2012,5,LEO
20082,9,6/12/2012,5,LEO
20129,8,6/14/2012,5,LEO
20063,8,6/14/2012,5,LEO
This is my data step to bring it in:
DATA TEST;
INFILE "C:\test.csv" DLM=',' MISSOVER DSD FIRSTOBS=2;
INPUT
ACCTN $
PURPOSE $
VISIT_DATE $
ENTITY $
ID $;
RUN;
This is the output:
VISIT_DATE is cut off. I guess I don't understand how SAS is working here. I thought by specifying the delimiter character it would give the VISIT_DATE field a variable length but it always cuts it off at 8 length. When I try changing the length to 10 it moves past the delimiter into the other field. So I guess SAS can only read fixed width files? What am I missing here?
@tomcmacdonald wrote:
That works too. I guess the semicolon is a modifier to the length portion of the statement. I sure do wish the documentation specified this.
http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#chifoptfmain.htm
If you use formatted mode input like
input visit_date mmddyy10. ;
then SAS will read EXACTLY the number of characters you specify in the informat.
If you use list mode input
input var1 var2 var3 ;
then SAS will use the number of characters that exist in the data line between the two delimiters.
You have used a mixed mode that is a combination of reading some variables using list mode and one variable using formattted mode.
Using the colon modifier allows you to specify an informat in the INPUT statement and still parse the line using normal list mode style. This is called modified list mode.
You could also just remove the informat from the INPUT statement and instead use an INFORMAT statement (or ATTRIB) statement to tell SAS what informat to use when reading the variable.
In general you will get better results if you define your variables before you use them. That is, do not force SAS to guess from how you first use the variable, whether in an INPUT, FORMAT, INFORMAT and assignment statement what type and length your variable should have.
So your program look like this instead.
data test;
infile "c:\test.csv" dsd truncover firstobs=2;
length acctn $8 purpose $8 visit_date 8 entity $8 id $8;
input acctn--id;
informat visit_date mmddyy. ;
format visit_date yymmdd10. ;
run;
Obs acctn purpose visit_date entity id 1 12282 3 2012-07-17 5 API 2 20083 8 2012-06-12 5 LEO 3 20082 9 2012-06-12 5 LEO 4 20129 8 2012-06-14 5 LEO 5 20063 8 2012-06-14 5 LEO
Adding a length statement seems to do the trick
DATA TEST; INFILE 'C:\test.csv' DLM=',' MISSOVER DSD FIRSTOBS=2; LENGTH VISIT_DATE $10; INPUT ACCTN $ PURPOSE $ VISIT_DATE $ ENTITY $ ID $; RUN;
Without specific guidance from you SAS is assuming all the character vars are length $8. You could change the
VISIT_DATE $
line to
VISIT_DATE :$10.
telling SAS that visitdate can be up to 10 characters long.
But you might want to convert it to a true date value, in which case use
VISIT_DATE :mmddyy10.
accompanied elsewhere by a
format visit_date date9. ;
statement
That works too. I guess the semicolon is a modifier to the length portion of the statement. I sure do wish the documentation specified this.
http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#chifoptfmain.htm
@tomcmacdonald wrote:
That works too. I guess the semicolon is a modifier to the length portion of the statement. I sure do wish the documentation specified this.
http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#chifoptfmain.htm
If you use formatted mode input like
input visit_date mmddyy10. ;
then SAS will read EXACTLY the number of characters you specify in the informat.
If you use list mode input
input var1 var2 var3 ;
then SAS will use the number of characters that exist in the data line between the two delimiters.
You have used a mixed mode that is a combination of reading some variables using list mode and one variable using formattted mode.
Using the colon modifier allows you to specify an informat in the INPUT statement and still parse the line using normal list mode style. This is called modified list mode.
You could also just remove the informat from the INPUT statement and instead use an INFORMAT statement (or ATTRIB) statement to tell SAS what informat to use when reading the variable.
In general you will get better results if you define your variables before you use them. That is, do not force SAS to guess from how you first use the variable, whether in an INPUT, FORMAT, INFORMAT and assignment statement what type and length your variable should have.
So your program look like this instead.
data test;
infile "c:\test.csv" dsd truncover firstobs=2;
length acctn $8 purpose $8 visit_date 8 entity $8 id $8;
input acctn--id;
informat visit_date mmddyy. ;
format visit_date yymmdd10. ;
run;
Obs acctn purpose visit_date entity id 1 12282 3 2012-07-17 5 API 2 20083 8 2012-06-12 5 LEO 3 20082 9 2012-06-12 5 LEO 4 20129 8 2012-06-14 5 LEO 5 20063 8 2012-06-14 5 LEO
@tomcmacdonald That's the documentation reference for the INFILE statement. The colon is used on the INPUT statement and it is documented under INPUT, LIST. I do kinda wish there weren't 5 (?) different pages for the INPUT statement though.
FYI - when all else fails with reading in a text file use PROC IMPORT with a large value for GUESSINGROWS and check the log for the code.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.