BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hello,

I'm trying to import a CSV file. This is the first 5 rows with the header:

ACCTN,PURPOSE,VISIT_DATE,ENTITY,ID
12282,3,7/17/2012,5,API
20083,8,6/12/2012,5,LEO
20082,9,6/12/2012,5,LEO
20129,8,6/14/2012,5,LEO
20063,8,6/14/2012,5,LEO

This is my data step to bring it in:

DATA TEST;
  INFILE "C:\test.csv" DLM=',' MISSOVER DSD FIRSTOBS=2;
  INPUT
    ACCTN $
    PURPOSE $
    VISIT_DATE $
    ENTITY $
    ID $;
RUN;

This is the output:

Untitled.png

 

VISIT_DATE is cut off.  I guess I don't understand how SAS is working here.  I thought by specifying the delimiter character it would give the VISIT_DATE field a variable length but it always cuts it off at 8 length.  When I try changing the length to 10 it moves past the delimiter into the other field.  So I guess SAS can only read fixed width files?  What am I missing here?

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@tomcmacdonald wrote:

That works too.  I guess the semicolon is a modifier to the length portion of the statement.  I sure do wish the documentation specified this.

 

http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#chifoptfmain.htm


If you use formatted mode input like

input visit_date mmddyy10. ;

then SAS will read EXACTLY the number of characters you specify in the informat.

If you use list mode input

input var1 var2 var3 ;

then SAS will use the number of characters that exist in the data line between the two delimiters.

You have used a mixed mode that is a combination of reading some variables using list mode and one variable using formattted mode.

Using the colon modifier allows you to specify an informat in the INPUT statement and still parse the line using normal list mode style.  This is called modified list mode.

 

You could also just remove the informat from the INPUT statement and instead use an INFORMAT statement (or ATTRIB) statement to tell SAS what informat to use when reading the variable.

 

In general you will get better results if you define your variables before you use them. That is, do not force SAS to guess from how you first use the variable, whether in an INPUT, FORMAT, INFORMAT and assignment statement what type and length your variable should have.

 

So your program look like this instead.

 

data test;
  infile "c:\test.csv" dsd truncover firstobs=2;
  length acctn $8 purpose $8 visit_date 8 entity $8 id $8;
  input acctn--id;
  informat visit_date mmddyy. ;
  format visit_date yymmdd10. ;
run;
Obs    acctn    purpose    visit_date    entity    id

 1     12282       3       2012-07-17       5      API
 2     20083       8       2012-06-12       5      LEO
 3     20082       9       2012-06-12       5      LEO
 4     20129       8       2012-06-14       5      LEO
 5     20063       8       2012-06-14       5      LEO

View solution in original post

5 REPLIES 5
tomcmacdonald
Quartz | Level 8

Adding a length statement seems to do the trick

 

DATA TEST;
  INFILE 'C:\test.csv' DLM=',' MISSOVER DSD FIRSTOBS=2;
  LENGTH VISIT_DATE $10;
  INPUT
    ACCTN $
    PURPOSE $
    VISIT_DATE $
    ENTITY $
    ID $;
RUN;
mkeintz
PROC Star

Without specific guidance from you SAS is assuming all the character vars are length $8.  You could change the

    VISIT_DATE $

line to
   VISIT_DATE :$10.

 

telling SAS that visitdate can be up to 10 characters long.

 

But you might want to convert it to a true date value, in which case use

   VISIT_DATE :mmddyy10.

 

accompanied elsewhere by a

   format visit_date date9. ;

statement

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
tomcmacdonald
Quartz | Level 8

That works too.  I guess the semicolon is a modifier to the length portion of the statement.  I sure do wish the documentation specified this.

 

http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#chifoptfmain.htm

Tom
Super User Tom
Super User

@tomcmacdonald wrote:

That works too.  I guess the semicolon is a modifier to the length portion of the statement.  I sure do wish the documentation specified this.

 

http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#chifoptfmain.htm


If you use formatted mode input like

input visit_date mmddyy10. ;

then SAS will read EXACTLY the number of characters you specify in the informat.

If you use list mode input

input var1 var2 var3 ;

then SAS will use the number of characters that exist in the data line between the two delimiters.

You have used a mixed mode that is a combination of reading some variables using list mode and one variable using formattted mode.

Using the colon modifier allows you to specify an informat in the INPUT statement and still parse the line using normal list mode style.  This is called modified list mode.

 

You could also just remove the informat from the INPUT statement and instead use an INFORMAT statement (or ATTRIB) statement to tell SAS what informat to use when reading the variable.

 

In general you will get better results if you define your variables before you use them. That is, do not force SAS to guess from how you first use the variable, whether in an INPUT, FORMAT, INFORMAT and assignment statement what type and length your variable should have.

 

So your program look like this instead.

 

data test;
  infile "c:\test.csv" dsd truncover firstobs=2;
  length acctn $8 purpose $8 visit_date 8 entity $8 id $8;
  input acctn--id;
  informat visit_date mmddyy. ;
  format visit_date yymmdd10. ;
run;
Obs    acctn    purpose    visit_date    entity    id

 1     12282       3       2012-07-17       5      API
 2     20083       8       2012-06-12       5      LEO
 3     20082       9       2012-06-12       5      LEO
 4     20129       8       2012-06-14       5      LEO
 5     20063       8       2012-06-14       5      LEO
Reeza
Super User

@tomcmacdonald That's the documentation reference for the INFILE statement. The colon is used on the INPUT statement and it is documented under INPUT, LIST. I do kinda wish there weren't 5 (?) different pages for the INPUT statement though. 

 

http://support.sas.com/documentation/cdl/en/lestmtsref/69738/HTML/default/viewer.htm#n0lrz3gb7m9e4rn...

 

FYI - when all else fails with reading in a text file use PROC IMPORT with a large value for GUESSINGROWS and check the log for the code. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 870 views
  • 5 likes
  • 4 in conversation