DATA Step, Macro, Functions and more

Getting hung up on a trivial CSV import

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Getting hung up on a trivial CSV import

[ Edited ]

Hello,

I'm trying to import a CSV file. This is the first 5 rows with the header:

ACCTN,PURPOSE,VISIT_DATE,ENTITY,ID
12282,3,7/17/2012,5,API
20083,8,6/12/2012,5,LEO
20082,9,6/12/2012,5,LEO
20129,8,6/14/2012,5,LEO
20063,8,6/14/2012,5,LEO

This is my data step to bring it in:

DATA TEST;
  INFILE "C:\test.csv" DLM=',' MISSOVER DSD FIRSTOBS=2;
  INPUT
    ACCTN $
    PURPOSE $
    VISIT_DATE $
    ENTITY $
    ID $;
RUN;

This is the output:

Untitled.png

 

VISIT_DATE is cut off.  I guess I don't understand how SAS is working here.  I thought by specifying the delimiter character it would give the VISIT_DATE field a variable length but it always cuts it off at 8 length.  When I try changing the length to 10 it moves past the delimiter into the other field.  So I guess SAS can only read fixed width files?  What am I missing here?

 


Accepted Solutions
Solution
‎09-27-2017 10:39 AM
Super User
Super User
Posts: 7,050

Re: Getting hung up on a trivial CSV import

Posted in reply to tomcmacdonald

tomcmacdonald wrote:

That works too.  I guess the semicolon is a modifier to the length portion of the statement.  I sure do wish the documentation specified this.

 

http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#chifoptfmain.htm


If you use formatted mode input like

input visit_date mmddyy10. ;

then SAS will read EXACTLY the number of characters you specify in the informat.

If you use list mode input

input var1 var2 var3 ;

then SAS will use the number of characters that exist in the data line between the two delimiters.

You have used a mixed mode that is a combination of reading some variables using list mode and one variable using formattted mode.

Using the colon modifier allows you to specify an informat in the INPUT statement and still parse the line using normal list mode style.  This is called modified list mode.

 

You could also just remove the informat from the INPUT statement and instead use an INFORMAT statement (or ATTRIB) statement to tell SAS what informat to use when reading the variable.

 

In general you will get better results if you define your variables before you use them. That is, do not force SAS to guess from how you first use the variable, whether in an INPUT, FORMAT, INFORMAT and assignment statement what type and length your variable should have.

 

So your program look like this instead.

 

data test;
  infile "c:\test.csv" dsd truncover firstobs=2;
  length acctn $8 purpose $8 visit_date 8 entity $8 id $8;
  input acctn--id;
  informat visit_date mmddyy. ;
  format visit_date yymmdd10. ;
run;
Obs    acctn    purpose    visit_date    entity    id

 1     12282       3       2012-07-17       5      API
 2     20083       8       2012-06-12       5      LEO
 3     20082       9       2012-06-12       5      LEO
 4     20129       8       2012-06-14       5      LEO
 5     20063       8       2012-06-14       5      LEO

View solution in original post


All Replies
Occasional Contributor
Posts: 15

Re: Getting hung up on a trivial CSV import

Posted in reply to tomcmacdonald

Adding a length statement seems to do the trick

 

DATA TEST;
  INFILE 'C:\test.csv' DLM=',' MISSOVER DSD FIRSTOBS=2;
  LENGTH VISIT_DATE $10;
  INPUT
    ACCTN $
    PURPOSE $
    VISIT_DATE $
    ENTITY $
    ID $;
RUN;
Trusted Advisor
Posts: 1,022

Re: Getting hung up on a trivial CSV import

Posted in reply to tomcmacdonald

Without specific guidance from you SAS is assuming all the character vars are length $8.  You could change the

    VISIT_DATE $

line to
   VISIT_DATE :$10.

 

telling SAS that visitdate can be up to 10 characters long.

 

But you might want to convert it to a true date value, in which case use

   VISIT_DATE :mmddyy10.

 

accompanied elsewhere by a

   format visit_date date9. ;

statement

Occasional Contributor
Posts: 15

Re: Getting hung up on a trivial CSV import

That works too.  I guess the semicolon is a modifier to the length portion of the statement.  I sure do wish the documentation specified this.

 

http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#chifoptfmain.htm

Solution
‎09-27-2017 10:39 AM
Super User
Super User
Posts: 7,050

Re: Getting hung up on a trivial CSV import

Posted in reply to tomcmacdonald

tomcmacdonald wrote:

That works too.  I guess the semicolon is a modifier to the length portion of the statement.  I sure do wish the documentation specified this.

 

http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#chifoptfmain.htm


If you use formatted mode input like

input visit_date mmddyy10. ;

then SAS will read EXACTLY the number of characters you specify in the informat.

If you use list mode input

input var1 var2 var3 ;

then SAS will use the number of characters that exist in the data line between the two delimiters.

You have used a mixed mode that is a combination of reading some variables using list mode and one variable using formattted mode.

Using the colon modifier allows you to specify an informat in the INPUT statement and still parse the line using normal list mode style.  This is called modified list mode.

 

You could also just remove the informat from the INPUT statement and instead use an INFORMAT statement (or ATTRIB) statement to tell SAS what informat to use when reading the variable.

 

In general you will get better results if you define your variables before you use them. That is, do not force SAS to guess from how you first use the variable, whether in an INPUT, FORMAT, INFORMAT and assignment statement what type and length your variable should have.

 

So your program look like this instead.

 

data test;
  infile "c:\test.csv" dsd truncover firstobs=2;
  length acctn $8 purpose $8 visit_date 8 entity $8 id $8;
  input acctn--id;
  informat visit_date mmddyy. ;
  format visit_date yymmdd10. ;
run;
Obs    acctn    purpose    visit_date    entity    id

 1     12282       3       2012-07-17       5      API
 2     20083       8       2012-06-12       5      LEO
 3     20082       9       2012-06-12       5      LEO
 4     20129       8       2012-06-14       5      LEO
 5     20063       8       2012-06-14       5      LEO
Super User
Posts: 19,815

Re: Getting hung up on a trivial CSV import

Posted in reply to tomcmacdonald

@tomcmacdonald That's the documentation reference for the INFILE statement. The colon is used on the INPUT statement and it is documented under INPUT, LIST. I do kinda wish there weren't 5 (?) different pages for the INPUT statement though. 

 

http://support.sas.com/documentation/cdl/en/lestmtsref/69738/HTML/default/viewer.htm#n0lrz3gb7m9e4rn...

 

FYI - when all else fails with reading in a text file use PROC IMPORT with a large value for GUESSINGROWS and check the log for the code. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 134 views
  • 5 likes
  • 4 in conversation