BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
StaticFX
Obsidian | Level 7

Having an issue importing a csv... 

simple 3 columns, char, num,num

its a HUGE file... almost 2 million rows.. small sample

 

1103719801,0.27,

1103719901,1.81,

1103720001,1.99,

1103720101,1.88,

1107846101,321.02,2140.16

1103729801,0.27,

 

Note that the 3rd column often as nothing.. 

 

DATA PMA,

    INFILE '/home/xxxx/PMAccrual.csv' delimiter=',' firstobs=2;

    INFORMAT Key $10.;

    INFORMAT Accrual 8.2;

    INFORMAT UpdateAccrual 8.2;

    INPUT Key $ Accrual UpateAccrual;

RUN;

Key and Accrual look fine. but Update Accrual is all .   

nothing is importing for the 3rd column...

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So your later post clarified the real issue, which is that you are reading a file with Windows/DOS end of line characters (CRLF) on a system that is using Unix style end of line characters (LF).  This is causing SAS to threat the carriage return ('0D'x) as data. Since your last variable on the line is numeric then that caused SAS to not be able to translate the characters to a number.

 

Add the TERMSTR=CRLF option to the INFILE statement and see if that helps.

Also add TRUNCOVER to keep SAS from running to the next line for a value when the last value is missing.

Don't specify a decimal value in a INFORMAT unless your text files has remove the dots from the value and you want to tell SAS you know where the dots used to be.  Isn't that a line from Guys and Dolls?  If you tell SAS to use 8.2 and the value being read is '1234' then SAS will interpret that as 12.34.

Also don't use INFORMAT when you mean LENGTH.

When you have already defined the variables type there is no need to add $ in the INPUT statement.

DATA PMA,
    INFILE '/home/xxxx/PMAccrual.csv' dsd firstobs=2 truncover termstr=crlf;
    LENGTH Key $10 Accrual 8 UpdateAccrual  8;
    INPUT Key Accrual UpateAccrual;
RUN;

 

 

View solution in original post

12 REPLIES 12
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please do not code in upper case, its the equivalent of shouting at us.  Now hard to tell from that snippet, but two simple changes to your code should make it work:

data pma;
  infile "s:\temp\x.csv" dlm="," firstobs=2 dsd;
  informat key $10.;
  informat accrual 8.2;
  informat updateaccrual 8.2;
  input key $ accrual updateaccrual;
run;

Change 1 - put dsd so when the variable is missing it moves on - otherwise the next key will appear in the variable - this is likely what is breaking your code.

Change 2 - you have mispelt updateacrrual in the input, so your reading a different variable than the input.

Reeza
Super User

I would also recommend adding TRUNCOVER to the INFILE statement. 

StaticFX
Obsidian | Level 7

The updateaccrual was just a spelling issue (i didnt copy/paste) 🙂

 

sorry caps (I like to do most keywords in caps)

 

but, Neither worked. Still has a null value...

Reeza
Super User

It works fine for the example posted, does it work for you for the sample? If not, please post your code and log - for the actual file not this test code. 

 

 data want;
 infile cards dlm="," truncover dsd;
  informat key $10.;
  informat accrual 8.;
  informat updateaccrual 8.;
  input key $ accrual updateaccrual;
  cards;
1103719801,0.27,
1103719901,1.81,
1103720001,1.99,
1103720101,1.88,
1107846101,321.02,2140.16
1103729801,0.27,
;
run;
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Seriously, don't know where the whole upcase coding has come from but it really just sounds like shouting.  Anyways for the sample you provided the code works fine, you need to supply a sample that doesn't work as expected so we can see.  Maybe the informat length is not great enough, maybe there is a special character in the data?

StaticFX
Obsidian | Level 7

starting to wonder that...

 

NOTE: Invalid data for UpdateAccrual in line 2 10-14.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

2 CHAR AAA,0.00,0.00. 14

ZONE 44423233232330

NUMR 111C0E00C0E00D

Key=AAA Accrual=0 UpdateAccrual=. _ERROR_=1 _N_=1

 

continues on line 3,4,5,6,etc...

 

 

Key,Accrual,UpdatedAccrual
AAA,0.00,0.00
1005260501,0.00,
1004833501,1239.00,
1004805701,552.00,
1004802501,1446.00,
1004802601,1291.00,

 

I put that first line in just to test...

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Your data looks wrong - I can't really tell as you don't want to post it, but this:
2 CHAR AAA,0.00,0.00. 14

                              ^          ^

Doesn't look right, you cant have two decimal points, nor can you have spaces in a value.

StaticFX
Obsidian | Level 7

that last . is not in the file

 

Key,Accrual,UpdatedAccrual
1005260501,0.00,
1004833501,1239.00,
1004805701,552.00,
1004802501,1446.00,
1004802601,1291.00,
1004802701,1084.00,
1004802801,6337.00,
1004802901,2480.00,

 

NOTE: Invalid data for UpdateAccrual in line 2 17-17.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

2 CHAR 1005260501,0.00,. 17

ZONE 33333333332323320

NUMR 1005260501C0E00CD

Key=1005260501 Accrual=0 UpdateAccrual=. _ERROR_=1 _N_=1

 

could there be a char on each line im not seeing?

ballardw
Super User

When the log shows


@StaticFX wrote:

starting to wonder that...

 

NOTE: Invalid data for UpdateAccrual in line 2 10-14.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

2 CHAR AAA,0.00,0.00. 14

ZONE 44423233232330

NUMR 111C0E00C0E00D

Key=AAA Accrual=0 UpdateAccrual=. _ERROR_=1 _N_=1

 

 


Notice there is a second . in the last value:  0.00. So SAS thinks the value is character. It is not impossible that you are getting "invisible" characters if copying and pasting from some sources.

 

If your data is not actually sensitive either paste a few lines into a code box opened using the forum {I} menu icon. The main window reformats text and may insert characters. The code box is plain text.

Or create a txt file with a few lines and attach it, though the code box is better so that folks can see directly what you have without downloading attachments.

Tom
Super User Tom
Super User

So your later post clarified the real issue, which is that you are reading a file with Windows/DOS end of line characters (CRLF) on a system that is using Unix style end of line characters (LF).  This is causing SAS to threat the carriage return ('0D'x) as data. Since your last variable on the line is numeric then that caused SAS to not be able to translate the characters to a number.

 

Add the TERMSTR=CRLF option to the INFILE statement and see if that helps.

Also add TRUNCOVER to keep SAS from running to the next line for a value when the last value is missing.

Don't specify a decimal value in a INFORMAT unless your text files has remove the dots from the value and you want to tell SAS you know where the dots used to be.  Isn't that a line from Guys and Dolls?  If you tell SAS to use 8.2 and the value being read is '1234' then SAS will interpret that as 12.34.

Also don't use INFORMAT when you mean LENGTH.

When you have already defined the variables type there is no need to add $ in the INPUT statement.

DATA PMA,
    INFILE '/home/xxxx/PMAccrual.csv' dsd firstobs=2 truncover termstr=crlf;
    LENGTH Key $10 Accrual 8 UpdateAccrual  8;
    INPUT Key Accrual UpateAccrual;
RUN;

 

 

StaticFX
Obsidian | Level 7
Bingo! THANK YOU !!!
whats the diff between Informat and Length?
Tom
Super User Tom
Super User

@StaticFX wrote:
Bingo! THANK YOU !!!
whats the diff between Informat and Length?

You use the LENGTH statement to define a variable. You tell it the type (character or numeric) and the number of bytes that you want it to use. You use the INFORMAT statement to attach an informat to the variable. An informat is instructions on how to convert text to the value to be stored.

 

The reason is mainly works to use INFORMAT when you mean LENGTH is because SAS will define a variable based on how you first use it.  So if the first use is in the INFORMAT (or FORMAT) statement then SAS will make a guess at what type of variable you wanted to create based on the informat (or format) you are attaching to it.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 4093 views
  • 1 like
  • 5 in conversation