BookmarkSubscribeRSS Feed
SasDewd
Obsidian | Level 7

Hello,

 

I have a data step with LENGTH/FORMAT/INFORMAT/INPUT components that imports a CSV file.

It's a big piece of code as it has close to 200 fields and yes, I copied it from the import wizard.

I've been using this for a while and it's been working fine.

 

However, I just needed to add another variable to import, Comment, and now I'm having all kinds of issues.

The comment variable could and does have all kinds of characters in it and the ones that I think are causing me a headache are new line characters and commas.

So what happens when I run my code only part of the comment will be displayed in the Comment field and then all the following fields will get messed up (for example, Comment is supposed to say '$10,000 was paid to the customer' and the CommentDate field that follows should say 3/5/2019 - instead the Comment field is showing $10 and the CommentDate field is showing .).

 

My question is, is there a way to clean the data coming in somewhere in the INPUT section.

For example, where I have:

INPUT

Comment   : $CHAR275.

 

I could have something like

Comment = TRNWRD(Comment, ".", "")  :$CHAR275. 

and/or something similar to remove new line characters.

Btw, I tried this and it's not working.

 

Any ideas?

 

Thanks!

7 REPLIES 7
ballardw
Super User

If you are running in a Windows environment and the problem is strictly a line feed you might try adding to your INFILE statement: TERMSTR=CRLF

 

That would mean the a single line feed is not considered the end of a record.

 

Also, you might try a simple $275. informat instead of char. If the data is a proper CSV file then usually that will work.

 

If that doesn't work then you may need to share the code you are using and a few lines of example data that have the problem. Paste the Code and the data into separate code boxes opened using the forum's {I} or "running man" icon so the message windows don't reformat the text, which can seriously degrade the ability to read data.

Your example data and code really only needs to have a few variables, not all 200+, as long as the code and data behave the same.

 

 

 

SasDewd
Obsidian | Level 7

Thank you for your reply @ballardw 

I tried adding TERMSTR=CRLF but that resulted in no records returned at all 

 

So I have 

INFILE "&path/&&File&i"
DELIMITER = ','
DSD
LRECL=32767
FIRSTOBS=2;

 

and I changed it to 

INFILE "&path/&&File&i"
DELIMITER = ','
DSD

TERMSTR=CRLF
LRECL=32767
FIRSTOBS=2;

 

but, like I said, I get back an empty data set.

 

I also forgot to mention, and I think it's important, that some comments have multiple commas (I believe having one comma is OK since there is the DSD option, but having another one would throw it off?).

So, I think it's both multiple commas and new line feeds (which there could also be multiple of) that's causing the issue.

 

Here's a sample of my code (just the variable that's causing the issue)

 

DATA WORK.DAILY_DATA_FEED;
LENGTH

Comment $ 275

...

FORMAT

Comment $CHAR275.

...

INFORMAT

Comment $CHAR275.

....

 

INFILE "&path/&&File&i"
DELIMITER = ','
DSD

TERMSTR=CRLF
LRECL=32767
FIRSTOBS=2;

 

INPUT

Comment : $CHAR275.

 

One of the comments that is causing an issue is:

 

'$10,034 paid to customer, $12000 max
Thank you. John M.'

 

As you can see there is a new line feed after max.

 

Thanks!

ballardw
Super User

If possible please attach a TXT file (don't use CSV extension as the download process gets corrupted by spreadsheets opening the file) with a few examples. I'm afraid anything pasted on the forum is not the same , i.e. line feeds and or carriage returns as your actual file.

SasDewd
Obsidian | Level 7

@ballardw 

I just created a small csv file with only 5-6 columns (comment field included) in it and 2 records, both of which are giving me trouble.

I was going to send it to you but before I did I tested it running through the slimmed-down code (removed all the other columns) and added TERMSTR=CLRF option....and it worked fine!

So, if I add the TERMSTR option to the original code and run it through all the records and all the columns I get no records returned but when I run it against the newly created sample file with only a couple of rows and few columns it works just fine.

I get no errors, no warnings but for some reason it just doesn't return any records when I use TERMSTR=CLRF...

Tom
Super User Tom
Super User

First thing is HOW are you generating the file?  If you are writing it with SAS then the commas will NOT be a problem because the values that contain commas will have quotes added around them.

 

Then the problem is just embedded line-breaks.  Again if you are careful in how you create the file then those can also be read.  The trick is to make sure your actual data never contains CR ('0D'x) followed immediately by LF ('0A'X) and that all of your lines do end with CR LF.

 

For example:

file mycsv temp ;
data  _null_;
  x=1;
  y=2;
  comment='This has commas, and CR ' || '0d'x || 'and LF'|| '0A'x ||'but not both';
file mycsv termstr=crlf dsd dlm=',';
  put x y comment;
run;

data want;
  infile mycsv termstr=crlf dsd truncover ;
  input x y comment :$200. ;
run;

image.png

33    data _null_;
34     infile mycsv termstr=crlf;
35     input;
36     list;
37    run;

NOTE: The infile MYCSV is:
      Filename=C:\downloads\mycsv.dat,
      RECFM=V,LRECL=32767,File Size (bytes)=52,
      Last Modified=22May2019:13:51:09,
      Create Time=22May2019:13:51:09

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+--

1   CHAR  1,2,"This has commas, and CR .and LF.but not both" 50
    ZONE  32322566726672666667226662452066624406772667266762
    NUMR  1C2C24893081303FDD13C01E40320D1E40C6A2540EF402F482
NOTE: 1 record was read from the infile MYCSV.
      The minimum record length was 50.
      The maximum record length was 50.
SasDewd
Obsidian | Level 7

Thank you @ballardw !

I'll need to do some work and clean the file before I upload it here as most of it is our customers' data. 

Let me see if I can put something together.

 

Thank you @Tom!

The CSV file is generated by a separate entity and I have no control over it.

It would have been much easier to just get rid of all these characters and new lines before creating the file but, again, it is just dropped in a folder every morning and all I do is pick it up from there and run it through the DATA step I posted above.

It could include every possible character as it comes from a comment section.

 

Thanks again for your answers!

Tom
Super User Tom
Super User

Tell who ever is generating the files to follow the standard.

https://tools.ietf.org/html/rfc4180

 

But also tell them that you would prefer that they replace and actual CRLF values in the data with just a single CR value instead.

 

Then you will get files that SAS can read.

 

If you really do get embedded CRLF in the middle of the files there are number of examples of how to fix those types of files on this forum.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1052 views
  • 0 likes
  • 3 in conversation