BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
MRDM
Obsidian | Level 7

Hi, I'm doing an update on an existing legacy process so don't want to do too much of an overhaul.

I've got a CSV I've got setup at an delimited external file linked to a file reader. One of the text fields can contain commas which would obviously ordinarily cause issues as it would see it as a new column and split it there, unless it was wrapped in quotes...

Unfortunately the way the file is provided they only wrap that field in quotes if it contains a comma, but due to this I believe DI is seeing other values unquoted in the same field and not then not treating them properly, anyway around this?

So as an example

Row Customer Text
1 17 hello
2 23  
3 56 "hello, person"

 

The Text field for Row 3 is being read in as

"hello

 

Rather than the full value. Ideally I'd like to stick with the file reader transformation but will switch to code if needed, but ideally not. I'm fairly sure it would work if all values in that field were quoted but as it's only quoted if it contains a comma it's not doing it.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Is the example in your original post the actual data that is having trouble?
Can you share the actual text file that causes the trouble?  And the code that DI generated to read it.

A CSV file to contain what you shared should look like:

Row,Customer,Text
1,17,hello
2,23, 
3,56,"hello, person"

And the SAS code to read it should look like:

data want;
  infile csv dsd truncover firstobs=2;
  input Row Customer Text :$20.;
run;

Let's try it.

options parmcards=csv;
filename csv temp;
parmcards4;
Row,Customer,Text
1,17,hello
2,23, 
3,56,"hello, person"
;;;;

data want;
  infile csv dsd truncover firstobs=2;
  input Row Customer Text :$20.;
run;

proc print;
run;

Result

Tom_0-1687795958672.png

 

 

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

In normal SAS code that is what the DSD option on the INFILE statement does.

 

If the code that the DI wizard generated does not have the DSD option on the INFILE statement then try using this method to modify the INFILE statement.

 

https://documentation.sas.com/doc/en/etlug/4.904/p04pztrut3whjln1ejyho313zx4b.htm

 

MRDM
Obsidian | Level 7

Thanks, I've had a look though and the infile statement already has dsd as an option.

 

infile "&Filename."
       lrecl = 256
       delimiter = ','
       dsd
       missover
       firstobs = 2; 

 

Tom
Super User Tom
Super User

Then something else is happening.

Perhaps the LRECL is too short and that is why the value is truncated?  If it does not see the closing quote that would explain why the opening quote is there.

Perhaps the quotes are not actually quotes?  Perhaps they are "Smart" quotes (or really stupid quotes) that WORD likes to insert?

MRDM
Obsidian | Level 7
LRECL is ok as it's reading in longer where there's not a comma. As far as I can tell opening the csv in notepad++ they appear to be regular quotes.
Tom
Super User Tom
Super User

Is the example in your original post the actual data that is having trouble?
Can you share the actual text file that causes the trouble?  And the code that DI generated to read it.

A CSV file to contain what you shared should look like:

Row,Customer,Text
1,17,hello
2,23, 
3,56,"hello, person"

And the SAS code to read it should look like:

data want;
  infile csv dsd truncover firstobs=2;
  input Row Customer Text :$20.;
run;

Let's try it.

options parmcards=csv;
filename csv temp;
parmcards4;
Row,Customer,Text
1,17,hello
2,23, 
3,56,"hello, person"
;;;;

data want;
  infile csv dsd truncover firstobs=2;
  input Row Customer Text :$20.;
run;

proc print;
run;

Result

Tom_0-1687795958672.png

 

 

MRDM
Obsidian | Level 7

I'm back sorry, so it's sorted now, but I'm not overly pleased why. Essentially I tried tweaking the external file metadata but no joy, however if I copied it then repointed it at some of the data in a new file it worked. I then decided to make new external file metadata objects pointing at the original files, I used identical field names, formats etc. I verified the infile code was identical, but for some reason the new ones work fine and the old ones don't. I've been through every single thing I can check and they look identical, but one loads in the full text and one doesn't. I'm baffled but at least it's solved.

 

I had two different files with different layouts and both had the same issue and this solved it for both. Just another 'SASism' to deal with.

 

Thanks for the help.

Tom
Super User Tom
Super User

Does DI actually let you see the code it generated and the SAS log from when it ran the generated code?

Can you see any difference in the two versions?

MRDM
Obsidian | Level 7

I made a new job and added the old and new metadata objects each linked to a file reader. 

Old metadata object, doesn't work

data work.W1C68GP0 / view = work.W1C68GP0 ; 
   infile &Filename. 
          lrecl = 256
          delimiter = ','
          dsd
          missover
          firstobs = 2; 
   ; 
   attrib Period length = $100; 
   attrib OrgCode length = $9; 
   attrib Quarter length = 8; 
   attrib Allocation length = $150; 
   attrib SpendServices length = $150; 
   attrib SpendPharma length = $150; 
   attrib SpendOther length = $150; 
   attrib SetQuitDateLastYear length = $150; 
   attrib SetQuitDateThisYear length = $150; 
   attrib UnknownQuitStatusThisYear length = $150; 
   attrib SuccessfulQuittersLastYear length = $150; 
   attrib SuccessfulQuittersThisYear length = $150; 
   attrib BreachComments length = $1000
      format = $char1000.
      informat = $char1000.; 
   
   input Period OrgCode Quarter Allocation SpendServices SpendPharma SpendOther 
         SetQuitDateLastYear SetQuitDateThisYear UnknownQuitStatusThisYear 
         SuccessfulQuittersLastYear SuccessfulQuittersThisYear BreachComments; 
   
run; 

MRDM_0-1687984006654.png

 

New metadata object, does work

 

data work.W1CAEHBW / view = work.W1CAEHBW ; 
   infile &Filename.
          lrecl = 1250
          delimiter = ','
          dsd
          missover
          firstobs = 2; 
   ; 
   attrib Period length = $100; 
   attrib OrgCode length = $9; 
   attrib Quarter length = 8; 
   attrib Allocation length = $150; 
   attrib SpendServices length = $150; 
   attrib SpendPharma length = $150; 
   attrib SpendOther length = $150; 
   attrib SetQuitDateLastYear length = $150; 
   attrib SetQuitDateThisYear length = $150; 
   attrib UnknownQuitStatusThisYear length = $150; 
   attrib SuccessfulQuittersLastYear length = $150; 
   attrib SuccessfulQuittersThisYear length = $150; 
   attrib BreachComments length = $1000
      format = $char1000.
      informat = $char1000.; 
   
   input Period OrgCode Quarter Allocation SpendServices SpendPharma SpendOther 
         SetQuitDateLastYear SetQuitDateThisYear UnknownQuitStatusThisYear 
         SuccessfulQuittersLastYear SuccessfulQuittersThisYear BreachComments; 
   
run; 

MRDM_1-1687984123641.png

 

The code looks identical but you can see in the results one works, one doesn't.

 

 

Tom
Super User Tom
Super User

Only difference is in the way it is specifying the filename.

 

In the NEW one the code is adding quotes around the value of the macro variable FILENAME.

In the OLD one the code is just using the value of the macro variable directly.

 

Does the macro variable FILENAME end up having different values between the two jobs.

 

Notice that both are not defining a dataset.  Instead they are defining a view.  So if the file the data step is referencing changes between the different times you try to use the view then the results will be different.

 

I don't know how DI works, but how can you check the results without running a "job"?  Are you sure you aren't just looking at some old version of the data from some earlier run?

 

Also if there was already a dataset named work.W1C6U328 and you tried to run that OLD data step to define a view named work.W1C6U328 it would fail.  Or if there was already a view named work.W1C6U328 and you ran that OLD data step with a value of FILENAME that caused the INFILE statement to be invalid syntax then the older view would not be replaced.

MRDM
Obsidian | Level 7

I updated the code in the previous post. You can preview the data outside a job on it's own but there's no log then. I updated both metadata objects to point at the actual physical file rather than a macro variable and got the exact same results (I left it as &filename. in the sample code as I can't post filenames/paths here). I noticed the Record lengths were different, although that wasn't the issue as longer records without commas were being read in fine (the comma was at position 77), however, updating the length seems to have forced some kind of refresh and that metadata object now works too.

 

DI has always sadly been the blacksheep of the SAS family (even though it's great) and can have bugs like this sometimes where it can get caught up and have weird metadata bugs, often coming back the next day can solve them but this one stuck around, I was hesitant to start making changes to a process that's run for 5+ years, but altering the file metadata seems to have done it (I'll stick with the new metadata objects anyway to be safe). I found a bug in DI years ago that after a lot of back and forth SAS accepted DI was generating incorrect code and for years kept been told it was being fixed in the next version and I don't think it ever was, I gave up after a few years of asking and our Admin have implemented a workaround.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1629 views
  • 0 likes
  • 2 in conversation