BookmarkSubscribeRSS Feed
ChrisO147
Calcite | Level 5

I am trying to create a new external file from a csv using DI Studio 4.901. I am not a programmer, nor do I understand how to write code. The csv file contains CR in the headers and both CR and blank lines in some of the fields, which causes issues...bumping text to the right and therefore ending up with the data all skewed. I have almost managed to solve the header problem by using TERMSTR=CRLF but, for some strange reason, the headers stop suddenly in the middle of a column name... In addition, some of the columns have CR and Blanks....e.g. all the attached is supposed to be in one cell

 

Your help would be appreciated but please keep the tech speak to a minimum 😞

 

thank you

10 REPLIES 10
Tom
Super User Tom
Super User

I don't know anything about DI, but the theory behind trying to use TERMSTR=CRLF is based on how SAS processes delimited text files (like CSV files).  Some software (like Excel) will handle delimited files with embedded line breaks IF the value is quoted.  The basic SAS method for reading a text cannot do that.  But if your real data lines end with the CR+LF combination (the way that Windows machines make text files) and the embedded line breaks in the middle of a cell value are using only one of those two character then SAS will be able to read the file.  This works because the single CR (or single LF) in the middle of the line it treated the same as any other character.  Note that when this works those CR (or LF) will be in the value of the variable. You will probable what to convert them to space or some other character before trying to display them.

 

But if your source file does NOT use CR+LF to mark the end of lines. OR it DOES include CR+LF in the middle of a cell value then that solution will not work.  

 

You can tell this by looking at the file with an editor that will show you what characters are really there.  You could also use a simple little SAS data step to look at a few lines from the file and see for yourself.  For example this program will read the first 1,000 bytes from the file and display it for you.  The Hex code for CR is 0D and for LF is 0A.

 

data _null_:
  infile 'myfile' recfm=f lrecl=100 obs=10 ;
  input;
  list;
run;

If you find that your file does not have CR+LF at the end of the lines and only CR (or only LF) for the "split" cells then you will probably need to convert the file.  I have had success using a simple data step to read the file character by character and replace CR (or LF) that appear in the middle of quoted string with space (or other character).  Then SAS should be able to read the corrected file. 

ChrisO147
Calcite | Level 5
Thank you for your reply Tom As I said in my post, I don't program or understand programming... I want to use the "New, External File" function within DI Studio and, if necessary, add a bit of code. My only other alternative would be to Import it as Excel.. Very frustrating for something that can be done so easily using Enterprise Guide but fails, or is extremely complicated, when trying to do the same in DI Studio
Tom
Super User Tom
Super User

My suggestion is to first check that the file is actually in a format that SAS can parse. 

ChrisO147
Calcite | Level 5

Thank you

 

It seems that something that is so straightforward in EG, is not straightforward in DI.

Patrick
Opal | Level 21

@ChrisO147 

Unlike SAS EG SAS DI Studio is a developers tool for implementation of metadata driven ETL processes to be run in batch. You don't use the DIS client anymore for production runs but you deploy the DIS jobs and run them in batch via a scheduler.

To use SAS DIS efficiently and to its full potential a technical and coding background is imho a must and if you really want to use DIS then I'd strongly recommend that you first skill-up in SAS programming.

 

Why do you want to use DIS at all? Is it really the right tool for you and the task you want to use it for?

 

Here what SAS Education considers as necessary prerequisites:

 

Capture.JPG

ChrisO147
Calcite | Level 5

Thank you for your "imho" response Patrick

 

Notwithstanding the fact that I have little experience in coding, I have attended the SAS programming basics course.

 

However, as responses have confirmed, the format of the file is not perfect and I am asking for help in overcoming this issue. If you have any suggestions, other than to tell me to attend courses, they would be appreciated.

 

 

Patrick
Opal | Level 21

@ChrisO147 

Well if you've got some coding background then your answer to @Tom should eventually have been slightly different.

 

DIS uses SAS metadata definitions to generate SAS code. In your case it needs an external file, a file reader with a target table and the mapping from source to target to generate useful SAS code. What it will generate is a SAS data step with in infile and input statement.

If something is not working as expected then most likely the external file metadata definition doesn't fully match the actual external file. 

To get it right you can now inspect the log - which requires a SAS coding background to read - or you inspect the actual external data and the generated code and try to figure out what doesn't match.

I personally find it easiest to just copy the not yet fully working generated code out of DIS (the generated SAS data step) and then tweak the code in a SAS EG/Studion or even DIS code window until it works. Once I know how the code really needs to look like I then have all the information I need to get the external file metadata definition 100% right. 

What Tom posted is guidance for you to first get the code right which then makes it easy to get the external file metadata definition right.

 

"the format of the file is not perfect and I am asking for help in overcoming this issue"

For us to do more than guessing you would need to attach a text file with some of the problematic lines so we can actually investigate and test how the code needs to look like.

ChrisO147
Calcite | Level 5

Thank you Patrick

 

I will attach a copy of the file, once I have stripped out any sensitive information.

 

In the meantime, I will see assistance from my learned SAS colleagues on-site.

TomKari
Onyx | Level 15

DI Studio is usually used for production processes that will be run repetitively. If your incoming file doesn't meet the design requirements for CSV files, you're going to have no end of headaches over time.

 

Is it possible for you to contact the provider of the file, and get them to rejig the file so that it's a proper CSV?

Tom

Patrick
Opal | Level 21

@ChrisO147 

Please also post the DIS generated data step with the data sample - only the data step bit not all the other code DIS generates.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1824 views
  • 0 likes
  • 4 in conversation