BookmarkSubscribeRSS Feed
Ramsha
Obsidian | Level 7

I understand a txt file can provide more options for editing so I was wondering if it is possible to change an imported csv file to a txt file. And if so, the code to this? 

14 REPLIES 14
Tom
Super User Tom
Super User

@Ramsha wrote:

I understand a txt file can provide more options for editing so I was wondering if it is possible to change an imported csv file to a txt file. And if so, the code to this? 


YES you can write a text file from a SAS dataset. Whether you created the SAS dataset from a CSV file or via any other method.

 

Note that a CSV file is itself a text file.  You need to explain what you mean by edit and why you would think it was harder to edit a CSV file than some other format of text file.

 

If you specify more detail about what type of file you want to create then you can get more concrete help.

 

Ramsha
Obsidian | Level 7

I've posted previous questions regarding incorrect formatting in an imported csv file i.e. some observations were lumped into one variable when they should have been separated. I understand that using txt, it is easier to define the amount of characters to fit in one variable. Incorrect formatting and the tedious and insufficiency in formatting in a csv file is why I would want to convert to a raw txt file. Hopefully this made it a little bit more clear. I don't have a concrete example. It is just an issue that I keep having. 

Tom
Super User Tom
Super User

If you have an editor (like the SAS program editor) that uses a fixed font and your lines are SHORT then you can edit easier.  The old SAS program editor is even better for than than the "enhanced" editor since it has nice line number commands like RULE.

 

But in general it is easier to edit a delimited file. It might help to use a different delimiter other than comma or tab.  Such as '|'.

 

The example you post on the other thread of three fields being read into a single column is easy to avoid. There are two possible causes that I can think of.  The first is that the source file was missing the delimiters between the fields.  Again if the delimiter is something like a | then that is easier to see in a editor and fix.

 

The other is improper use of formatted input statement when reading the delimited file.  If you include an informat specifications in your INPUT statement and forget to include the colon modifier in front of the informat then if the value is shorter than the specified width on the informat you can "eat" the delimiters.

 

A third problem can be with delimited files generated by some confused database designers that do not insert proper quoting. Instead they try to "escape" quotes and delimiters as if they were generating Unix shell command lines instead of delimited text files.

 

Example with data in-line.

data address;
  length name address city $100 state $2 zip $5 ;
  infile cards dsd dlm='|' truncover firstobs=2;
  input name -- zip;
cards4;
NAME|ADDRESS|CITY|STATE|ZIP
Jones|100 West Street|New Orleans|LA|10464
No Address or zip||New York|NY|
;;;;

 

 

 

 

 

ballardw
Super User

@Ramsha wrote:

I've posted previous questions regarding incorrect formatting in an imported csv file i.e. some observations were lumped into one variable when they should have been separated. I understand that using txt, it is easier to define the amount of characters to fit in one variable. Incorrect formatting and the tedious and insufficiency in formatting in a csv file is why I would want to convert to a raw txt file. Hopefully this made it a little bit more clear. I don't have a concrete example. It is just an issue that I keep having. 


Your problem has nothing to do with " insufficiency in formatting in a csv file". The problem is data collection/ data entry (from your previous posts on this file). Data entered into the wrong column to begin with has nothing to do with the CSV file structure. Entering commas inside fields may present some challenges for some data systems creating a proper csv though.

GIGO

 

Education in data collection learning point: Do not wait until the end of a data collection cycle to examine the data. Look at early results to see if you collectors are following protocol.

I realize since the specific data you are referencing were addresses you may be dealing with old data that was poorly collected long ago.

Reeza
Super User

Why do you have to use PROC IMPORT?

Ramsha
Obsidian | Level 7

I'm given physical excel workbooks so I'm assuming it is proc import? Is there another way? 

Tom
Super User Tom
Super User

If you have actual Excel files (.XLSX hopefull) then your best bet is to fix them in Excel.  If you don't have Excel upload the file to a google doc and edit them there.

 

In addition to PROC IMPORT you can use the XLSX engine to read Excel workbooks.

libname in xlsx 'myfile.xlsx';
proc copy inlib=in outlib=work;
run;
Ramsha
Obsidian | Level 7

This is awesome! Thank you. 

I was wondering if there was a code just because I receive hundreds of excel files and need an efficient way so I am not fixing the excel file itself. 

I have  not played around with libname yet as I am relatively new to SAS. Does this give you more freedom to play around? 

Tom
Super User Tom
Super User

Nothing will fix the type of error you posted other than editing the source files.  Or writing you own programs to fix the values after you have read them into SAS.

 

Excel stores everything into "cells".  If somehow you have what you consider three separate values stored into one single cell in Excel then it will come into one observation for one variable in SAS.

 

Some things are easier to deal with than other.  For example if someone inserted line feeds into a cell in Excel so it looks on the screen like three lines of text then you can probably use the SCAN() function in SAS to pull out the different values.

 

But if you just have space delimited lists of words it will be much harder to programattically figure out if two of them are the street and one is the city or the reverse.

Reeza
Super User

Are you sure they're Excel, your last post indicated CSV, including the code. CSV files open in Excel but they're not really Excel files, they're text files. 

And yes, you have less control over Excel files and if you have 100's you need a more robust procedure otherwise I guarantee you'll run into type mismatch issues very quickly. 

 


@Ramsha wrote:

I'm given physical excel workbooks so I'm assuming it is proc import? Is there another way? 


 

Ramsha
Obsidian | Level 7

Yea I believe they are originally csv files but opened in excel so I initially imported as an excel file using the snippet tool. I just have loads of these files so I just need an efficient way to data repair.  The code just looked cleaner using proc import excel and the tables generated looked to be the same. So would there be an easier cleaner way with using a csv methodology? Thanks alot in advance- you're been a great help!

Tom
Super User Tom
Super User

If you have CSV files and open them in Excel without the proper precautions then Excel will mangle the data. It will drop leading zeros from text values that look like numbers to Excel. Or convert strings with hyphens or slashes in them to dates.  Or mis interpret date strings by confusing the month number for the day numbers. etc etc.

 

It is better to read the CSV files directly using your own SAS code.  If you really have no idea what is in the file then you could use PROC IMPOrT on a couple of them to get a sense of the structure.  But since PROC IMPORT has to GUESS at how to define each column then you can end up with in compatible dataset when running PROC IMPORT on two files that have the same structure but different sets of records.

 

 

andreas_lds
Jade | Level 19

There are two things i do on every windows-computer before i start using them:

  1. activating the “File name extensions” box in the Show/hide section (Windows 10), so that the full file name is shown in windows explorer
  2. changing the default program used to open csv-files, so that they are opened with notepad++ (or any other text editor available)

Excel is almost always not showing what's in the csv-file, but an interpretation of the data. That "interpretation" includes the removal of leading zeros, displaying text data containing hyphens as dates ...

Reeza
Super User

@Ramsha wrote:

Yea I believe they are originally csv files but opened in excel so I initially imported as an excel file using the snippet tool. I just have loads of these files so I just need an efficient way to data repair.  The code just looked cleaner using proc import excel and the tables generated looked to be the same. So would there be an easier cleaner way with using a csv methodology? Thanks alot in advance- you're been a great help!


Then you can use the list of solutions in the other post. Import it using a data step instead of PROC IMPORT and you won't have any issues. 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 2144 views
  • 5 likes
  • 5 in conversation