BookmarkSubscribeRSS Feed
Lost_Gary
Quartz | Level 8

I've tried to connect to a number of .csv files on a network folder and i can't seem to figure out the best way.  There are probably 10-12 files each with a few hundred fields each.  

I've tried a proc import, but one of the data fields is a 'description' field that contains miscellaneous commas that skews the import fields and record count.  I tried the infile statement with the DSD option - it works great, except they keep changing the data file and adding a record here and there, causing import errors.  So, now I'm trying an ODBC connection.  I like this - very clean in my code, except that I am getting a naming convention error because of the data:

ERROR: This DBMS table or view cannot be accessed by the SAS System because it contains column names that are not unique when a SAS normalized (uppercased) compare is performed. See "Naming Conventions" in the SAS/ACCESS documentation.

I've tried validvarname and preserve_col_names as options, but i am unable to read the data.  I even tried dropping the culprit field but that didn't work and as I said the data keeps changing so there could be additional fields in the future causing this issue.  Is there any hope for me?  

I'm using SAS EG 8.3.  

Here is my code:

libname mylib odbc noprompt="myconneciton"
schema=DBO Connection=global;
options validvarname=v7;
data test;
set mylib.'crappycommadata.txt'n;
run;

9 REPLIES 9
ballardw
Super User

Who is "they" and is there any resembling a requirements or content document for these files?

 

Your comment about "data fields is a 'description' field that contains miscellaneous commas that skews the import fields" makes me wonder what CSV definition "they" or you are using. Most formal Comma Separated Values specifications include something to encapsulate values that contain commas so this doesn't happen. Or does this problem description also include various quote marks breaking the typical control of encasing the value in quotes?

 

 

Tom
Super User Tom
Super User

I don't know how CSV files come into this.

You are showing SAS code connecting to some external database and then using that connection to read a dataset (aka "table") from that database.

Is the remote database the one that is reading the CSV file?  Perhaps you should ask a question on a forum for that database for information on how to control the way that it reads CSV file.

 

If you want to reference a member name that has a period in it then you need to set the VALIDMEMNAME option to EXTEND.  Note that you are still limited to 32 bytes in the member name.

options validmemname=extend;
data test;
  set mylib.'crappycommadata.txt'n;
run;
Lost_Gary
Quartz | Level 8

Not sure my terminology was correct or if it matters, but these are comma delimited text files and the data does have quotations as the text qualifier.  There is no database.  It is simply a network folder full of these files that i am trying to read.  My ODBC connection works for many of these files, which makes reading them simple and routine.  However, a few of the large files give me the naming convention error.  I can't seem to get proc import to work with this file format consistently because of the random commas (unless there is DSD function in Proc Import that I am unaware of).  Again the infile is a challenge because the data files continue to shift (mostly adding files) and my infile statement is now over 3k lines for a singe file import.  The validmemname option had no effect on the naming convention error.  Unfortunately I cannot modify the data structure.  Any other thoughts are appreciated.  

Tom
Super User Tom
Super User

I have no idea what type of database your ODBC connection is connecting to ...

 

But if you just have CSV files in a folder why not just read the CSV files directly with SAS?

 

Make sure the network folder is assessable on the machine where SAS is running.  You can try PROC IMPORT to read the CSV files.  Or perhaps some other tool like:  https://github.com/sasutils/macros/blob/master/csv2ds.sas 

SASJedi
SAS Super FREQ

The LIBNAME statement is used to set up a folder location as a  base SAS library. The base SAS library engine ONLY reads and writes SAS datasets (.sas7bdat). So this statement in your original code cannot possibly be used to read in .CSV files:

libname mylib odbc noprompt="myconneciton" schema=DBO Connection=global;

Your code does not include information about the file system path for the folder containing the .CSV files. You'll need to know the path, and that path will have to be accessible from the system that is running SAS. If your SAS is on a Windows system, that might look like "C:\temp\mydata", on a Linux system, "/home/myuserID/mydata". Most of the code sample I'm providing here is just setup that identifies the path to your WORK directory, makes a folder there, and creates 2 CSV files for you to play around with. Because the folder is in WORK, it will automatically be deleted when you shut down your SAS session, so no need to worry about cleaning up. The PROC IMPORT step the final section is an example of how to import a CSV file. When you run this code, it will import the fish.csv file. For practice, try modifying that code to import the iris.sv file instead.    

/***********************************************************************
 Setup code 
***********************************************************************/

/* Make a subdirectory to store the CSV files in your WORK library folder
 so that it will all automatically be deleted when SAS shuts down */
%let workpath=%qsysfunc(pathname(work));
%let rc=%qsysfunc(dcreate(myCSVfiles,%superq(workpath)));
%let csvpath=%superq(workpath)/myCSVfiles;
%put NOTE: &=csvpath;

/* Make a some CSV files to play with */
proc export data=sashelp.fish (obs=5)
     outfile="%superq(csvpath)/fish.csv"
     dbms=csv 
     replace;
run;
proc export data=sashelp.iris (obs=5)
     outfile="%superq(csvpath)/iris.csv"
     dbms=csv 
     replace;
run;

/***********************************************************************
 OK - sample code to do the work starts here:
***********************************************************************/
/* Import a CSV file */
proc import 
	datafile="&csvpath/fish.csv"
	DBMS=CSV
	out=work.csv_fish replace;
	getnames=YES;
run;

/* Check out the results */
proc print data=work.csv_fish;
run;

You can find more information about PROC IMPORT by clicking this link.

May the SAS be with you!
Mark

Check out my Jedi SAS Tricks for SAS Users
Lost_Gary
Quartz | Level 8
the connection in the libname statement is set up through the Windows ODBC connection and path is set up in the ODBC utility. As long as there is the ODBC driver (and the SAS/Access to ODBC license), SAS can then connect using that libname statement . This would include MS Excel, MS Access, SQL and .csv & .txt sources.

Your code is intriguing - where in the macro would i set my external path? As I tinker with it, I seem to be receiving a number of errors trying to figure that out.
SASJedi
SAS Super FREQ

In the code I provided, the path is set in line #9:

%let csvpath=/home/myUserId/myFiles;

As noted, the code before that line was designed to identify the system path to your WORK library and create a folder there for testing. If you have a specific location in mind, you can skip all of the code before line 9, and then modify that code to point to the location of your choosing. For example, say your SAS is running on Linux, and you want to access the files in the path /home/myUserID/myFiles:

%let csvpath=/home/myUserID/myFiles;

 

Check out my Jedi SAS Tricks for SAS Users
ballardw
Super User

I would suspect any LIbname that points to a single source that may have Excel, Access, CSV and TXT files in it would have issues. You may be getting the warning about invalid variable columns from something other than a CSV file, or a different CSV then you think is causing it.

 

Since it is your ODBC connection then you should share the ODBC connection settings.

As well as an actual log which I still don't see.

 


@Lost_Gary wrote:
the connection in the libname statement is set up through the Windows ODBC connection and path is set up in the ODBC utility. As long as there is the ODBC driver (and the SAS/Access to ODBC license), SAS can then connect using that libname statement . This would include MS Excel, MS Access, SQL and .csv & .txt sources.

Your code is intriguing - where in the macro would i set my external path? As I tinker with it, I seem to be receiving a number of errors trying to figure that out.

 

Tom
Super User Tom
Super User

If the files are using proper quoting then the only thing that should cause issues for PROC IMPORT is embedded end of line characters.  Try using a tool that will remove or replace those first.  Like: https://github.com/sasutils/macros/blob/master/replace_crlf.sas

 

As to the duplicate variable names you might want to try using this macro https://github.com/sasutils/macros/blob/master/csv2ds.sas as it should do a better job of inventing unique variable names than PROC IMPORT can.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 566 views
  • 0 likes
  • 4 in conversation