Importing Data - Direct from the network / phantom character getting added

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Importing Data - Direct from the network / phantom character getting added

[ Edited ]

We are discontinuing the use of desktop SAS as we do 99% of our work on a server.  Basically the only time we process locally is when we are uploading files from local network drives.  As we are transitioning our code to bystep local processing, we are encountering issues--we really only need to solve one of the issues below (solving one will eliminate the need for the other).

 

(1) Is there a way to point SAS, through remote processing, to a local network drive?  Currently we are copying files from the network to the server and then uploading them (but having the issue described below).  We've tried several re-writes but can't find a way to get SAS to recognize a network drive within a remote libname statement.  Is there a way to do that?

 

(2) When we copy files from the network to the server (generally with the program WinSCP) and then upload, a phantom character gets added to every observation, causing the data to not be useable.  The data loads, looks like it is formatted correctly, but there is a phantom character at the end of observations that has to be stripped off using the substr function.  So, we have a fix, but it would be better to understand why this happening.  An example:

 

For a very simple upload, we have a file with one variable, a 9-number value.

 

If we run the below code locally, loading a file from the network drive, it runs perfectly and creates a character variable that is 9 characters long, which is exactly what we want:

 

 

proc import out = workwin.test
datafile = "Q:\generic\filename.csv"
DBMS=CSV REPLACE;
GETNAMES=NO;
DATAROW=2;
run;

However, if we first transfer the file to the server and then load it, using the same code, just changning the file location and adding remote submits, it reads each observation in as a 10-character value, where the tenth character is a phantom:

 

 

rsubmit;
proc import out = workwin.test
datafile = "/opt/sas/users/generic/filename.csv"
DBMS=CSV REPLACE;
GETNAMES=NO;
DATAROW=2;
run;
endrsubmit;

I'm calling the tenth character a "phantom" because if we try to search the uploaded values, we can't match them no matter what we try.  For example, if the variable is 'order_id' and a value is '123456789', if we search where order_id = '123456789' SAS claims there is no match.  We also tried '123456789 ' and '123456789_' and just as numeric, but in no cases can it find the match, so I don't even know what that tenth character is that gets added.  Again, if we use substr and strip off the tenth character, it works, but it'd be nice to not have to do this to every piece of data we're trying to upload.

 

Any ideas on this are GREATLY appreciated!  (FYI, we are on SAS 9.3)


Accepted Solutions
Solution
‎01-11-2017 12:13 PM
Super User
Super User
Posts: 7,413

Re: Importing Data - Direct from the network / phantom character getting added

By moving to a server setup I assume you mean you moving onto Unix SAS from Windows based SAS?  If so then that would explain why you have the extra character.  Windows uses CR+LF to dictate the end of a line, Unix only uses CR, hence one hidden symbol difference.  Your FTP software software should have an option to sort this:

https://winscp.net/eng/docs/faq_line_breaks

 

As for your process, using proc import - which is a guessing procedure - can give different results as it is guessing what the data is.  Depending on various options, it may guess your data is different than on the other system.  I would always recommend writing a datastep import (you can generate from the proc import code), and fix lengths/formats etc. so you get an exact match without guessing.

 

And yes, you will need an FTP setup to access files, I don't think there is a way to directly access windows network areas from UNix and vice versa.

View solution in original post


All Replies
Solution
‎01-11-2017 12:13 PM
Super User
Super User
Posts: 7,413

Re: Importing Data - Direct from the network / phantom character getting added

By moving to a server setup I assume you mean you moving onto Unix SAS from Windows based SAS?  If so then that would explain why you have the extra character.  Windows uses CR+LF to dictate the end of a line, Unix only uses CR, hence one hidden symbol difference.  Your FTP software software should have an option to sort this:

https://winscp.net/eng/docs/faq_line_breaks

 

As for your process, using proc import - which is a guessing procedure - can give different results as it is guessing what the data is.  Depending on various options, it may guess your data is different than on the other system.  I would always recommend writing a datastep import (you can generate from the proc import code), and fix lengths/formats etc. so you get an exact match without guessing.

 

And yes, you will need an FTP setup to access files, I don't think there is a way to directly access windows network areas from UNix and vice versa.

New Contributor
Posts: 4

Re: Importing Data - Direct from the network / phantom character getting added

Hi RW9!

 

Yes, we are transferring from Windows to Unix...modifying the FTP transfer settings (from "Binary" to "Text") worked perfectly.  Also, your recommendation of using the datastep import, modified from the proc import, was very useful--and saves us several steps.

 

Thank you so much for the explanation on this!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 154 views
  • 1 like
  • 2 in conversation