BookmarkSubscribeRSS Feed
kptkhv
Fluorite | Level 6

Hello! I am new to SAS, but I need to get data from it with python. I connected to sas with saspy over passwordless ssh. No problems with connection and getting sasdata from there. 

The problem is that my table is very large but I need only 2 columns from there. Is it possible to select only 2 columns and save it to csv?

Now I can only get the output table in jupyter notebook, but it stored as html or txt, so that is not what I need.

 

 

16 REPLIES 16
Kurt_Bremser
Super User

See this example of creating a csv file with a subset of columns:

data _null_; * do not create a dataset, but give me the power of the data step;
set sashelp.class; * read dataset;
file '$HOME/class.csv' dlm=',' dsd; * define target;
put name sex; * write data;
run;
sastpw
SAS Employee

Hey, to do this with saspy, is fairly simple. Not knowing your talbe or anything, I'll make up names, but you would change them for your data.

 

import saspy

sas = saspy.SASsession()

sas

 

table = sas.sasdata('your_table','libref_if_not_work', dsopts={'keep' : 'column1 column2'})

df = table.to_df()  #now you have a dataframe with the two columns (col1 and col2) from the table in the libref you wanted.

 

As for saving to csv, you can do that from the dataframe (df.to_csv()  that's the pandas method). Or, you could do that from SAS without bringing the data over to python inthe first place:

 

table.to_csv('path/filename.csv')  # now you have a .csv file with the two columns. this is the saspy to_csv() method

 

If you are running a local SAS, the csv file is on your machine. If it's a remote (which you said SSH, so it's remote), you can use the download method to pull it over to your local machine, if you need it there.

 

sas.download('local_file_path', 'path/filename.csv')  # now you have local_file_path/filename.csv file on your machine

 

 

There's even a way to create the dataframe and create the csv file and have it be copied to your local machine all in one line of code, if that's what you want; sort of all of the above in one:

 

df = table.to_df_CSV(tempfile='local_file_path/filename.csv, tempkeep=True)  # now you have data frame and local .csv file too

 

So, you see, with only a line or two of python code, you can accomplish this a variety ow ways, depending upon your needs.

 

Let me know if any of these work for you or if there's something slightly different you need? Might just be another line of code.

 

Thanks,

Tom

kptkhv
Fluorite | Level 6

Thank you for your time! 

The solution is great but I have a large dataset and cant download it on a remote server. Is it possible to avoid this problem?

sastpw
SAS Employee

I'm not sure I completely understand, so I can't give you the exact answer yet. You are using SSH, so the following is what I believe is true:

 

1) SAS is on a remote system

2) python, and thus saspy, are on your local system

3) the large SAS data set is on the remote SAS server

4) you want 2 columns of that large dataset where? In python as a pandas data frame (that's on your client)? on your client system as a .csv file? Somewhere else as a csv?

 

Can you explain exactly what you want where and I can provide a better answer. 

 

Thanks!

Tom

kptkhv
Fluorite | Level 6

Yes, you're right. 

It doesn't matter where my csv file will be, I can transfer it from remote server to my local machine. The main problem is that python, I guess, can't convert all the table to csv (even if only 2 columns are selected).

In jupyter noteboook I've connected to remote SAS through SSH - it's OK. I download tablle with sasdata - it's OK. Then, I want to manipulate data but I can't  do it when my data set in sas format, so I want to convert it to dataframe or save it as csv(doesn't matter which format it will be exactly because if it will be in dataframe than I can save it to csv and vice versa). The problem is that when I try to realize this step, python starts to think for a very long time and I have no result even after 8+ hours of processing something (I don't even know what python exactly do inside). 

So, maybe you had some problems with huge data sets before? Maybe I can save it partly, or just know on which step python starts to fail. Or it is impossible to do it only with python?

sastpw
SAS Employee

Ok, so, what code exactly have you run? I showed 3 different possibilities earlier, It sounds like one of two things (or maybe part of the same). That the data you're trying to load into python is more than that process can support. Or, it's so large and you're using the sd2df() method and it's taking so long that it's not workable.

 

Can you try the following and see what happens:

 

table = sas.sasdata('your_table','libref_if_not_work', dsopts={'keep' : 'column1 column2'})

 

table.to_csv(sas.workpath+'filename.csv') 

 

sas.download('local_file_path', sas.workpath+'filename.csv') 

 

import pandas

df = pandas.read_csv(('local_file_path/'+'filename.csv)

 

I expect that the first two; creating the csv and downloading it won't take terribly long. And I expect the pandas.read_csv() will show if it's really too much data for your python process or if can work.

The above is basically doing what the following does, but in individual steps so you can see if there's a problem, and with which one.

df = table.to_df_CSV(tempfile='local_file_path/filename.csv, tempkeep=True)  

 

Thanks,

Tom

sastpw
SAS Employee

The other simple thing is that if you are using sd2df(), you really should be using sd2df_CSV() since it performs better with larger data. BUT, I'd like to see the manual steps in the previous post so we can determine where the problem is. Again, sd2df_CSV() does all these things itself, but I'd like to do each step manually to diagnose what it really the problem.

 

Thanks!

Tom

kptkhv
Fluorite | Level 6
import saspy
import pandas as pd
from IPython.display import HTML



sas = saspy.SASsession(cfgname='ssh')

sas.saslib('libname', path="libpath")
dataset = sas.sasdata('table','libname',dsopts={'keep' : 'column'}) #I decided to test with only one column
dataset.to_csv('fullpath+filename') 

And that's all. Again, I tried to do the last step and it's taken the whole night and again nothing.

I can't believe that even the memory error hasn't occurred.

 

Also, even when I stopped the kernel and checked the remote server, I haven't found anything in the directory.  

 

sastpw
SAS Employee

So that shows that the export from SAS dataset to cvs, on the SAS server isn't happening, of finishing or something? That's the first step. That seems very strange.There's more going on here than the usual situation.I need more information to diagnose what's going on here. I often got on a web-ex (I can set that up) with customers when trying to post back and forth doesn't seem to get us to the answer very quickly. Is that something you could do? I could diagnose this much quicker interactively seeing what's going on. If not, we can continue this way. The first thing I might have you try is to set obs=10 so it can't possibly be a size issue and see if it still hangs or whatever it's doing.  I don't see any output from anything, so there's not much to go on.

dsopts={'keep' : 'column', 'obs': 10}) 

I'd like to see a number of other things to, but that's where it would be much more efficient to look at this interactively if that's an option.

I'm not sure if it's really the export to csv that's the issue, rather than something more general. 

 

Also, I would probably open in issue on saspy for this (https://github.com/sassoftware/saspy/issues), though as I'm already on this here, that can go either way.

 

Thanks,

Tom

kptkhv
Fluorite | Level 6

Okay, I think I got it. 

The SAS verison on the remote server isn't 9.4 or higher. It's SAS 9.2. So, maybe the problem caused because of this. 

But why it's okay with SAS magic cell? I can perform queries with %%SAS functionality and it returns me the result.

 Is it possible to do something with it? Or the only thing I need is to wait when my company will update the software? 🙂 

Tom
Super User Tom
Super User

Congratulations on getting saspy to work with ancient software.  https://blogs.sas.com/content/iml/2013/08/02/how-old-is-your-version-of-sas-release-dates-for-sas-so...

 

You should definitely ask in your company how you can get access to a version of SAS that is less than 10 years old.  Note SAS is a subscription license, so no extra cost to get the current version. 🙂 

 

Most likely all the nice python tools @sastpw was asking you to try using require features the old version of SAS doesn't have.

 

Why not just write your own SAS code?

kptkhv
Fluorite | Level 6
Thanks, Tom!
It's hard to explain why I can't just write SAS code 🙂 But now I know how old is my version of SAS. Great! Proud of saspy developers.
kptkhv
Fluorite | Level 6
Also, it can't convert sasdata to csv or dataframe even when I write this:
dsopts={'keep' : 'column', 'obs': 1})
sastpw
SAS Employee
Ok, so that confirms my suspicion that it's something more general than proc export. This won't work completely with 9.2. but, using the STDIO, over SSH or not, will work in many ways even with older versions of SAS. I can only guess at what the exact issue happens to be for specific case. That's where being able to try things would help.
Using the sas.saslog() method would be helpful to see what's happening. Setting results='text' on the SASsession object can maybe help for some things. There's just no guessing at what may trip up saspy for any given thing you submit since it's not made to run against that old of a version. So there will be things that work and things that don't.
Tom

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 16 replies
  • 3638 views
  • 6 likes
  • 4 in conversation