pdwivedi08
Fluorite | Level 6

Hi,

 

I'm able to run my SAS code on Azure Databricks, and the output file is saved on SAS ODA. How can I move that file from the SAS ODA server to a location in Azure Blob Storage? I'm writing the code in an Azure Databricks notebook.


14 REPLIES
webart999ARM
Quartz | Level 8

To move a file from the SAS ODA server to Azure Blob Storage, you will need to use Databricks File System (DBFS) commands.
First, you will need to mount the Azure Blob Storage location as a DBFS directory using the following code:

 

dbutils.fs.mount(
    source = "wasbs://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.blob.core.windows.net",
    mount_point = "/mnt/<MOUNT_NAME>",
    extra_configs = {"fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net": "<ACCESS_KEY>"})

Replace <CONTAINER_NAME>, <STORAGE_ACCOUNT_NAME>, <MOUNT_NAME>, and <ACCESS_KEY> with the appropriate values for your Azure blob storage location.
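
If the mount command succeeds, you can confirm it from the notebook by listing the active mount points, for example:

display(dbutils.fs.mounts())  # lists all active DBFS mount points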

 

Once the Azure blob storage location has been mounted as a DBFS directory, you can use the dbutils.fs.cp command to copy the file from the SAS ODA server to the Azure blob storage location. The code for this would be:

 

dbutils.fs.cp("/mnt/sasoda/output_file.csv", "/mnt/<MOUNT_NAME>/output_file.csv")

Replace <MOUNT_NAME> with the mount name you specified in the previous step.

 

After running this code, the file should be successfully copied from the SAS ODA server to the Azure blob storage location. You can verify this by checking the Azure blob storage location in the Azure Portal.
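
You can also check from the notebook itself, for example (using the same placeholder mount name as above):

dbutils.fs.ls("/mnt/<MOUNT_NAME>")  # should now include output_file.csv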

 

pdwivedi08
Fluorite | Level 6

I tried the command but am getting the error below:

 

java.io.FileNotFoundException: /home/u61780777/output/person.sas7bdat

 

I used the command below on Databricks:

dbutils.fs.cp("/home/u61780777/output/person.sas7bdat", "dbfs:/mnt/test/person.csv")

 

I was able to successfully mount to DBFS, though.

 

The file is present in the SAS ODA:

[screenshot: person.sas7bdat in the SAS ODA output folder]

 

 

Please advise as to what I am missing here.

 

pdwivedi08
Fluorite | Level 6

Forgot to tag @webart999ARM. Please check my comments above; I need more help, as the file is not found.

webart999ARM
Quartz | Level 8

It looks like the file you are trying to copy from the SAS ODA server is not at the specified location: the error message states that the file was not found at /home/u61780777/output/person.sas7bdat.

Make sure that the file exists at the specified location on the SAS ODA server and try the dbutils.fs.cp command again. If the file still cannot be found, you may need to check the file path and verify that it is correct.

You can also try using the dbutils.fs.ls command to list the contents of the directory and verify that the file is present. The code for this would be:

dbutils.fs.ls("/home/u61780777/output")

This will list the contents of the /home/u61780777/output directory and you should be able to see if the file person.sas7bdat is present there. If it is not, you will need to find the correct location of the file and update the dbutils.fs.cp command accordingly.

 

Please let me know if it works.

pdwivedi08
Fluorite | Level 6

@webart999ARM 

 

In your first comment, you mentioned using the command:

dbutils.fs.cp("/mnt/sasoda/output_file.csv", "/mnt/<MOUNT_NAME>/output_file.csv")

whereas the location of the file in my SAS ODA is:

[screenshot: file location in SAS ODA showing /home/u61780777/output/person.sas7bdat]

i.e., /home/u61780777/output/person.sas7bdat. Therefore, should I use

 

dbutils.fs.cp("/home/u61780777/output/person.sas7bdat", "/mnt/<MOUNT_NAME>/output_file.csv")

or

dbutils.fs.cp("/mnt/sasoda/home/u61780777/output/person.sas7bdat", "/mnt/<MOUNT_NAME>/output_file.csv")

 

Please note that my SAS ODA is a standalone login that I was connecting to from Azure Databricks using a SASPy connection string.

 

I'm confused because in your command - dbutils.fs.cp("/mnt/sasoda/output_file.csv", "/mnt/<MOUNT_NAME>/output_file.csv") - there is /mnt in both the source and the destination, whereas my source location is not on Azure. My source location is /home/u61780777/output/person.sas7bdat on SAS ODA.

sastpw
SAS Employee

I'm not familiar with the dbutils in Databricks, but I'm not sure how it would be able to access the SAS server's filesystem to copy a file from there to the filesystem you mounted in Databricks. However, since you were able to mount the output location, I would guess that Python can access files there (open/read/write to that location). If so, and since you're accessing ODA via SASPy, you can just use SASPy's download method to copy the files from the SAS server to the filesystem mounted in your Python client. See the method doc here: https://sassoftware.github.io/saspy/api.html#saspy.SASsession.download
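
Something along these lines might work from your Databricks notebook (just a sketch, I can't test it; the mount path is a placeholder you'd replace with yours):

import saspy

sas = saspy.SASsession()  # connects to ODA per your saspy configuration

# download(localfile, remotefile): the local (client-side) path comes first, the remote (SAS server) path second
res = sas.download('/mnt/<MOUNT_NAME>/person.sas7bdat', '/home/u61780777/output/person.sas7bdat')
print(sas.lastlog())  # check the SAS log if anything looks wrong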

 

Hopefully that's all you need, since you were able to mount the filesystem you wanted to write to.

pdwivedi08
Fluorite | Level 6

@sastpw 

 

Since I'm able to mount the Azure blob where I want to move the file, the download might work. However, when I run the download method, the command runs successfully but nothing happens, i.e. I don't see any downloaded file in the 'Downloads' folder on my local machine. Could you please let me know if I'm missing something here? Do I have to first convert the SAS file to csv/txt format before downloading, and if yes, how do I do that programmatically:

 

myFile = sas.download(r'C:\Users\PDwivedi\Downloads', '/home/u61780777/output')

 

 

pdwivedi08
Fluorite | Level 6

@sastpw 

 

Some more information to add:

 

The sas.download() runs fine (please see the screenshot below too) and the log shows that:

[screenshot: sas.download() log output]

 

sastpw
SAS Employee

I'm confused. I thought you were running in Databricks and had mounted some cloud storage to download the file to. Where are you running SASPy? If you're running in Databricks, the Python process can't write to the filesystem on your PC. You said you wanted the file written to the cloud storage you mounted to the Python process in Databricks. What happens when you try to download into the filesystem you mounted, which is where you said you wanted the file written? You can also run Python on your local PC and then download files to it. Which are you doing? Also, SASPy's upload/download are binary transfers; you don't need to convert files to anything. They are copied as is.

Also, just for my info, what version of SASPy are you using?

 

Thanks,

Tom

pdwivedi08
Fluorite | Level 6

@sastpw 

 

Yeah, earlier I was trying to copy to local, which was wrong. Now I'm trying to copy to DBFS on Azure, but still the same result - the method executes successfully but nothing copies.

Also, I'm using SASPy version 4.4.1. Kindly advise if I'm missing something.

 

 

[screenshot: sas.download() to DBFS and its output in the Databricks notebook]

 

ACCEPTED SOLUTION
sastpw
SAS Employee

Well, if the remote file didn't exist or was inaccessible, the download would have failed with an error. If it couldn't open and write the file on the destination filesystem, it would have failed and provided an error. It seems to have worked, and I can't see the destination filesystem, so I don't know what you're seeing. Can you provide output of what's in this filesystem you mounted? Can you write to it yourself from Python? I have no info to go on.

Also, if the local filename you provide is a directory, then it will copy the remote file into that directory, naming it the same as it was on the remote filesystem. If the filename you provide on the local side isn't a directory, then that's the name of the file you will create on the local side. So, is /mnt/test a directory on that filesystem? If not, then you copied person.sas7bdat from the directory on the remote side to a file named test in the /mnt directory on the destination (local) side. But again, I can't see anything on your system to see what happened.

Run the following and provide the output; that can help.

 

import os

# Verify the mounted filesystem is visible and writable from Python
os.listdir('/mnt')
os.listdir('/mnt/test')
open('/mnt/test/file1', 'wb').write(b'hi there')
open('/mnt/test/file1', 'r').read()
os.listdir('/mnt/test')
os.stat('/mnt/test/file1')

# Verify the remote file exists on the SAS server
sas.dirlist('/home/u61780777/output')
sas.file_info('/home/u61780777/output/person.sas7bdat')

# Try the download and show the SAS log
res = sas.download('/mnt/test', '/home/u61780777/output/person.sas7bdat')
print(sas.lastlog())

# See what actually landed on the local side
os.listdir('/mnt')
os.listdir('/mnt/test')
os.stat('/mnt/test/person.sas7bdat')
os.stat('/mnt/test')

 

Thanks,

Tom

BTW, I typed most of that in by hand (I don't have those paths to run the code); if there are typos, sorry - just correct them.

 

pdwivedi08
Fluorite | Level 6

@sastpw 

 

Thank you very much for your inputs!! While running the code you gave, I realized that I should've used the '/dbfs' prefix when accessing any directory. After using the '/dbfs' prefix, I was able to run sas.download() successfully.

 

The downloaded file on DBFS, however, is in sas7bdat format, which is not easy to read. Therefore, I want to convert this SAS file to csv/txt format before downloading it. Could you please advise some commands which I can execute in a Databricks notebook to do this conversion?
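
For reference, the call that worked was roughly this (paths as in your snippet, with the /dbfs prefix added):

# From plain Python, the DBFS mount is addressed with the /dbfs prefix
res = sas.download('/dbfs/mnt/test', '/home/u61780777/output/person.sas7bdat')
print(sas.lastlog())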

sastpw
SAS Employee

Well, that's good. Now you can access the filesystem you're trying to use. If you don't want the sas7bdat file to begin with, why not import it into your Python environment as a dataframe and then write that out however you want? You can use sasdata2dataframe() to do that, followed by pandas to write out any format it supports.

 

import saspy

sas = saspy.SASsession()

# I don't know what libref you are using for the library where your 'person' data set is, so I'll make up something:
sas.saslib('x', path='where/person/ds/exists')

df = sas.sd2df('person', 'x')   # converts the remote SAS data set to a local Pandas dataframe
df.to_csv('/dbfs/person.csv')   # or whatever the right path really is that you figured out, in whatever format pandas writes out

 

There are other ways to do this too, but this may be the best for you. The full API doc for SASPy is here: https://sassoftware.github.io/saspy/api.html
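
For instance, once the sas7bdat file is already downloaded to your mount, pandas can read it directly without a SAS session (a sketch, with the path assumed from earlier in this thread):

import pandas as pd

# Read the downloaded SAS data set and re-write it as CSV
df = pd.read_sas('/dbfs/mnt/test/person.sas7bdat', format='sas7bdat')
df.to_csv('/dbfs/mnt/test/person.csv', index=False)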

Tom

pdwivedi08
Fluorite | Level 6

Thank you for your inputs @sastpw  !!

 

I guess the process you described will also work smoothly, and I've done something similar in the past (not from Azure though, but from local), but I used the code below (which I found on the SAS Community) and it worked pretty well:

 

sas_conFile = '''
libname MySASLib "/home/u61780777/output/";

PROC EXPORT DATA=MySASLib.person
    FILE="/home/u61780777/output/person.csv"
    DBMS=CSV
    REPLACE;
RUN;
'''

conFile = sas.submit(sas_conFile)
print(conFile['LOG'])
displayHTML(conFile['LST'])

