vipinj765
Calcite | Level 5
I have a locally downloaded copy of a Parquet file on a Linux server. Instead of connecting to the Hadoop cluster, I want to read the local version. Is there any way to do this in SAS 9.4M6? I do have SAS/ACCESS to Hadoop licensed.
ChrisNZ
Tourmaline | Level 20

Parquet is a binary, compressed, columnar data storage format. SAS has no means of reading this format directly; SAS can only do it via other applications such as Hive or Impala. This is similar to SAS not being able to read a SQL Server file directly; it can only do so by using the SQL Server APIs to communicate with SQL Server.

vipinj765
Calcite | Level 5
Thanks for your reply!
jimbarbour
Meteorite | Level 14

How did you do the download?  Did you download an HDFS file or did you do some kind of export procedure?  If you just downloaded an HDFS file, you cannot read it directly with SAS.  

 

However, if the file is small enough to download, why not just read the data using Proc SQL from within SAS and save the data as a SAS dataset?  Typically a "Create Table As" (CTAS) style SQL query can be used for this.

 

For example:

PROC SQL NOPRINT;
  CREATE TABLE MyLib.SAS_Data_from_Hadoop AS
    SELECT * FROM Hdp.Some_Hadoop_Table;
QUIT;

You may need to qualify the Select with a Where of course, and you may need to apply Length and Format statements just as you would with any other SAS dataset.
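
As a rough sketch of what such a qualified query might look like (the column names, the filter value, and the `Hdp` and `MyLib` librefs here are hypothetical and would need to match your own setup):

```sas
/* Sketch only: Customer_Id, Order_Date, and Region are hypothetical columns. */
/* Hdp is assumed to be a libref already assigned to the Hadoop source,       */
/* and MyLib a libref pointing at local SAS storage.                          */
PROC SQL NOPRINT;
  CREATE TABLE MyLib.SAS_Data_Subset AS
    SELECT Customer_Id,
           Order_Date FORMAT=DATE9.,   /* apply a SAS date format           */
           Region     LENGTH=20        /* cap the character column's length */
      FROM Hdp.Some_Hadoop_Table
      WHERE Order_Date >= '01JAN2020'd; /* pull only the rows you need      */
QUIT;
```

Restricting the rows and columns in the query itself keeps the local SAS dataset small, and setting lengths explicitly helps avoid the very wide character columns that Hadoop string types can otherwise produce.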

 

The above technique, the CTAS style query, is usually the best way to work with data using SAS if you need a local copy of the data.

 

Jim

vipinj765
Calcite | Level 5

Hi Jim,

Thanks for your quick response

The file is a processed file: some ETL steps are run and then a .parquet file is created. The .parquet file lands in the landing zone, which is connected to the Linux machine.

 

In your suggested solution you are using HDP as a library, which I think connects to the Hadoop cluster. I don't want to connect to the Hadoop cluster, as I already have the Parquet file on my storage.

 

Thanks, Vipin

ChrisNZ
Tourmaline | Level 20

> I don't want to connect to Hadoop cluster

You have no choice if you use SAS. SAS cannot read that file.

LinusH
Tourmaline | Level 20

My interpretation is that the OP doesn't have a SAS/ACCESS to Hadoop licence, hence using SAS to make the copy would not be a (simple) option.

A workaround would be to export to a flat file instead (of course, the benefits of Parquet's efficient storage are lost):

https://stackoverflow.com/questions/39419975/how-to-copy-and-convert-parquet-files-to-csv

 

Data never sleeps
ChrisNZ
Tourmaline | Level 20

From the OP: "I do have Sas access to hadoop licensed"

Good link though.

LinusH
Tourmaline | Level 20

I jumped to a conclusion, and didn't read properly. Double sins!

But then it makes no sense to read a local copy when you can access the Hadoop file directly...?

Data never sleeps
jimbarbour
Meteorite | Level 14

@LinusH,

 

If Hadoop's performance is slow, then it might make sense to have a local copy of some sort.  However, HDFS data cannot be read by SAS, so it does not make sense, in my opinion, to just copy the HDFS file to one's local machine.  

 

Some ideas that might make sense:

  1. Use SAS and the Hadoop Libname engine to copy the Hadoop table into a local SAS dataset or table.  This is the best option in terms of performance and convenience with SAS.
  2. Export the HDFS data from Hadoop into a csv or other delimited file and then copy the csv file to one's local machine.  I think this makes less sense because one then has to re-import the data into SAS, but this would at least work.  Trying to read raw HDFS data locally without a local Hadoop instance will not work at all.
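
The two ideas above could be sketched roughly as follows. The server name, port, schema, credentials, file path, and dataset names are all placeholders, and the LIBNAME step assumes the Hadoop JAR and configuration files are already set up for SAS (for example via the SAS_HADOOP_JAR_PATH and SAS_HADOOP_CONFIG_PATH environment variables):

```sas
/* Option 1 (sketch): copy a Hadoop table into a local SAS dataset.   */
/* Connection values below are placeholders, not real settings.       */
LIBNAME Hdp HADOOP SERVER="hive.example.com" PORT=10000 SCHEMA=default
        USER=myuser PASSWORD="XXXXXXXX";

PROC SQL NOPRINT;
  CREATE TABLE MyLib.SAS_Data_from_Hadoop AS
    SELECT * FROM Hdp.Some_Hadoop_Table;
QUIT;

LIBNAME Hdp CLEAR;  /* release the Hadoop connection when finished */

/* Option 2 (sketch): import a csv file that was exported from Hadoop */
/* and copied to the local machine. The path is hypothetical.         */
PROC IMPORT DATAFILE="/landing/zone/exported_data.csv"
            OUT=MyLib.SAS_Data_from_Csv
            DBMS=CSV
            REPLACE;
  GETNAMES=YES;        /* first row holds the column names         */
  GUESSINGROWS=MAX;    /* scan all rows when guessing column types */
RUN;
```

With option 2, PROC IMPORT has to guess column types and lengths from the text, so a hand-written DATA step with an INFILE statement may be the safer choice when the layout is known in advance.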

Jim
