Parquet is a binary, compressed, columnar data storage format. SAS has no means of reading this format directly; it can only do so via other applications such as Hive or Impala. This is similar to SAS not being able to read a SQL Server file directly: it can only do so by using the SQL Server APIs to communicate with SQL Server.
How did you do the download? Did you download an HDFS file or did you do some kind of export procedure? If you just downloaded an HDFS file, you cannot read it directly with SAS.
However, if the file is small enough to download, why not just read the data using Proc SQL from within SAS and save the data as a SAS dataset? Typically a "Create Table As" (CTAS) style SQL query can be used for this.
For example:
PROC SQL NOPRINT;
CREATE TABLE MyLib.SAS_Data_from_Hadoop AS
SELECT * FROM Hdp.Some_Hadoop_Table;
QUIT;
You may need to qualify the Select with a Where of course, and you may need to apply Length and Format statements just as you would with any other SAS dataset.
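As a rough sketch of what that might look like, the query below adds a Where clause and sets a length and formats inline. The column names (Customer_ID, Customer_Name, Order_Date) and the date cutoff are hypothetical, chosen only for illustration; Hdp and MyLib are the librefs from the example above.

```sas
/* Sketch only: column names and the date cutoff are assumptions, */
/* not taken from any real table.                                  */
PROC SQL NOPRINT;
    CREATE TABLE MyLib.SAS_Data_from_Hadoop AS
        SELECT Customer_ID,
               Customer_Name LENGTH=50 FORMAT=$50.,  /* set length and format inline */
               Order_Date    FORMAT=DATE9.           /* display as e.g. 01JAN2024    */
        FROM Hdp.Some_Hadoop_Table
        WHERE Order_Date >= '01JAN2024'D;            /* keep only the rows you need  */
QUIT;
```

Filtering in the Where clause keeps the local copy small, which matters most when the Hadoop table is large.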
The above technique, the CTAS style query, is usually the best way to work with data using SAS if you need a local copy of the data.
Jim
Hi Jim,
Thanks for your quick response.
The file is a processed file: some ETL steps are run and then the .parquet file is created. The .parquet file lands on the landing zone, which is connected to a Linux machine.
In your given resolution you are using HDP as a library, which I think connects to the Hadoop cluster. I don't want to connect to the Hadoop cluster as I already have the parquet file on my storage.
Thanks, Vipin
> I dont want to connect to Hadoop cluster
You have no choice if you use SAS. SAS cannot read that file.
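For reference, the Hdp libref used in the earlier example would typically be assigned with a SAS/ACCESS Interface to Hadoop LIBNAME statement along the lines below. This is only a sketch: the server name, port, schema, and credentials are placeholders, and your site may need additional setup (for example, Hadoop JAR and configuration paths) that is not shown here.

```sas
/* Sketch only: every connection value below is a placeholder. */
LIBNAME Hdp HADOOP
    SERVER="hive.example.com"   /* hypothetical Hive server host    */
    PORT=10000                  /* assumed HiveServer2 port         */
    SCHEMA=default              /* assumed Hive schema              */
    USER=myuser                 /* placeholder credentials          */
    PASSWORD="XXXXXXXX";
```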
My interpretation is that the OP doesn't have a SAS/ACCESS to Hadoop licence, hence using SAS to make the copy would not be a (simple) option.
A workaround would be to export to a flat file instead (of course, the benefits of Parquet's efficient storage are lost):
https://stackoverflow.com/questions/39419975/how-to-copy-and-convert-parquet-files-to-csv
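Once the data has been exported to CSV, reading it into SAS could look roughly like this. The file path and output dataset name are hypothetical, and the sketch assumes the export wrote a header row.

```sas
/* Sketch only: the path and dataset name are assumptions. */
PROC IMPORT DATAFILE="/landing_zone/export_from_parquet.csv"
            OUT=Work.From_Parquet
            DBMS=CSV
            REPLACE;
    GETNAMES=YES;        /* first row holds the column names          */
    GUESSINGROWS=MAX;    /* scan all rows when inferring column types */
RUN;
```

Proc Import guesses lengths and types from the data, so a Data step with explicit Length and Format statements may be the better choice when the layout is known in advance.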
In the OP: "I do have SAS access to Hadoop licensed"
Good link though.
I jumped to a conclusion, and didn't read properly. Double sins!
But then it makes no sense to read a local copy when you can access the Hadoop file directly...?
If Hadoop's performance is slow, then it might make sense to have a local copy of some sort. However, HDFS data cannot be read by SAS, so it does not make sense, in my opinion, to just copy the HDFS file to one's local machine.
Some ideas that might make sense:
Jim