Oh Tom, I'm not sure how to thank you; you explained it in such a simple and clear way. I was banging my head trying to understand what I was attempting here, and your explanation made me think otherwise. A big thank you from the bottom of my heart. You just made my day.

Sorry to bug you, but I have a few clarifications.

Regarding "So the physical files in that directory should be in all lower case with an extension of .sas7bdat." -- I just need to specify the table name without the ".sas7bdat" extension, right?

1. If I need to create multiple Hive tables, can I do that? Something like:

data myhive.myhivetable;
   set mysas.mysasdataset;
run;

data myhive1.myhivetable1;
   set mysas.mysasdataset;
run;

2. Based on your experience, which is the better approach: the one being followed now, or converting the SAS dataset to .csv and then loading it into Hive?

3. SAS is on server A and Hadoop is on server B. Can I call the script from server B (Hadoop) rather than server A? If CSV is the better approach, I would call it from server B, create an external Hive table, and point its LOCATION to where I am creating the CSV.

4. Finally, I may need to load a SAS dataset around 500 GB in size; is there any way to speed that up? I only have access to a UNIX box, and I am not sure what options are possible.

Once again, thanks Tom for taking your time and replying in detail. Good day, Sir.
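For what it's worth, the SAS side of the CSV route in question 2 can be sketched with PROC EXPORT like this (the output path is hypothetical, and the libref mysas is assumed to be assigned already):

```sas
/* Write the SAS dataset out as a comma-delimited file.          */
/* /data/export/ is a hypothetical landing directory; a Hive     */
/* external table's LOCATION could then point at that directory. */
proc export data=mysas.mysasdataset
    outfile='/data/export/mysasdataset.csv'
    dbms=csv
    replace;
run;
```

Note that PROC EXPORT writes a header row by default, so the Hive table definition would need to skip or account for that first line.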