juanvg1972
Pyrite | Level 9

Hi,

I have to work with SAS datasets and SAS processes in a Hadoop environment, and I have some questions.

- If I want to access Hive through a LIBNAME statement, I need a SAS/ACCESS Interface to Hadoop license, is that correct?

- If I want to use a FILENAME statement to access HDFS, no license is necessary, is that right?

- What license do I need to use PROC HADOOP? And PROC DS2?

- Last question: is there anything similar for working in a Spark environment?

If there is a more appropriate forum for this question, let me know.

Thanks in advance,

Juan

LinusH
Tourmaline | Level 20

Hive requires SAS/ACCESS Interface to Hadoop, yes.
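As a rough sketch, a Hive connection through SAS/ACCESS Interface to Hadoop is a LIBNAME statement with the HADOOP engine. The libref HDP, the server name, and the user below are hypothetical placeholders, and the sketch assumes the SAS_HADOOP_JAR_PATH and SAS_HADOOP_CONFIG_PATH environment variables already point at your cluster's JAR and configuration files.

```sas
/* Requires SAS/ACCESS Interface to Hadoop.
   Server, user and libref are hypothetical placeholders. */
libname hdp hadoop
   server="hive-node.example.com"  /* HiveServer2 host (assumption) */
   port=10000                      /* default HiveServer2 port */
   schema=default                  /* Hive database to use */
   user="sasdemo";                 /* Hive user (assumption) */

/* Hive tables can now be referenced as HDP.<table>; list them */
proc contents data=hdp._all_ nods;
run;
```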

FILENAME to HDFS (the Hadoop access method) is covered by Base SAS, yes.
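A minimal sketch of reading an HDFS file through the FILENAME statement's Hadoop access method, assuming the SAS_HADOOP_CONFIG_PATH and SAS_HADOOP_JAR_PATH environment variables are set for your cluster. The fileref, HDFS path, user, and column layout are all hypothetical.

```sas
/* Hypothetical HDFS file and user; Base SAS only, no SAS/ACCESS needed */
filename hdfsin hadoop '/user/sasdemo/input/sales.csv' user='sasdemo';

/* Read the comma-separated HDFS file with an ordinary DATA step.
   Note: this step itself runs in the SAS session, not in the cluster. */
data work.sales;
   infile hdfsin dlm=',' dsd firstobs=2;  /* skip the header row */
   input region :$20. amount;
run;
```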

PROC HADOOP is also part of Base SAS.
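For example, PROC HADOOP can manage HDFS files from a SAS session. This sketch assumes SAS_HADOOP_CONFIG_PATH and SAS_HADOOP_JAR_PATH are set so the procedure can locate the cluster; the user name and all paths are hypothetical.

```sas
/* Hypothetical user and paths */
proc hadoop username='sasdemo' verbose;
   /* Create a staging directory in HDFS */
   hdfs mkdir='/user/sasdemo/staging';
   /* Copy a file from the SAS server's file system into that directory */
   hdfs copyfromlocal='/tmp/sales.csv'
        out='/user/sasdemo/staging/sales.csv';
run;
```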

PROC DS2 runs in Base SAS, but if you want it to execute inside Hadoop, you need the SAS Embedded Process, which is an add-on to SAS/ACCESS if I recall correctly.

Spark can be used in SAS Data Loader for Hadoop. It is mentioned only once on support.sas.com, so I presume the support is still in its early days.
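To check which of these products your own site actually licenses and has installed, the two Base SAS procedures below write that information to the log. The exact product names that appear (for example, for SAS/ACCESS Interface to Hadoop or the in-database products) depend on your release and order.

```sas
/* Lists the licensed SAS products and their expiration dates in the log */
proc setinit;
run;

/* Lists the installed SAS products and their versions in the log */
proc product_status;
run;
```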

Data never sleeps
juanvg1972
Pyrite | Level 9

Thank you for your help, LinusH. Very useful.

I would like to know the best way to execute a SAS program in Hadoop if I want to take advantage of Hadoop's parallel execution. For example, for a SAS program that runs some DATA steps and procedures on data stored in Hadoop, I want to benefit from the parallelization of the Hadoop cluster. What is the best way?

Thanks

LinusH
Tourmaline | Level 20

If you have a DATA step that can't be rewritten as SQL, I think rewriting it in PROC DS2 and running it through the SAS Embedded Process is currently your only option.
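A minimal sketch of such a rewrite, assuming a hypothetical Hadoop libref HDP and a hypothetical Hive table SALES with a numeric AMOUNT column. The DATA step logic moves into a DS2 thread program, and DS2ACCEL=YES requests that the thread program run in parallel inside the cluster. As far as I know, that in-cluster execution also needs the SAS In-Database Code Accelerator for Hadoop and the Embedded Process deployed on the data nodes, so treat the licensing side as an assumption to verify.

```sas
/* Original DATA step, for comparison (runs in the SAS session):
   data hdp.sales_flagged;
      set hdp.sales;
      if amount > 1000 then big_sale = 1;
      else big_sale = 0;
   run;
*/

proc ds2 ds2accel=yes;
   /* Thread program: this is the part that can run on the data nodes */
   thread flag_th / overwrite=yes;
      dcl double big_sale;          /* global variable, written to output */
      method run();
         set hdp.sales;
         if amount > 1000 then big_sale = 1;
         else big_sale = 0;
      end;
   endthread;

   /* Data program: collects the rows produced by the thread instances */
   data hdp.sales_flagged (overwrite=yes);
      dcl thread flag_th t;
      method run();
         set from t;
      end;
   enddata;
run;
quit;
```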

For SQL (given your data is stored in Hive or another relational format), you can use implicit or explicit SQL pass-through.
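A sketch of both approaches, assuming a hypothetical Hadoop libref HDP and a hypothetical Hive table SALES with REGION and AMOUNT columns; the connection values are placeholders. The SASTRACE options write the SQL that SAS sends to Hive into the log, which is a practical way to confirm whether implicit pass-through actually happened.

```sas
/* Show the SQL that SAS passes to Hive in the log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Implicit pass-through: SAS translates this query to HiveQL when it can */
proc sql;
   create table work.sales_by_region as
   select region, sum(amount) as total_amount
   from hdp.sales
   group by region;
quit;

/* Explicit pass-through: the inner query is HiveQL and runs in Hive */
proc sql;
   connect to hadoop (server="hive-node.example.com" port=10000
                      schema=default user="sasdemo");
   create table work.sales_by_region2 as
   select * from connection to hadoop
      (select region, sum(amount) as total_amount
       from sales
       group by region);
   disconnect from hadoop;
quit;
```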

Other procedures that implicitly generate pass-through SQL:

FREQ
MEANS
RANK [Hadoop with Hive 0.13 and later]
REPORT
SORT [Hadoop with Hive 0.13 and later]
SUMMARY
TABULATE
TRANSPOSE [Hadoop and Teradata only]
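A sketch of using two of these procedures against Hive, assuming a hypothetical Hadoop libref HDP and Hive table SALES. SQLGENERATION=DBMS asks SAS to generate in-database SQL for procedures that support it; with SASTRACE on, the log should show the HiveQL that was submitted, so you can check that the summarization happened in Hive and only the result came back to SAS.

```sas
/* Request in-database SQL generation and trace the SQL sent to Hive */
options sqlgeneration=dbms sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Frequency counts, computed in Hive when pushdown succeeds */
proc freq data=hdp.sales;
   tables region;
run;

/* Summary statistics by region, also eligible for pushdown */
proc means data=hdp.sales n mean sum;
   class region;
   var amount;
run;
```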

You will probably find this matrix useful:

http://support.sas.com/documentation/cdl/en/acreldb/69580/HTML/default/viewer.htm#p13td0l6w0329rn15u...
