BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
msd83
Calcite | Level 5

Hi all,

I am working with Hadoop and SAS.

I have some questions about making copies of data files from Hadoop to my local system.

  1. If I were to use the Hadoop LIBNAME statement to connect to my Hadoop cluster, does anyone have any example code I could look at to show how I would make a copy of a data file to my local machine to do some analysis with SAS. E.g. load the file straight into a SAS library folder as a permanent dataset.
  2. I also have the same question about using the FILENAME Statement.

Any help would be much appreciated.

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
JBailey
Barite | Level 11

Hi,

Here is an example of using the FILENAME statement.

/* FILENAME to Hadoop Example */
/*    Show the mechanics of writing to, and reading from, HDFS */

FILENAME hdp1 hadoop 'test.txt'
              cfg="C:\Hadoop_cfg\hadoop.xml"
              user='bob';

/* Write the file to HDFS */
data _null_;
   file hdp1;
   put ' Test Test Test';
run;

/* Read the file from HDFS */
data test;
   infile hdp1;
   input textline $15.;
run;

Here is an example of the Hadoop procedure.

filename cfg 'C:\Hadoop_cfg\hadoop.xml';

/* setup the environment  */
/*    Create /user/bob/Books directory */
/*    Copy war_and_peace.txt to HDFS. */
/*    Copy moby_dick.txt     to HDFS. */

proc hadoop options=cfg username="bob"  verbose;
   HDFS MKDIR='/user/bob/Books';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\moby_dick.txt"
                  OUT='/user/bob/Books/moby_dick.txt';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\war_and_peace.txt"
                  OUT='/user/bob/Books/war_and_peace.txt';
run;

/* Run the Word Count sample program on Moby Dick */
/* hadoop-examples-1.2.0.1.3.0.0-96.jar */

proc hadoop options=cfg user="bob"  verbose;
  mapreduce input='/user/bob/Books/moby_dick.txt'
     output='/user/bob/outBook'
     jar='C:\Hadoop_examples\hadoop-examples-1.2.0.1.3.0.0-96.jar'
     outputkey="org.apache.hadoop.io.Text"
     outputvalue="org.apache.hadoop.io.IntWritable"
     reduce="org.apache.hadoop.examples.WordCount$IntSumReducer"
     combine="org.apache.hadoop.examples.WordCount$IntSumReducer"
     map="org.apache.hadoop.examples.WordCount$TokenizerMapper";
run;

/* Copy the output from the MapReduce job to the laptop */
/* Clean up the directories and files                   */

proc hadoop options=cfg username="bob" password="Bogus" verbose;
   HDFS COPYTOLOCAL="/user/bob/outBook/part-r-00000"
                OUT="C:\Hadoop_data\output\moby_dick_wordcount.txt" OVERWRITE;
   HDFS delete='/user/bob/.staging';
   HDFS delete='/user/bob/Books';
   HDFS delete='/user/bob/outBook';
run;

Here are examples of using SAS/ACCESS to Hadoop.

libname myhdp hadoop server=hdp13 SUBPROTOCOL=hive2 user=myuser ;

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Display SQL being sent to the database */

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* CTAS */
proc sql;
   connect to hadoop (server=hdp13 user=myuser subprotocol=hive2);

    execute (create table myuser_store_cnt
                 row format delimited fields terminated by '\001'
                 stored as textfile

                as
                   select customer_rk, count(*) as total_orders
                     from order_fact
                    group by customer_rk) by hadoop;

   disconnect from hadoop;
quit;

/* Create a SAS data set from Hadoop data */

proc sql;
   create table work.join_test as (
      select c.customer_rk, o.store_id
        from myhdp.customer_dim c
           , myhdp.order_fact o
       where c.customer_rk = o.customer_rk);
quit;


/* PROC FREQ example */
data myhdp.myuser_class;
   set sashelp.class;
run;

proc freq data=myhdp.sasxjb_class2;
   tables sex * age;
   where age > 9;
   title 'Catchy Title Goes Here';
run;

/* Clean up */

proc sql;
   connect to hadoop (server=duped user=myuser subprotocol=hive2);

   execute (drop table order_fact) by hadoop;
   execute (drop table customer_dim) by hadoop;
   execute (drop table myuser_store_cnt) by hadoop;
   execute (drop table myuser_class) by hadoop;

   drop table work.join_test;

   disconnect from hadoop;

quit;

View solution in original post

9 REPLIES 9
Patrick
Opal | Level 21

Hi

I didn't have the chance yet to get my hands on Hadoop but from my understanding this is (kind-of) from a SAS perspective nothing else than another MPP database. So you access a table in Hadoop via the hadoop engine and you can copy data to somewhere else using the engine for the target (eg. the SAS engine).

Especially with newer stuff you should always post your SAS version and OS. Here the link for a SAS 9.3 libname statement using SAS/Access to Hadoop: SAS/ACCESS(R) 9.3 for Relational Databases: Reference, Second Edition

You can create a "copy" of the data to a directory which your SAS Server can access - so for your local c:-drive you would need a locally installed SAS. You normally will use a remote SAS Server and if so then try and copy the file to a location which you can map from your PC (if you want it "local").

From the link I've posted:

libname Source hadoop server=hxpduped  user=hadoop_usr password=hadoop_pwd;

libname target 'path to a folder accessible by the SAS server';

data target.mytable;

     set source.<name of table in hadoop>;

run;

Alternatively use Proc Copy or Proc SQL create table.

For filenames check out this link which also provides sample code: SAS(R) 9.3 Statements: Reference

msd83
Calcite | Level 5

Thank you very much for your advice Patrick.

JBailey
Barite | Level 11

Hi,

Here is an example of using the FILENAME statement.

/* FILENAME to Hadoop Example */
/*    Show the mechanics of writing to, and reading from, HDFS */

FILENAME hdp1 hadoop 'test.txt'
              cfg="C:\Hadoop_cfg\hadoop.xml"
              user='bob';

/* Write the file to HDFS */
data _null_;
   file hdp1;
   put ' Test Test Test';
run;

/* Read the file from HDFS */
data test;
   infile hdp1;
   input textline $15.;
run;

Here is an example of the Hadoop procedure.

filename cfg 'C:\Hadoop_cfg\hadoop.xml';

/* setup the environment  */
/*    Create /user/bob/Books directory */
/*    Copy war_and_peace.txt to HDFS. */
/*    Copy moby_dick.txt     to HDFS. */

proc hadoop options=cfg username="bob"  verbose;
   HDFS MKDIR='/user/bob/Books';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\moby_dick.txt"
                  OUT='/user/bob/Books/moby_dick.txt';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\war_and_peace.txt"
                  OUT='/user/bob/Books/war_and_peace.txt';
run;

/* Run the Word Count sample program on Moby Dick */
/* hadoop-examples-1.2.0.1.3.0.0-96.jar */

proc hadoop options=cfg user="bob"  verbose;
  mapreduce input='/user/bob/Books/moby_dick.txt'
     output='/user/bob/outBook'
     jar='C:\Hadoop_examples\hadoop-examples-1.2.0.1.3.0.0-96.jar'
     outputkey="org.apache.hadoop.io.Text"
     outputvalue="org.apache.hadoop.io.IntWritable"
     reduce="org.apache.hadoop.examples.WordCount$IntSumReducer"
     combine="org.apache.hadoop.examples.WordCount$IntSumReducer"
     map="org.apache.hadoop.examples.WordCount$TokenizerMapper";
run;

/* Copy the output from the MapReduce job to the laptop */
/* Clean up the directories and files                   */

proc hadoop options=cfg username="bob" password="Bogus" verbose;
   HDFS COPYTOLOCAL="/user/bob/outBook/part-r-00000"
                OUT="C:\Hadoop_data\output\moby_dick_wordcount.txt" OVERWRITE;
   HDFS delete='/user/bob/.staging';
   HDFS delete='/user/bob/Books';
   HDFS delete='/user/bob/outBook';
run;

Here are examples of using SAS/ACCESS to Hadoop.

libname myhdp hadoop server=hdp13 SUBPROTOCOL=hive2 user=myuser ;

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Display SQL being sent to the database */

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* CTAS */
proc sql;
   connect to hadoop (server=hdp13 user=myuser subprotocol=hive2);

    execute (create table myuser_store_cnt
                 row format delimited fields terminated by '\001'
                 stored as textfile

                as
                   select customer_rk, count(*) as total_orders
                     from order_fact
                    group by customer_rk) by hadoop;

   disconnect from hadoop;
quit;

/* Create a SAS data set from Hadoop data */

proc sql;
   create table work.join_test as (
      select c.customer_rk, o.store_id
        from myhdp.customer_dim c
           , myhdp.order_fact o
       where c.customer_rk = o.customer_rk);
quit;


/* PROC FREQ example */
data myhdp.myuser_class;
   set sashelp.class;
run;

proc freq data=myhdp.sasxjb_class2;
   tables sex * age;
   where age > 9;
   title 'Catchy Title Goes Here';
run;

/* Clean up */

proc sql;
   connect to hadoop (server=duped user=myuser subprotocol=hive2);

   execute (drop table order_fact) by hadoop;
   execute (drop table customer_dim) by hadoop;
   execute (drop table myuser_store_cnt) by hadoop;
   execute (drop table myuser_class) by hadoop;

   drop table work.join_test;

   disconnect from hadoop;

quit;

msd83
Calcite | Level 5

Thanks JBailey for your advice. I am changing what I want I want to do now.

The LIBNAME is perfect for what I am trying to do now.

I have a SAS programme where the datasets were stored on my local machine but in the future it will be stored on the Hadoop cluster.

Is it at simple as changing the library from referencing the local library to the Hadoop cluster.

Is there anything else to consider?

Thanks a lot.

JBailey
Barite | Level 11

You are quite welcome. If you want to copy data from a SAS data set to a Hadoop cluster you would do something similar to this.

/* assuming that you are using HiveServer2 on your Hadoop cluster */

libname myhdp hadoop server=myhdp  subprotocol=hive2 user=myuser password=mypasswd ;

libname  mysas 'C:\SASDataLivesHere';

proc append base=myhdp.mydata

                     data=mysas.mydata;

run;

msd83
Calcite | Level 5

Thanks again JBailey.

AnnaBrown
Community Manager

Glad you're getting some good advice, msd83. Feel free to mark answers as "Helpful" or "Correct" as you see fit. This is useful for other community members who may have the same problem.

Best,

Anna


Join us for SAS Community Trivia
SAS Bowl XXIX, The SAS Hackathon
Wednesday, March 8, 2023, at 10 AM ET | #SASBowl

JBailey
Barite | Level 11

My pleasure. I am happy that I could help.

msd83
Calcite | Level 5

OK sure Anna.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 8376 views
  • 7 likes
  • 4 in conversation