Solved: Re: Copying files from a Hadoop cluster to SAS

msd83 · Posted 04-09-2014 07:03 AM

Hi all,

I am working with Hadoop and SAS.

I have some questions about making copies of data files from Hadoop to my local system.

If I were to use the Hadoop LIBNAME statement to connect to my Hadoop cluster, does anyone have any example code I could look at to show how I would make a copy of a data file to my local machine to do some analysis with SAS. E.g. load the file straight into a SAS library folder as a permanent dataset.
I also have the same question about using the FILENAME Statement.

Any help would be much appreciated.

Thanks!

JBailey · Posted 04-16-2014 01:50 PM

Hi,

Here is an example of using the FILENAME statement.

/* FILENAME to Hadoop Example */
/* Show the mechanics of writing to, and reading from, HDFS */

FILENAME hdp1 hadoop 'test.txt'
cfg="C:\Hadoop_cfg\hadoop.xml"
user='bob';

/* Write the file to HDFS */
data _null_;
file hdp1;
put ' Test Test Test';
run;

/* Read the file from HDFS */
data test;
infile hdp1;
input textline $15.;
run;

Here is an example of the Hadoop procedure.

filename cfg 'C:\Hadoop_cfg\hadoop.xml';

/* setup the environment */
/*    Create /user/bob/Books directory */
/*    Copy war_and_peace.txt to HDFS. */
/*    Copy moby_dick.txt     to HDFS. */

proc hadoop options=cfg username="bob" verbose;
   HDFS MKDIR='/user/bob/Books';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\moby_dick.txt"
                  OUT='/user/bob/Books/moby_dick.txt';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\war_and_peace.txt"
                  OUT='/user/bob/Books/war_and_peace.txt';
run;

/* Run the Word Count sample program on Moby Dick */
/* hadoop-examples-1.2.0.1.3.0.0-96.jar */

proc hadoop options=cfg user="bob" verbose;
mapreduce input='/user/bob/Books/moby_dick.txt'
     output='/user/bob/outBook'
     jar='C:\Hadoop_examples\hadoop-examples-1.2.0.1.3.0.0-96.jar'
     outputkey="org.apache.hadoop.io.Text"
     outputvalue="org.apache.hadoop.io.IntWritable"
     reduce="org.apache.hadoop.examples.WordCount$IntSumReducer"
     combine="org.apache.hadoop.examples.WordCount$IntSumReducer"
     map="org.apache.hadoop.examples.WordCount$TokenizerMapper";
run;

/* Copy the output from the MapReduce job to the laptop */
/* Clean up the directories and files */

proc hadoop options=cfg username="bob" password="Bogus" verbose;
   HDFS COPYTOLOCAL="/user/bob/outBook/part-r-00000"
                OUT="C:\Hadoop_data\output\moby_dick_wordcount.txt" OVERWRITE;
   HDFS delete='/user/bob/.staging';
   HDFS delete='/user/bob/Books';
   HDFS delete='/user/bob/outBook';
run;

Here are examples of using SAS/ACCESS to Hadoop.

libname myhdp hadoop server=hdp13 SUBPROTOCOL=hive2 user=myuser ;

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Display SQL being sent to the database */

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* CTAS */
proc sql;
connect to hadoop (server=hdp13 user=myuser subprotocol=hive2);

    execute (create table myuser_store_cnt
                 row format delimited fields terminated by '\001'
                 stored as textfile

                as
                   select customer_rk, count(*) as total_orders
                     from order_fact
                    group by customer_rk) by hadoop;

disconnect from hadoop;
quit;

/* Create a SAS data set from Hadoop data */

proc sql;
   create table work.join_test as (
      select c.customer_rk, o.store_id
        from myhdp.customer_dim c
           , myhdp.order_fact o
       where c.customer_rk = o.customer_rk);
quit;

/* PROC FREQ example */
data myhdp.myuser_class;
set sashelp.class;
run;

proc freq data=myhdp.sasxjb_class2;
   tables sex * age;
   where age > 9;
   title 'Catchy Title Goes Here';
run;

/* Clean up */

proc sql;
connect to hadoop (server=duped user=myuser subprotocol=hive2);

   execute (drop table order_fact) by hadoop;
   execute (drop table customer_dim) by hadoop;
   execute (drop table myuser_store_cnt) by hadoop;
   execute (drop table myuser_class) by hadoop;

drop table work.join_test;

disconnect from hadoop;

quit;

View solution in original post

Patrick · Posted 04-09-2014 08:03 AM

Hi

I didn't have the chance yet to get my hands on Hadoop but from my understanding this is (kind-of) from a SAS perspective nothing else than another MPP database. So you access a table in Hadoop via the hadoop engine and you can copy data to somewhere else using the engine for the target (eg. the SAS engine).

Especially with newer stuff you should always post your SAS version and OS. Here the link for a SAS 9.3 libname statement using SAS/Access to Hadoop: SAS/ACCESS(R) 9.3 for Relational Databases: Reference, Second Edition

You can create a "copy" of the data to a directory which your SAS Server can access - so for your local c:-drive you would need a locally installed SAS. You normally will use a remote SAS Server and if so then try and copy the file to a location which you can map from your PC (if you want it "local").

From the link I've posted:

libname Source hadoop server=hxpduped user=hadoop_usr password=hadoop_pwd;

libname target 'path to a folder accessible by the SAS server';

data target.mytable;

set source.<name of table in hadoop>;

run;

Alternatively use Proc Copy or Proc SQL create table.

For filenames check out this link which also provides sample code: SAS(R) 9.3 Statements: Reference

msd83 · Posted 04-09-2014 11:28 AM

Thank you very much for your advice Patrick.

JBailey · Posted 04-16-2014 01:50 PM

Hi,

Here is an example of using the FILENAME statement.

/* FILENAME to Hadoop Example */
/* Show the mechanics of writing to, and reading from, HDFS */

FILENAME hdp1 hadoop 'test.txt'
cfg="C:\Hadoop_cfg\hadoop.xml"
user='bob';

/* Write the file to HDFS */
data _null_;
file hdp1;
put ' Test Test Test';
run;

/* Read the file from HDFS */
data test;
infile hdp1;
input textline $15.;
run;

Here is an example of the Hadoop procedure.

filename cfg 'C:\Hadoop_cfg\hadoop.xml';

/* setup the environment */
/*    Create /user/bob/Books directory */
/*    Copy war_and_peace.txt to HDFS. */
/*    Copy moby_dick.txt     to HDFS. */

proc hadoop options=cfg username="bob" verbose;
   HDFS MKDIR='/user/bob/Books';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\moby_dick.txt"
                  OUT='/user/bob/Books/moby_dick.txt';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\war_and_peace.txt"
                  OUT='/user/bob/Books/war_and_peace.txt';
run;

/* Run the Word Count sample program on Moby Dick */
/* hadoop-examples-1.2.0.1.3.0.0-96.jar */

proc hadoop options=cfg user="bob" verbose;
mapreduce input='/user/bob/Books/moby_dick.txt'
     output='/user/bob/outBook'
     jar='C:\Hadoop_examples\hadoop-examples-1.2.0.1.3.0.0-96.jar'
     outputkey="org.apache.hadoop.io.Text"
     outputvalue="org.apache.hadoop.io.IntWritable"
     reduce="org.apache.hadoop.examples.WordCount$IntSumReducer"
     combine="org.apache.hadoop.examples.WordCount$IntSumReducer"
     map="org.apache.hadoop.examples.WordCount$TokenizerMapper";
run;

/* Copy the output from the MapReduce job to the laptop */
/* Clean up the directories and files */

proc hadoop options=cfg username="bob" password="Bogus" verbose;
   HDFS COPYTOLOCAL="/user/bob/outBook/part-r-00000"
                OUT="C:\Hadoop_data\output\moby_dick_wordcount.txt" OVERWRITE;
   HDFS delete='/user/bob/.staging';
   HDFS delete='/user/bob/Books';
   HDFS delete='/user/bob/outBook';
run;

Here are examples of using SAS/ACCESS to Hadoop.

libname myhdp hadoop server=hdp13 SUBPROTOCOL=hive2 user=myuser ;

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Display SQL being sent to the database */

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* CTAS */
proc sql;
connect to hadoop (server=hdp13 user=myuser subprotocol=hive2);

    execute (create table myuser_store_cnt
                 row format delimited fields terminated by '\001'
                 stored as textfile

                as
                   select customer_rk, count(*) as total_orders
                     from order_fact
                    group by customer_rk) by hadoop;

disconnect from hadoop;
quit;

/* Create a SAS data set from Hadoop data */

proc sql;
   create table work.join_test as (
      select c.customer_rk, o.store_id
        from myhdp.customer_dim c
           , myhdp.order_fact o
       where c.customer_rk = o.customer_rk);
quit;

/* PROC FREQ example */
data myhdp.myuser_class;
set sashelp.class;
run;

proc freq data=myhdp.sasxjb_class2;
   tables sex * age;
   where age > 9;
   title 'Catchy Title Goes Here';
run;

/* Clean up */

proc sql;
connect to hadoop (server=duped user=myuser subprotocol=hive2);

   execute (drop table order_fact) by hadoop;
   execute (drop table customer_dim) by hadoop;
   execute (drop table myuser_store_cnt) by hadoop;
   execute (drop table myuser_class) by hadoop;

drop table work.join_test;

disconnect from hadoop;

quit;

msd83 · Posted 04-20-2014 10:56 AM

Thanks JBailey for your advice. I am changing what I want I want to do now.

The LIBNAME is perfect for what I am trying to do now.

I have a SAS programme where the datasets were stored on my local machine but in the future it will be stored on the Hadoop cluster.

Is it at simple as changing the library from referencing the local library to the Hadoop cluster.

Is there anything else to consider?

Thanks a lot.

JBailey · Posted 04-21-2014 03:14 PM

You are quite welcome. If you want to copy data from a SAS data set to a Hadoop cluster you would do something similar to this.

/* assuming that you are using HiveServer2 on your Hadoop cluster */

libname myhdp hadoop server=myhdp subprotocol=hive2 user=myuser password=mypasswd ;

libname mysas 'C:\SASDataLivesHere';

proc append base=myhdp.mydata

data=mysas.mydata;

run;

msd83 · Posted 04-22-2014 08:43 AM

Thanks again JBailey.

AnnaBrown · Posted 04-22-2014 09:58 AM

Glad you're getting some good advice, msd83. Feel free to mark answers as "Helpful" or "Correct" as you see fit. This is useful for other community members who may have the same problem.

Best,

Anna

Access SAS Innovate on-demand content now!

JBailey · Posted 04-22-2014 11:32 AM

My pleasure. I am happy that I could help.

msd83 · Posted 04-23-2014 06:23 AM

OK sure Anna.

Registration is open