SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Copying files from a Hadoop cluster to SAS

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

Copying files from a Hadoop cluster to SAS

Hi all,

I am working with Hadoop and SAS.

I have some questions about making copies of data files from Hadoop to my local system.

  1. If I were to use the Hadoop LIBNAME statement to connect to my Hadoop cluster, does anyone have any example code I could look at to show how I would make a copy of a data file to my local machine to do some analysis with SAS. E.g. load the file straight into a SAS library folder as a permanent dataset.
  2. I also have the same question about using the FILENAME Statement.

Any help would be much appreciated.

Thanks!


Accepted Solutions
Solution
‎04-16-2014 01:50 PM
SAS Employee
Posts: 215

Re: Copying files from a Hadoop cluster to SAS

Hi,

Here is an example of using the FILENAME statement.

/* FILENAME to Hadoop Example */
/*    Show the mechanics of writing to, and reading from, HDFS */

FILENAME hdp1 hadoop 'test.txt'
              cfg="C:\Hadoop_cfg\hadoop.xml"
              user='bob';

/* Write the file to HDFS */
data _null_;
   file hdp1;
   put ' Test Test Test';
run;

/* Read the file from HDFS */
data test;
   infile hdp1;
   input textline $15.;
run;

Here is an example of the Hadoop procedure.

filename cfg 'C:\Hadoop_cfg\hadoop.xml';

/* setup the environment  */
/*    Create /user/bob/Books directory */
/*    Copy war_and_peace.txt to HDFS. */
/*    Copy moby_dick.txt     to HDFS. */

proc hadoop options=cfg username="bob"  verbose;
   HDFS MKDIR='/user/bob/Books';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\moby_dick.txt"
                  OUT='/user/bob/Books/moby_dick.txt';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\war_and_peace.txt"
                  OUT='/user/bob/Books/war_and_peace.txt';
run;

/* Run the Word Count sample program on Moby Dick */
/* hadoop-examples-1.2.0.1.3.0.0-96.jar */

proc hadoop options=cfg user="bob"  verbose;
  mapreduce input='/user/bob/Books/moby_dick.txt'
     output='/user/bob/outBook'
     jar='C:\Hadoop_examples\hadoop-examples-1.2.0.1.3.0.0-96.jar'
     outputkey="org.apache.hadoop.io.Text"
     outputvalue="org.apache.hadoop.io.IntWritable"
     reduce="org.apache.hadoop.examples.WordCount$IntSumReducer"
     combine="org.apache.hadoop.examples.WordCount$IntSumReducer"
     map="org.apache.hadoop.examples.WordCount$TokenizerMapper";
run;

/* Copy the output from the MapReduce job to the laptop */
/* Clean up the directories and files                   */

proc hadoop options=cfg username="bob" password="Bogus" verbose;
   HDFS COPYTOLOCAL="/user/bob/outBook/part-r-00000"
                OUT="C:\Hadoop_data\output\moby_dick_wordcount.txt" OVERWRITE;
   HDFS delete='/user/bob/.staging';
   HDFS delete='/user/bob/Books';
   HDFS delete='/user/bob/outBook';
run;

Here are examples of using SAS/ACCESS to Hadoop.

libname myhdp hadoop server=hdp13 SUBPROTOCOL=hive2 user=myuser ;

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Display SQL being sent to the database */

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* CTAS */
proc sql;
   connect to hadoop (server=hdp13 user=myuser subprotocol=hive2);

    execute (create table myuser_store_cnt
                 row format delimited fields terminated by '\001'
                 stored as textfile

                as
                   select customer_rk, count(*) as total_orders
                     from order_fact
                    group by customer_rk) by hadoop;

   disconnect from hadoop;
quit;

/* Create a SAS data set from Hadoop data */

proc sql;
   create table work.join_test as (
      select c.customer_rk, o.store_id
        from myhdp.customer_dim c
           , myhdp.order_fact o
       where c.customer_rk = o.customer_rk);
quit;


/* PROC FREQ example */
data myhdp.myuser_class;
   set sashelp.class;
run;

proc freq data=myhdp.sasxjb_class2;
   tables sex * age;
   where age > 9;
   title 'Catchy Title Goes Here';
run;

/* Clean up */

proc sql;
   connect to hadoop (server=duped user=myuser subprotocol=hive2);

   execute (drop table order_fact) by hadoop;
   execute (drop table customer_dim) by hadoop;
   execute (drop table myuser_store_cnt) by hadoop;
   execute (drop table myuser_class) by hadoop;

   drop table work.join_test;

   disconnect from hadoop;

quit;

View solution in original post


All Replies
Respected Advisor
Posts: 4,173

Re: Copying files from a Hadoop cluster to SAS

Hi

I didn't have the chance yet to get my hands on Hadoop but from my understanding this is (kind-of) from a SAS perspective nothing else than another MPP database. So you access a table in Hadoop via the hadoop engine and you can copy data to somewhere else using the engine for the target (eg. the SAS engine).

Especially with newer stuff you should always post your SAS version and OS. Here the link for a SAS 9.3 libname statement using SAS/Access to Hadoop: SAS/ACCESS(R) 9.3 for Relational Databases: Reference, Second Edition

You can create a "copy" of the data to a directory which your SAS Server can access - so for your local c:-drive you would need a locally installed SAS. You normally will use a remote SAS Server and if so then try and copy the file to a location which you can map from your PC (if you want it "local").

From the link I've posted:

libname Source hadoop server=hxpduped  user=hadoop_usr password=hadoop_pwd;

libname target 'path to a folder accessible by the SAS server';

data target.mytable;

     set source.<name of table in hadoop>;

run;

Alternatively use Proc Copy or Proc SQL create table.

For filenames check out this link which also provides sample code: SAS(R) 9.3 Statements: Reference

Contributor
Posts: 23

Re: Copying files from a Hadoop cluster to SAS

Thank you very much for your advice Patrick.

Solution
‎04-16-2014 01:50 PM
SAS Employee
Posts: 215

Re: Copying files from a Hadoop cluster to SAS

Hi,

Here is an example of using the FILENAME statement.

/* FILENAME to Hadoop Example */
/*    Show the mechanics of writing to, and reading from, HDFS */

FILENAME hdp1 hadoop 'test.txt'
              cfg="C:\Hadoop_cfg\hadoop.xml"
              user='bob';

/* Write the file to HDFS */
data _null_;
   file hdp1;
   put ' Test Test Test';
run;

/* Read the file from HDFS */
data test;
   infile hdp1;
   input textline $15.;
run;

Here is an example of the Hadoop procedure.

filename cfg 'C:\Hadoop_cfg\hadoop.xml';

/* setup the environment  */
/*    Create /user/bob/Books directory */
/*    Copy war_and_peace.txt to HDFS. */
/*    Copy moby_dick.txt     to HDFS. */

proc hadoop options=cfg username="bob"  verbose;
   HDFS MKDIR='/user/bob/Books';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\moby_dick.txt"
                  OUT='/user/bob/Books/moby_dick.txt';
   HDFS COPYFROMLOCAL="C:\Hadoop_data\war_and_peace.txt"
                  OUT='/user/bob/Books/war_and_peace.txt';
run;

/* Run the Word Count sample program on Moby Dick */
/* hadoop-examples-1.2.0.1.3.0.0-96.jar */

proc hadoop options=cfg user="bob"  verbose;
  mapreduce input='/user/bob/Books/moby_dick.txt'
     output='/user/bob/outBook'
     jar='C:\Hadoop_examples\hadoop-examples-1.2.0.1.3.0.0-96.jar'
     outputkey="org.apache.hadoop.io.Text"
     outputvalue="org.apache.hadoop.io.IntWritable"
     reduce="org.apache.hadoop.examples.WordCount$IntSumReducer"
     combine="org.apache.hadoop.examples.WordCount$IntSumReducer"
     map="org.apache.hadoop.examples.WordCount$TokenizerMapper";
run;

/* Copy the output from the MapReduce job to the laptop */
/* Clean up the directories and files                   */

proc hadoop options=cfg username="bob" password="Bogus" verbose;
   HDFS COPYTOLOCAL="/user/bob/outBook/part-r-00000"
                OUT="C:\Hadoop_data\output\moby_dick_wordcount.txt" OVERWRITE;
   HDFS delete='/user/bob/.staging';
   HDFS delete='/user/bob/Books';
   HDFS delete='/user/bob/outBook';
run;

Here are examples of using SAS/ACCESS to Hadoop.

libname myhdp hadoop server=hdp13 SUBPROTOCOL=hive2 user=myuser ;

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Display SQL being sent to the database */

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* CTAS */
proc sql;
   connect to hadoop (server=hdp13 user=myuser subprotocol=hive2);

    execute (create table myuser_store_cnt
                 row format delimited fields terminated by '\001'
                 stored as textfile

                as
                   select customer_rk, count(*) as total_orders
                     from order_fact
                    group by customer_rk) by hadoop;

   disconnect from hadoop;
quit;

/* Create a SAS data set from Hadoop data */

proc sql;
   create table work.join_test as (
      select c.customer_rk, o.store_id
        from myhdp.customer_dim c
           , myhdp.order_fact o
       where c.customer_rk = o.customer_rk);
quit;


/* PROC FREQ example */
data myhdp.myuser_class;
   set sashelp.class;
run;

proc freq data=myhdp.sasxjb_class2;
   tables sex * age;
   where age > 9;
   title 'Catchy Title Goes Here';
run;

/* Clean up */

proc sql;
   connect to hadoop (server=duped user=myuser subprotocol=hive2);

   execute (drop table order_fact) by hadoop;
   execute (drop table customer_dim) by hadoop;
   execute (drop table myuser_store_cnt) by hadoop;
   execute (drop table myuser_class) by hadoop;

   drop table work.join_test;

   disconnect from hadoop;

quit;

Contributor
Posts: 23

Re: Copying files from a Hadoop cluster to SAS

Thanks JBailey for your advice. I am changing what I want I want to do now.

The LIBNAME is perfect for what I am trying to do now.

I have a SAS programme where the datasets were stored on my local machine but in the future it will be stored on the Hadoop cluster.

Is it at simple as changing the library from referencing the local library to the Hadoop cluster.

Is there anything else to consider?

Thanks a lot.

SAS Employee
Posts: 215

Re: Copying files from a Hadoop cluster to SAS

You are quite welcome. If you want to copy data from a SAS data set to a Hadoop cluster you would do something similar to this.

/* assuming that you are using HiveServer2 on your Hadoop cluster */

libname myhdp hadoop server=myhdp  subprotocol=hive2 user=myuser password=mypasswd ;

libname  mysas 'C:\SASDataLivesHere';

proc append base=myhdp.mydata

                     data=mysas.mydata;

run;

Contributor
Posts: 23

Re: Copying files from a Hadoop cluster to SAS

Thanks again JBailey.

Community Manager
Posts: 564

Re: Copying files from a Hadoop cluster to SAS

Glad you're getting some good advice, msd83. Feel free to mark answers as "Helpful" or "Correct" as you see fit. This is useful for other community members who may have the same problem.

Best,

Anna

SAS Employee
Posts: 215

Re: Copying files from a Hadoop cluster to SAS

My pleasure. I am happy that I could help.

Contributor
Posts: 23

Re: Copying files from a Hadoop cluster to SAS

OK sure Anna.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 4920 views
  • 7 likes
  • 4 in conversation