Hi,

Here is an example of using the FILENAME statement with the HADOOP access method. It shows the mechanics of writing a file to HDFS and reading it back.

/* FILENAME to Hadoop example */
/* Show the mechanics of writing to, and reading from, HDFS */
filename hdp1 hadoop 'test.txt' cfg="C:\Hadoop_cfg\hadoop.xml" user='bob';

/* Write the file to HDFS */
data _null_;
   file hdp1;
   put ' Test Test Test';
run;

/* Read the file from HDFS */
data test;
   infile hdp1;
   input textline $15.;
run;

Here is an example of the HADOOP procedure. It creates an HDFS directory, copies two text files into it, runs the sample WordCount MapReduce program against one of them, copies the result back to the laptop, and then cleans up.

filename cfg 'C:\Hadoop_cfg\hadoop.xml';

/* Set up the environment:                    */
/* create the /user/bob/Books directory and   */
/* copy moby_dick.txt and war_and_peace.txt   */
/* to HDFS                                    */
proc hadoop options=cfg username="bob" verbose;
   hdfs mkdir='/user/bob/Books';
   hdfs copyfromlocal="C:\Hadoop_data\moby_dick.txt"
        out='/user/bob/Books/moby_dick.txt';
   hdfs copyfromlocal="C:\Hadoop_data\war_and_peace.txt"
        out='/user/bob/Books/war_and_peace.txt';
run;

/* Run the WordCount sample program                    */
/* (hadoop-examples-1.2.0.1.3.0.0-96.jar) on Moby Dick */
proc hadoop options=cfg username="bob" verbose;
   mapreduce input='/user/bob/Books/moby_dick.txt'
             output='/user/bob/outBook'
             jar='C:\Hadoop_examples\hadoop-examples-1.2.0.1.3.0.0-96.jar'
             outputkey="org.apache.hadoop.io.Text"
             outputvalue="org.apache.hadoop.io.IntWritable"
             reduce="org.apache.hadoop.examples.WordCount$IntSumReducer"
             combine="org.apache.hadoop.examples.WordCount$IntSumReducer"
             map="org.apache.hadoop.examples.WordCount$TokenizerMapper";
run;

/* Copy the output from the MapReduce job to the laptop, */
/* then clean up the directories and files               */
proc hadoop options=cfg username="bob" password="Bogus" verbose;
   hdfs copytolocal="/user/bob/outBook/part-r-00000"
        out="C:\Hadoop_data\output\moby_dick_wordcount.txt" overwrite;
   hdfs delete='/user/bob/.staging';
   hdfs delete='/user/bob/Books';
   hdfs delete='/user/bob/outBook';
run;

Here are examples of using SAS/ACCESS Interface to Hadoop.

libname myhdp hadoop server=hdp13 subprotocol=hive2 user=myuser;

/* Display the SQL being sent to the database */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* CTAS: pass explicit HiveQL through to Hadoop */
proc sql;
   connect to hadoop (server=hdp13 user=myuser subprotocol=hive2);
   execute (create table myuser_store_cnt
            row format delimited fields terminated by '\001'
            stored as textfile as
            select customer_rk, count(*) as total_orders
            from order_fact
            group by customer_rk) by hadoop;
   disconnect from hadoop;
quit;

/* Create a SAS data set by joining two Hadoop tables */
proc sql;
   create table work.join_test as
      (select c.customer_rk, o.store_id
       from myhdp.customer_dim c, myhdp.order_fact o
       where c.customer_rk = o.customer_rk);
quit;

/* PROC FREQ example: load a SAS data set into Hive, */
/* then run PROC FREQ against it                      */
data myhdp.myuser_class;
   set sashelp.class;
run;

proc freq data=myhdp.myuser_class;
   tables sex * age;
   where age > 9;
   title 'Catchy Title Goes Here';
run;

/* Clean up */
proc sql;
   connect to hadoop (server=hdp13 user=myuser subprotocol=hive2);
   execute (drop table order_fact) by hadoop;
   execute (drop table customer_dim) by hadoop;
   execute (drop table myuser_store_cnt) by hadoop;
   execute (drop table myuser_class) by hadoop;
   drop table work.join_test;
   disconnect from hadoop;
quit;
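As a side note, you can also pull the WordCount results straight into a SAS data set instead of (or in addition to) copying the part file to the laptop, by combining the two ideas above: point a FILENAME HADOOP fileref at the MapReduce output and read it with INFILE. This is only a minimal sketch, not tested against your cluster; it assumes the same hadoop.xml configuration file and user as the earlier examples, and that the sample WordCount output is tab-delimited (word, then count). The wcout fileref, the moby_counts data set name, and the $40. informat are just illustrative choices.

/* Sketch: read the WordCount output directly from HDFS */
filename wcout hadoop '/user/bob/outBook/part-r-00000'
   cfg="C:\Hadoop_cfg\hadoop.xml" user='bob';

data work.moby_counts;
   infile wcout dlm='09'x truncover;   /* assumed layout: word <TAB> count */
   input word :$40. count;
run;

/* Show the ten most frequent words */
proc sort data=work.moby_counts;
   by descending count;
run;

proc print data=work.moby_counts(obs=10);
run;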