About moorsd

moorsd · ‎04-05-2017

Hi, I was wondering if anyone has come across the need to scan multiple workspace & batch server logs to extract dataset usage stats for a particular libname, i.e. over a monthly period, which datasets were accessed and how many observations were read from a certain SAS library. Has anyone solved this? Did you do this in SAS or did you write some sort of batch script to accomplish this task? Any help on how to solve this quandary would be most welcome. Code examples would be even better... regards David FYI: I'm currently using SAS 9.4 M2 on AIX.
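For the batch-script route, a rough sketch in Python might look like the following. It is only an illustration, not a tested solution for this site: it assumes the logs contain the usual "NOTE: There were N observations read from the data set LIB.MEMBER." lines, and the function name dataset_usage and the sample log text are made up for the example.

```python
import re
from collections import defaultdict

# Assumed shape of the SAS log note. \s+ before the data set name allows
# for the note wrapping onto the next log line.
NOTE_RE = re.compile(
    r"NOTE: There were (\d+) observations read from the data set\s+(\w+)\.(\w+)",
    re.IGNORECASE,
)

def dataset_usage(log_text, libname):
    """Return {dataset: [times_read, total_obs_read]} for one libref."""
    usage = defaultdict(lambda: [0, 0])
    for obs, lib, member in NOTE_RE.findall(log_text):
        if lib.upper() == libname.upper():
            entry = usage[member.upper()]
            entry[0] += 1          # one more read of this data set
            entry[1] += int(obs)   # running total of observations read
    return dict(usage)

# Made-up log excerpt, for illustration only.
sample_log = """\
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: There were 5 observations read from the data set WORK.TEMP.
"""

print(dataset_usage(sample_log, "sashelp"))
```

To cover a month of logs, the same function could be run over the text of each log file and the results summed. How the files are selected depends on how the site names and rotates its logs, which is not assumed here.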

moorsd · ‎09-05-2016

Thanks for the help, and I have to apologise for not getting back to you sooner. I've not been in work for a while so didn't get around to accepting your solution.

moorsd · ‎08-01-2016

Would anyone know the best informat to turn the following character value, 25 July 2016 11:43:20, into a numeric datetime? Thanks, David

moorsd · ‎04-26-2016

I'm very pleased with my SAS swag. I've already put the bag and mug to good use 🙂

moorsd · ‎03-31-2016

Thanks for the update. Much appreciated. regards David

moorsd · ‎03-31-2016

I'm not sure this is posted in the right location, but here goes. Yesterday I was issued my Acclaim Badges for SAS Certified Platform Administrator and Advanced Programmer. However, I have a few outstanding. Does anyone know if these are being issued in batches? It seems odd to be issued a couple but not all. Is anyone else waiting on the Acclaim badges for any SAS Certification they have completed? Have you received some but not all? Just wondering whether the general SAS community knows anything about this. Thanks David

moorsd · ‎03-18-2016

Hi @LinusH Thanks for the information. The 57m file is part of one monthly snapshot table. We may have 1800+ tables (of differing sizes) that we'll store in Hadoop. Currently the tables are stored in Dynamic Cluster tables in SPDS. Therefore, we plan to offload, say, 10 years of historic data to Hadoop and keep the most recent 'hot' data in SPDS. The data in Hadoop will get queried and merged with the data in SPDS, so we need it to be as performant as possible. Hence my many recent questions re partitioning, indexing & SPDE. Once again, thanks for your quick responses and suggestions, they're much appreciated. Cheers David

moorsd · ‎03-18-2016

You could try using %SYSFUNC with inputn/putn. See below:

    %let FGR = 3903918260;
    %let NEW_FGR = %sysfunc(putn(&FGR,comma16.));
    %put NEW_FGR is: *** &NEW_FGR ***;

This writes the following to the log:

    NEW_FGR is: *** 3,903,918,260 ***

moorsd · ‎03-18-2016

Does anyone know if SAS/ACCESS to Hadoop in 9.4 M3 supports creating partition files in Hive in a non-text format, i.e. ORCFile? I've tried to specify creating a table as an ORCFile using the LIBNAME option in SAS 9.4 M2:

    DBCREATE_TABLE_OPTS="stored as ORCFile"

However, when I add the data set option to create a partition file:

    DBCREATE_TABLE_OPTS="PARTITIONED BY (x_facility_offer_cd VARCHAR(4))"

the resulting HiveQL shows the table being created as a TEXTFILE:

    PARTITIONED BY (x_calling_system_cd VARCHAR(4)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE

Also, it seems that indexes on Hive tables can't be created using SAS/ACCESS to Hadoop in SAS 9.4 M2. Is this fixed in M3? The indexing is an issue, as we can't partition ORCFile tables (which are pre-optimised with an internal index), so we're left with un-optimised TEXTFILE formats. Any help / info would be appreciated. Thanks David

moorsd · ‎03-18-2016

@LinusH Thanks for the information. Sadly, as the customer has just spent a lot of time and money migrating from 9.2 to 9.4 M2 (which was the current distribution when the migration project started), the appetite to upgrade to M3 isn't there at the moment. I was just wondering if we could get the SPDE Hive SerDe working with 9.4 M2 as a stop gap. Yes, it would be a hack, but Hive can use other SerDes, so why not the SAS 9.4 M3 one? A SAS upgrade is being considered by the client for 2017. However, with a new version of SAS due soon (50 years of SAS, so time for a big announcement?), it might not be worth the bother of going to M3; it may be better to wait and see what a new version of SAS brings to the table, Hadoop wise. Spark integration (i.e. SAS/ACCESS to Spark) would be nice. DataLoader 2.4 has started the ball rolling with Spark, so maybe something like that may happen?

moorsd · ‎03-18-2016

Hi @LinusH The data we're using comprises some narrow dimensional tables (with 18 vars), some transactional data (with 30 vars) and some snapshot tables (either 2500 or 5000 vars). All of these tables have approx 57m observations. The better compatibility for SPDE comes in the form of indexes, formats and variable lengths being copied across from SPDS on SAS to SPDE on HDFS by default. With SPDE on HDFS you can also do BY-GROUP processing, and the floating point digits are exactly replicated between SPDS and SPDE. Hive has a higher floating point representation for numeric data, as it supports different data types, so this causes a difference. Also, when retrieving data from SPDE you don't have to set length statements. Data is pushed and pulled across in the same format. With Hive, if you pull character data back, variables in SAS can be set to a length of 32767. And if you have a lot of them, like we do for the snapshot tables that have 5000+ variables, this suddenly fills your temporary disk space in SAS. All of which would make transitioning from SPDS on Unix to Hadoop easier from a developer's and analyst's point of view. The trade-off for us will be at what point you accept slower performance via SPDE vs developer/analyst re-development time. Does this help? regards David

moorsd · ‎03-16-2016

Has anyone done much benchmarking of SPDE on HDFS vs Hive tables? I've done some preliminary investigation and I'm finding that SPDE is approx 2-3x slower than using Hive tables. Initial tests include querying data and writing it back to the SAS workspace server, writing data to HDFS, and joining different tables (of different sizes) between Hadoop and SAS. For Hive I'm using the ORCFile format with Cost Based Optimisation turned on and the execution engine is Tez, so performance is good. I'm also guessing that the SPDE engine for HDFS will be using MapReduce rather than Tez? But I'm unsure how to confirm this when running a query via SAS. However, should querying performance for data via SPDE on HDFS be significantly slower than Hive? We're using SAS 9.4 M2 so we don't have the parallel write capabilities from M3, so I'd expect that might slow things down a little when writing to HDFS, but I was hoping that SPDE on HDFS would be a little more... speedy!? Are there any easy performance improvements for SPDE other than the likes of parallelread=yes, parallelwrite=yes and accelwhere=yes? Has anyone experimented with IOBLOCKSIZE on HDFS? On the plus side, compatibility with SAS data in either the SPD Server or BASE engine is better on SPDE for HDFS than on Hive. I just wonder if that's the trade-off: slower performance on SPDE but better compatibility? Has anyone else experienced something similar? Cheers, David

moorsd · ‎03-16-2016

Thanks to everyone who commented on this post. In the end I managed to install the new Vivaldi browser (based on Chrome), as this is the only one I could sneak through the company firewall. Now browsing and posting to SAS Communities is a joy. regards David

moorsd · ‎03-16-2016

Hi Steve, As the SPDE SerDe is just two Jar files, can these be deployed manually to all the nodes in the Hadoop cluster so that this works with SAS 9.4 M2? Then in a Hive query you would just add something like: exec (ADD JAR /home/hadoop/SerDes/name_of_SAS_Serde.jar). Is it that you just need the SAS 9.4 M3 depot to get the SerDe from SAS? We already have a SAS 9.4 M3 depot. Or is there something in the SAS 9.4 M3 SAS/ACCESS to Hadoop engine that is doing something different under the hood? Could you tell me more about the registration utility that SAS supplies? The ability to see SPDE files in Hive is something that would be ideal for my UK client. However, it's taken nearly 12 months to migrate 10 SAS environments to SAS 9.4 M2, so the appetite to upgrade to SAS 9.4 M3 isn't there at the minute. Thanks David

moorsd · ‎03-04-2016

Hi Steve, Many thanks for the information. It was very informative. But from what I've read around this topic, you need to be running SAS 9.4 M3 to have access to the SAS Hive SerDe for SPDE. The client I'm working at has SAS 9.4 M2, and the Hive SerDe is not available for that release. So I guess we're stuck until they upgrade to SAS 9.4 M3 or some other version of SAS that may be released. Are you running a SAS & Hadoop workshop at this year's Global Forum? If so, I'll pop along and say hello. thanks again, David

Online Status	Offline
Date Last Visited	‎04-10-2019 06:18 PM

Re: What do Gummy Bears and Wacky Weed have in Common?

Re: What to do in Denver?

Re: What to do in Denver?

Re: The SAS Supervisor

Re: Memory setup for HIVE in SAS

Re: Receiving AVRO Messages through KAFKA in a Spark Streaming Scala A...

Re: Your SASGF18 badge is waiting

Re: What to do in Denver?

Re: Your SASGF18 badge is waiting

Re: 22-322: Syntax error

What do Gummy Bears and Wacky Weed have in Common?

ESP 4.3 Failover Using Kafka

Re: What to do in Denver?

Re: SAS Global Forum App 2018 edition?

The SAS Supervisor

Re: Your SASGF18 badge is waiting

Re: What to do in Denver?

Re: I need your input on an SGF workshop - An Insider's Guide to SAS/A...

Re: I need your input on an SGF workshop - An Insider's Guide to SAS/A...

Re: Your Chance to Earn Communities Swag at SAS Global Forum!

Scan Workspace & Batch Server logs for dataset and library information

Re: datetime informat - help

datetime informat - help

Re: Your Chance to Earn Communities Swag at SAS Global Forum!

Re: Acclaim SAS Certification Badges

Acclaim SAS Certification Badges

Re: SPDE on HDFS vs Hive: Performance.

Re: Getting error when reformatting character data type to numeric dat...

SAS/ACCESS to Hadoop: Hive Partition Tables - SAS 9.4 M3

Re: Is it possible to register SPDE data stored on HDFS in metadata?

Re: SPDE on HDFS vs Hive: Performance.

SPDE on HDFS vs Hive: Performance.

Re: Internet Explorer 11 and SAS Communties Website - Unstable?

Re: Is it possible to register SPDE data stored on HDFS in metadata?

Re: Is it possible to register SPDE data stored on HDFS in metadata?

SAS Global Forum 2017

SAS Global Forum 2018

SAS Global Forum 2016