Hi Mark,
In this article on lookups, I can propose a hybrid approach:
For big tables like Customers and Orders, sort + DATA step merge is the fastest technique and can stay below a 2 GB memory limit.
To avoid multiple DATA step merge steps, you can add two hash tables to look up Country and Product during the DATA step merge of Orders and Customers.
That way you get the best of both worlds: fast on-disk lookups via the DATA step merge for the bigger tables, and in-memory hash tables for the lookups against the smaller tables.
I also pay attention to the lengths of the two variables Product and Name: since the CATX function defaults to a length of 200 bytes, this could increase I/O.
So I initialize Product to $85 and Name to $50 with LENGTH statements, so as not to waste any disk space when writing them out.
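To see that default in action, here is a small illustrative DATA step (the variable names and values are mine): a variable assigned from CATX without a LENGTH statement gets length $200, while a declared one keeps its declared length. VLENGTH reports the compile-time length:

```sas
data _null_;
   x = catx(' by ', 'Widget', 'Acme');   /* no LENGTH statement: x defaults to $200 */
   length y $85;
   y = catx(' by ', 'Widget', 'Acme');   /* y keeps the declared $85 */
   len_x = vlength(x);                   /* 200 */
   len_y = vlength(y);                   /* 85  */
   put len_x= len_y=;
run;
```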
Total elapsed time for this on my computer is between 00:15 and 00:20 seconds, the same as FedSQL and sometimes even faster.
With the FULLSTIMER system option, you can also monitor memory usage alongside elapsed time and CPU time.
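FULLSTIMER is a system option, so a single statement turns it on for the rest of the session (the exact statistics printed in the log vary by host operating system):

```sas
options fullstimer;
/* Every subsequent step now writes memory and OS-level resource
   usage to the log, in addition to real time and CPU time */
```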
With PROC FEDSQL's _METHOD option, you can see that it uses a hash join in the back end. Also impressive: FedSQL reports using only a limited amount of RAM (around 130 MB).
My sort + merge uses up to 1520 MB of RAM for the sort (my configuration has MEMSIZE set to 8G and SORTSIZE set to 4G); the DATA step merge itself uses only 2 MB of RAM.
The hash-complete method uses only 103 MB of memory and completes in 00:43 seconds.
PROC SQL's _METHOD option shows that the lookups use four sorts in the back end; it can take up to 00:46 seconds while having up to 3150 MB of memory at its disposal to perform the sorts and the merge join.
In more constrained environments, where less memory is available or where disk I/O does not come from fast SSDs, the performance of this step can degrade to more than three minutes.
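For reference, _METHOD goes on the PROC FEDSQL statement itself, and the chosen plan (hash join, sort-merge, ...) is then written to the log. A minimal sketch with a hypothetical two-table join on CustomerID, not the full query from the article:

```sas
proc fedsql _method;
   create table work.fedsql_check as
   select o.*, c.FirstName
   from work.orders o
   inner join lookup.Customers c
      on o.CustomerID = c.CustomerID;
quit;
/* Look for the hash join node in the plan printed to the log */
```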
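PROC SQL accepts the same diagnostic keyword; a sketch (again a simplified two-table join, not the article's full query) of how the execution tree is requested:

```sas
proc sql _method;
   create table work.sql_check as
   select o.*, c.FirstName
   from work.orders o, lookup.Customers c
   where o.CustomerID = c.CustomerID;
quit;
/* The log shows the execution tree, e.g. an sqxjm (sort-merge join)
   node fed by sqxsort (sort) nodes */
```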
/******************************************************************************
DATA step merge - Hash Hybrid Final
******************************************************************************/
/******************************************************************************
Preliminaries: Clear out the WORK library
******************************************************************************/
proc datasets library=work kill nolist nodetails;
run; quit;
%let StartTime=%qsysfunc(time());
proc sort data=lookup.orders out=orders;
   by CustomerID;
run;
data dsmergehash_final;
   if _n_=1 then do;
      if 0 then do;
         /* Get the lookup table variables into the PDV */
         set lookup.countrycodes;
         set lookup.Products(keep=ProductID Item Manufacturer);
         /*set lookup.Customers(keep=CustomerID FirstName MI LastName);*/
      end;
      /* Initialize the CountryCodes hash */
      declare hash codes(dataset:'lookup.countrycodes');
      codes.definekey('Country');
      codes.definedata('Country_Name');
      codes.definedone();
      /* Initialize the Products hash */
      declare hash hProd(dataset:'lookup.Products');
      hProd.definekey('ProductID');
      hProd.definedata('Item','Manufacturer');
      hProd.definedone();
   end;
   merge orders (in=ordered)
         lookup.Customers (keep=CustomerID FirstName MI LastName);
   by CustomerID;
   /* Clear the lookup fields when a key is not found, so values
      from a previous row do not linger in the PDV */
   if codes.find() ne 0 then call missing(Country_Name);
   if hProd.find() ne 0 then call missing(Item, Manufacturer);
   length Product $85;
   Product=catx(' by ',Item,Manufacturer);
   length Name $50;
   Name=catx('. ',catx(' ',FirstName,MI),LastName);
   if ordered;
   drop Item Manufacturer FirstName MI LastName;
run;
%let EndTime=%qsysfunc(time());
%let ElapsedTime=%qsysfunc(putn(%sysevalf(%superq(EndTime)-%superq(StartTime)),e8601tm.));
%put NOTE: &=ElapsedTime;
/******************************************************************************
Data validation
******************************************************************************/
title "DSMERGEHash_FINAL Info";
proc sql;
select memname, nlobs format=comma12., nvar
from dictionary.tables
where libname='WORK'
and memname ='DSMERGEHASH_FINAL'
;
quit;
title;