About Patrick

Patrick · ‎02-18-2025

@Ronein Given you've accepted @Ksharp 's approach, I'd be really interested in how much this improved performance. Can you please share the Proc Sort log portion with fullstimer?
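In case it helps, here is a minimal sketch of how to get such a log. The table name and the BY variables are placeholders (assumptions on my part), so replace them with what your actual Proc Sort uses.

```sas
/* Switch on detailed resource statistics (real time, CPU time, memory, page swaps) */
options fullstimer;

/* Placeholder table and BY variables: substitute your own */
proc sort data=work.ACCOUNT_MONTHLY_DATA_RF_CS_12 out=work.sorted;
  by UPDATE_DATE FK_APPLICATION FK_MONTHLY_DATA_ACCOUNT REFERENCE_DATE;
run;

/* Switch back to the shorter default statistics */
options nofullstimer;
```

The NOTE lines that follow the Proc Sort step in the log are the portion to share.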

Patrick · ‎02-18-2025

@Ronein You could try the code below to avoid a) sorting and b) writing the whole sorted table to disk. I asked how you created this big source table in WORK because, if that's done with an earlier data step, you could build the hash lookup table in that data step instead of in the data _null_ step below. That would avoid one full pass through the data.

/* create sample data */
data ACCOUNT_MONTHLY_DATA_RF_CS_12;
  do i = 1 to 30;
    UPDATE_DATE = '01JAN2025'd + i;
    FK_APPLICATION = round(1000 + ranuni(0) * 100);
    FK_MONTHLY_DATA_ACCOUNT = round(2000 + ranuni(0) * 100);
    REFERENCE_DATE = '01JAN2025'd + intnx('month', 0, i);
    output;
    output;
  end;
  drop i;
run;

/* create table with row_number of source table that contains the max reference_date */
data _null_;
  if _n_=1 then
    do;
      length row_num max_REFERENCE_DATE 8;
      dcl hash h1();
      h1.defineKey('UPDATE_DATE', 'FK_APPLICATION', 'FK_MONTHLY_DATA_ACCOUNT');
      h1.defineData('row_num','max_REFERENCE_DATE');
      h1.defineDone();
      call missing(row_num, max_REFERENCE_DATE);
    end;
  set ACCOUNT_MONTHLY_DATA_RF_CS_12(keep=UPDATE_DATE FK_APPLICATION FK_MONTHLY_DATA_ACCOUNT REFERENCE_DATE) end=last;
  /* key already in hash: only replace the entry if this row has a later reference_date */
  if h1.find() = 0 then
    do;
      if REFERENCE_DATE>max_REFERENCE_DATE then
        do;
          rc=h1.replace(key:UPDATE_DATE, key:FK_APPLICATION, key:FK_MONTHLY_DATA_ACCOUNT, data:_n_, data:REFERENCE_DATE);
        end;
    end;
  /* new key: store the current row number and reference_date */
  else
    do;
      rc=h1.add(key:UPDATE_DATE, key:FK_APPLICATION, key:FK_MONTHLY_DATA_ACCOUNT, data:_n_, data:REFERENCE_DATE);
    end;
  if last then h1.output(dataset:"work.row_num_with_max_REFERENCE_DATE(keep=row_num rename=(row_num=_n_))");
run;

/* select the rows from the source table whose row number is in the lookup table */
data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'work.row_num_with_max_REFERENCE_DATE');
      h1.defineKey('_n_');
      h1.defineDone();
    end;
  set ACCOUNT_MONTHLY_DATA_RF_CS_12;
  if h1.check()=0 then output;
run;

Patrick · ‎02-18-2025

And how do you create this huge source table in WORK? Is this another data step or something else?

Patrick · ‎02-18-2025

If the relationship of table PUBLIC.X_TEST to the lookup tables is many:1 (or zero) then using a data step with hash lookups could perform better.

CAS Table Lookup (Left Outer Join) on SAS Viya

As long as the base table and the target table are in CAS, the source for the hash tables doesn't need to be a CAS table for the process to execute in CAS.

Below untested code - but should be close.

%let outcas=casuser;
data PUBLIC.X_TEST1(drop=_rc);
  if _N_ = 1 then
    do;
      /* create variables */
      if 0 then
        do;
          set
            PUBLIC.X_TEST
            &outcas..SUMMARY_A1(keep=column mean Std rename=(column=_name_ mean=mean_recipe Std=std_recipe))
            &outcas..SUMMARY_A2(keep=column Mean Std rename=(column=_name_ Mean=mean_batch_id Std=std_batch_id))
            &outcas..NDIST_A2 (keep=_column_ _NDis_ rename=(_column_=_name_ _NDis_=_NDis_batch_id))
            &outcas..NDIST_A1 (keep= _column_ _NDis_ rename=(_column_=_name_ _NDis_=_NDis_recipe))
          ;
        end;

      /* define hash tables */
      declare hash h_summary_a1(dataset: "&outcas..SUMMARY_A1(rename=(column=_name_ mean=mean_recipe Std=std_recipe))");
      h_summary_a1.defineKey('RECIPE', '_name_');
      h_summary_a1.defineData('mean_recipe', 'std_recipe');
      h_summary_a1.defineDone();

      declare hash h_summary_a2(dataset: "&outcas..SUMMARY_A2(rename=(column=_name_ Mean=mean_batch_id Std=std_batch_id))");
      h_summary_a2.defineKey('batch_id', '_name_');
      h_summary_a2.defineData('mean_batch_id', 'std_batch_id');
      h_summary_a2.defineDone();

      declare hash h_ndist_a2(dataset: "&outcas..NDIST_A2(rename=(_column_=_name_ _NDis_=_NDis_batch_id))");
      h_ndist_a2.defineKey('batch_id', '_name_');
      h_ndist_a2.defineData('_NDis_batch_id');
      h_ndist_a2.defineDone();

      declare hash h_ndist_a1(dataset: "&outcas..NDIST_A1(rename=(_column_=_name_ _NDis_=_NDis_recipe))");
      h_ndist_a1.defineKey('recipe', '_name_');
      h_ndist_a1.defineData('_NDis_recipe');
      h_ndist_a1.defineDone();
    end;
  call missing(of _all_);

  set PUBLIC.X_TEST;

  /* lookup */
  _rc = h_summary_a1.find();
  _rc = h_summary_a2.find();
  _rc = h_ndist_a2.find();
  _rc = h_ndist_a1.find();
run;

I don't know exactly how this works, but I have been told that as long as you only load the hash table without any further write operations (hash.add() etc.) there will only be a single instance of the hash table and not a copy per node.

Patrick · ‎02-17-2025

@Mahis There are various articles in the SAS Community library that demonstrate how you can share a job with user selections (prompts) for retrieving the data and creating a report on-the-fly in VA. Below are the links to two of these articles:

Sharing Jobs Using SAS Visual Analytics
Using Jobs to Load Data in SAS Visual Analytics

The first link covers static reports that won't need any data in CAS. The second link is for interactive reports where the job first loads data into CAS (I believe for your use case that would need to be session scope).

I'm still concerned about response times not meeting user expectations and would strongly suggest that you execute one of your queries against Hive out of SAS to determine how long this actually takes.

Is it really not possible to create pre-aggregated tables (possibly one per report) that are much smaller in size but will support your reports? It could be a specialized table per report (or a few similar reports) pre-loaded into CAS via a daily batch job. Whatever you do, ensure that you run Hive queries that minimize the data volumes before transferring to SAS.

AFAIK CAS as such can deal with a 1TB table IF you've got an environment that's sized for it (which is unlikely). But with CAS you can certainly load big tables. There isn't only RAM but also virtual memory. Of course, once SAS has to swap data between RAM and virtual memory, performance will decrease. You can also compress data in memory (again with a performance penalty), and there is also Duplicate Value Reduction (DVR) which, depending on your data, can reduce volumes significantly.

If it was me then I'd spend some time to analyse and test:
1. Which data elements from source do I need for my reports?
2. Which level of aggregation is possible to support my reports?
3. How much can I reduce storage requirements by using compression and/or DVR?

I'd likely run some tests for the above with a sub-set of the data to get some metrics that help me make a decision.

The ideal outcome would be to reduce the data volumes to a level where a few pre-aggregated tables can support all reports - but of course if the reports require pre-aggregation by different categorical variables then more report-specific CAS tables might be required. I'd only consider using queries against external data sources during user interaction as a last resort because I'm pretty sure that response times won't be satisfying (but yeah, test with some Hive queries to confirm).

...and as an afterthought: Is this 1TB the size of the actual Hive table or the size when you load the data to the SAS side? Hive has a data type STRING without a length, which SAS compute (sas7bdat) maps to CHAR with a length of 32KB (depending on the dbmax_text libname option). When loading into CAS the data type normally becomes VARCHAR(*), which as such is good for STRING, but CAS VARCHAR consumes at least 16 bytes. That makes it a poor fit for any character variable that doesn't need that many bytes to store the actual values or where the values are fixed length (use CHAR for such cases).

What this means: When loading high-volume data into SAS compute (.sas7bdat) or directly into CAS (.sashdat), explicitly define the mapping of the Hive to the SAS/CAS variable types, especially if there are STRING data types in Hive.

Patrick · ‎02-17-2025

I didn't really understand where you're taking naming components AAA and BBB from so I've used variable code instead. Does below return what you're after?

data have;
  input office $ dept $ year seq $ code $ me_amt me_pct;
  datalines;
LA IT 2024 1 101 10000 0
LA IT 2024 1 102 66 12
MI FIN 2024 1 101 333 1
MI FIN 2024 1 102 98.7 12.3
;
run;

/* build the lists of target variable names, one pair per distinct code */
proc sql noprint;
  select distinct
    cats('me_amt_',code),
    cats('me_pct_',code)
      into
        :me_amt separated by ' ',
        :me_pct separated by ' '
  from have
  ;
quit;

proc sort data=have out=want;
  by office dept year seq;
run;

data want(drop=_: me_amt me_pct);
  set want;
  by office dept year seq;
  array a_me_amt{*} &me_amt;
  array a_me_pct{*} &me_pct;
  retain a_me_amt a_me_pct;
  /* copy the row values into the array elements whose name suffix matches code */
  do _i=1 to dim(a_me_amt);
    if scan(vname(a_me_amt[_i]),-1,'_')=code then
      do;
        a_me_amt[_i]=me_amt;
        a_me_pct[_i]=me_pct;
      end;
  end;
  if last.office then
    do;
      output;
      call missing(of _all_);
    end;
run;

proc print data=want;
run;

Patrick · ‎02-17-2025

> why do I get those brackets () for the pctincrease column in the output (see below)? I did not write any brackets() in my code.

You are using the SAS format PERCENTw.d to print the numbers. This format prints negative numbers in brackets.

> I cannot write the complete value("Borrowed for health or medical purposes (% age 15+)") of the indicatorname column

The percent sign % is also a SAS macro token. If you use it as a wildcard in a SQL LIKE expression, always use single quotes so SAS doesn't treat it as a macro token (if you haven't gotten to the SAS Macro language yet, just take "use single quotes" as a rule; you'll understand the why later).

The % in a SQL LIKE expression is a wildcard for zero or more characters. If you want to search for the actual character % then you should escape it as done in the sample code below.

The search for the full string as such works for me, which makes me think that the search string you're using doesn't match the data (this includes whitespace: what looks like a blank might actually be stored as a tab).

data work.have;
  length indicatorname $60;
  indicatorname='Borrowed for health or medical purposes (% age 15+)';
  output;
run;

proc sql;
  /* LIKE with the % escaped so it matches a literal percent sign */
  select *
  from work.have
  where indicatorname like 'Borrowed for health or medical purposes (^% age 15+)' escape '^'
  ;
  /* exact match */
  select *
  from work.have
  where indicatorname = 'Borrowed for health or medical purposes (% age 15+)'
  ;
quit;
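On the brackets: if you would rather see a minus sign, the PERCENTNw.d format is the usual alternative to PERCENTw.d. A minimal sketch for comparing the two, where the value -0.125 is made up purely for illustration:

```sas
data _null_;
  /* made-up sample value: a 12.5% decrease */
  pctincrease = -0.125;
  /* PERCENTw.d encloses negative values in parentheses */
  put 'PERCENT8.1  : ' pctincrease percent8.1;
  /* PERCENTNw.d writes negative values with a leading minus sign */
  put 'PERCENTN8.1 : ' pctincrease percentn8.1;
run;
```

Both lines are written to the SAS log, so you can compare the two renderings side by side before changing the format in your report code.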

Patrick · ‎02-17-2025

Below code shows you what's installed and licensed. Does this show SAS/Access to JDBC?

/* what is installed */
proc product_status;
run;

/* what is licensed */
proc setinit;
run;

Patrick · ‎02-17-2025

@Ronein Thank you for sharing the log with fullstimer and some characteristics about your table. That helps!

From the looks of it the Proc Sort as such runs efficiently. There is enough memory (zero page swaps) and the CPU times also look reasonable. The real time is much higher than the CPU times, which means time is spent waiting for resources (if your environment is busy) and on input/output operations. Given the number of rows of your table and especially these 2KB variables, I'd assume it's I/O operations that consume most of the time.

I don't believe you can achieve a much better real time unless you reduce your data volume, get faster disks, and/or possibly use the SPDE engine and potentially don't sort at all but create indexes instead.

Questions
1. Why do you end up with such a massive table in WORK?
2. Why do you need the table sorted? What do you intend to do with it?

Sorting data in SAS: can you skip it?

It's often beneficial to save really big tables using the SPDE engine. ...but it depends on how you get to this big table and the intended use.

Patrick · ‎02-16-2025

How large is "very large" in GB? Report consumers normally expect "immediate" results when using interactive reports. The query and data transfer will require time. Hive queries have latency and are better suited for batch processing. Ideally use Impala for interactive processes.

Would it be an option to pre-load aggregated data for all years into CAS and create reports with links for drill-through if required? The drill-through could then query the Hive table for the leaf where a user wants to see the detail.

Patrick · ‎02-14-2025

@citizben

> "message_json" is a JSON type variable containing longer than 32767 characters

Redshift doesn't document a data type JSON. Is it a VARCHAR?
https://docs.aws.amazon.com/redshift/latest/dg/c_Supported_data_types.html

SAS 9.4 type tables are limited to character variables of max 32767 bytes. SAS Viya CAS tables allow for longer character variables.

Which SAS version are you using? Please let us know the result when running %put &=sysvlong; Same as below from my environment.

69 %put &=sysvlong;
SYSVLONG=9.04.01M7P080620

With SAS 9.4 you will likely need to split your variable into chunks of 32767 characters (= creating multiple variables) that you then put together on the SAS side when writing to a text file.

Should you have Viya CAS then you should be able to load the long variable from Redshift directly into a CAS Varchar and then write to a text file directly out of CAS.

Also, SAS 9.4 M5 and later got a Varchar data type that can store more than 32767 characters and that's available both within DS2 and the SAS data step, BUT you can't write it to a SAS table. More importantly, I believe the SAS/Access engine won't allow you to retrieve more than a 32767 character string from a database, so you'll never get the full string into SAS even though you could directly write it out to a text file without the need to store it in a SAS table.
https://support.sas.com/resources/papers/proceedings18/2690-2018.pdf

Patrick · ‎02-12-2025

@mvalsamis Under the assumption that the first day of the first week always starts on July 1st, irrespective of the day of the week, the following should work.

data dim_date;
  format date date9.;
  /* do date='01jan2025'd to '31dec2025'd; */
  do date='27jun2025'd to '08jul2025'd;
    day_of_week_number=weekday(date);
    week_of_year_number=week(date);
    /* financial week with count starting 01July<year> irrespective of day of week */
    fin_week=int((date-intnx('year.7', date, 0, 'b'))/7)+1;
    output;
  end;
run;

proc print data=dim_date;
run;
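To make the fin_week calculation easier to verify, here is a small sketch that applies the same formula to a few hand-picked dates around the 1 July boundary. The intermediate variable fin_year_start is only introduced here for readability.

```sas
data _null_;
  format date date9.;
  do date = '30jun2025'd, '01jul2025'd, '07jul2025'd, '08jul2025'd;
    /* 1 July on or before date: start of the year interval shifted to month 7 */
    fin_year_start = intnx('year.7', date, 0, 'b');
    /* days since that 1 July, in blocks of 7, counted from 1 */
    fin_week = int((date - fin_year_start) / 7) + 1;
    put date= fin_year_start= date9. fin_week=;
  end;
run;
/* Log shows fin_week=53 for 30JUN2025 (364 days after 01JUL2024), */
/* fin_week=1 for 01JUL2025 and 07JUL2025, and fin_week=2 for 08JUL2025. */
```

Note that the last day of a non-leap financial year lands in a one-day week 53, which follows from the "always start on 1 July" assumption.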

Patrick · ‎02-12-2025

The IN operator only allows for a query expression or constants but not for functions. You can pass SAS date constants in the form 'ddmonyyyy'd

proc sql;
  create table myExample as
    select
      datepart(myDate) as fmtDate format mmddyy8.,
      sum(myStuff) as sum_stuff
    from myTable
    where datepart(myDate) in ('15nov2024'd,'01dec2024'd,'01jan2025'd)
    group by myDate;
quit;

> myDate is a SQL Server DateTime field

Does that mean your source table myTable is a SQL Server table? If so then ensure that SAS can push the where clause to the database for processing so it doesn't first pull all the data from SQL Server before sub-setting. The datepart() function could cause a problem here.

Passing SAS Functions to Microsoft SQL Server

Happy to provide additional advice if above is your situation. I would need to know if only the source or also the target table are in SQL Server and the exact where clause you want to use.

As a comment to the syntax below provided by @Tom: This will only work for processing on the SAS side because the result of the %sysfunc() portion is a number that expresses a SAS date. SAS won't be able to convert this number to the matching SQL date for in-database processing. If you use SAS date literals like '01jan2024'd then SAS will be able to convert the date to the SQL Server equivalent for in-database processing.

where datepart(myDate) in (%sysfunc(mdy(11,15,2024)) %sysfunc(mdy(12,1,2024)) %sysfunc(mdy(1,1,2025)))
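One way to check what actually gets sent to the database is the SASTRACE system option. Below is a rough sketch, assuming the table is reached through a SAS/ACCESS libname; the libref sqlsrv, the DSN my_dsn and the schema dbo are hypothetical placeholders for whatever connection you already use.

```sas
/* Write the SQL that SAS generates for the DBMS to the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Hypothetical connection: replace with your existing libname statement */
libname sqlsrv odbc dsn=my_dsn schema=dbo;

proc sql;
  select count(*) as n_rows
  from sqlsrv.myTable
  where datepart(myDate) in ('15nov2024'd, '01dec2024'd, '01jan2025'd);
quit;

/* Switch tracing off again */
options sastrace=off;
```

If the statement shown in the log contains a WHERE clause, the filter is processed in SQL Server. If it only selects the columns without one, SAS is pulling the rows across and sub-setting them itself.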

Patrick · ‎02-12-2025

@SAS242424 Thanks for sharing! Here are my five cents:

I guess what's "right" will very much depend on your data, your environment, the requirements and the usage of your data.

I assume that with "class action data" you actually want to query your data in different ways - like once per claim type, the next time per case status and then per ... If that's true then I guess no single sort order will suffice to avoid "full table scans".

@ChrisHemedinger was apparently too humble to cite himself but it might be worth your while to read Sorting data in SAS: can you skip it?

With your data and SAS on a laptop it might be worth considering storing the data on your C: drive under a library with the SPDE engine and with indexes created that match your most common where clauses or by groups.
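A minimal sketch of what that could look like. The folder, the table work.claims and the variables claim_type and case_status are all made-up names for illustration, and the folder needs to exist before the libname is assigned.

```sas
/* Hypothetical folder on the local drive */
libname spdelib spde 'C:\sasdata\spde';

/* Copy the table into the SPDE library and create two simple indexes while loading */
data spdelib.claims(index=(claim_type case_status));
  set work.claims;
run;

/* MSGLEVEL=I writes a note to the log when an index is used for a where clause */
options msglevel=i;

proc sql;
  select count(*) as n_claims
  from spdelib.claims
  where claim_type = 'A';
quit;
```

Which indexes are worth creating depends on the where clauses and by groups you use most often, so treat the two above as examples only.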

Patrick · ‎02-12-2025

@Ronein Like below:

/* %let round_to=500; */
%let round_to=1000;
data test;
  do val=1,249.9,250,250.1,499.1,500,500.1,749.9,750,750.1,17524.5,25475.37,6475.3;
    round_val=round(val,&round_to);
    output;
  end;
run;

proc print data=test;
run;

Above updated after @Kurt_Bremser 's remark


Re: Help Shape the Future of VA: Quick 5-Minute Survey

Re: The ADDRLONG function is not available beginning with SAS 9.4M9

Re: oauth_bearer does not seem to support encoded tokens

Re: How do I upgrade IP's from basic Sku to Standar Sku?

Re: struggle with join toward a table with 3 billions rows

Re: I want to transfer DI Studio code to SAS Viya and remove the DI wr...

Re: Macro variable not resolved - ONLY in scheduled jobs. Works fine w...

Re: struggle with join toward a table with 3 billions rows

Re: check when tera view table was lastly updated

Re: sas to tera

The ADDRLONG function is not available beginning with SAS 9.4M9

Re: The ADDRLONG function is not available beginning with SAS 9.4M9

Re: The ADDRLONG function is not available beginning with SAS 9.4M9

SAS Community OF edition (only forums)

Re: The ADDRLONG function is not available beginning with SAS 9.4M9

Re: The ADDRLONG function is not available beginning with SAS 9.4M9

Re: Macro variable not resolved - ONLY in scheduled jobs. Works fine w...

Re: struggle with join toward a table with 3 billions rows

Re: Seeking Effective Methods for SAS Code Optmization and Refactoring...

Re: how can I obtain an identical matchcode between 2 similar names ?

How do I add a row number to a table in SAS code?

Re: You like me, you really like me!

Re: Ling run time for proc sort

Re: Ling run time for proc sort

Re: Ling run time for proc sort

Re: slow speed using fedsql

Re: How to Query a Table in SAS VA (Viya 4) Without Loading to CAS?

Re: How can i transpose 2 variable in one go?

Re: help with question on page87-88, practice, SQL1 essential

Re: SAS/ACCESS to JDBC on AIX Installation

Re: Ling run time for proc sort

Re: How to Query a Table in SAS VA (Viya 4) Without Loading to CAS?

Re: SAS proc sql pass-through AWS json field is cut short

Re: Calculate number of week in accounting year based on number of wee...

Re: PROC SQL Where Clause Function within IN

Re: Sorting Very Large Files with SAS

Re: Round by 500

CoDe SAS German