I have a program that runs in about half an hour, and I'm trying to optimize each part of the code to reduce the execution time. There is a loop that, among other things, computes the probit of a matrix many times (each time the matrix is different). The matrix is big but most of its values are duplicated, so I wonder if there is a hack to speed up the probit by computing it only on the unique values rather than on the many duplicated ones.
This is a small 10x7 example that reproduces how the matrix is constructed and its structure:
proc iml;
call randseed(42);
min = 1E-10;
max = 1-min;
num_loans = {4 26 24 1 31 7 54};
num_loans_rep = repeat(num_loans, 10);
pd = { /*probability of default*/
0.026 0.008 0.008 0.038 0.037 0.005 0.003 ,
0.003 0.001 0.012 0.043 0.009 0.011 0.031 ,
0.039 0.010 0.019 0.036 0.002 0.040 0.015 ,
0.002 0.020 0.003 0.019 0.001 0.007 0.031 ,
0.040 0.001 0.049 0.045 0.013 0.044 0.002 ,
0.015 0.021 0.005 0.027 0.046 0.040 0.022 ,
0.031 0.017 0.041 0.016 0.005 0.024 0.026 ,
0.019 0.026 0.042 0.039 0.021 0.006 0.030 ,
0.005 0.011 0.012 0.002 0.017 0.002 0.035 ,
0.050 0.032 0.025 0.042 0.034 0.007 0.016 };
defaulted = randfun({10 7},'binomial', pd, num_loans_rep);
def_rates = defaulted/num_loans;
def_rates = min <> def_rates >< max; /*bound def_rates values between min and max*/
p = probit(def_rates);
print num_loans_rep, defaulted, def_rates, p;
call tabulate(value, freq, def_rates);
print (value`)[l='value'] (freq`)[l='freq'];
quit;
As you can see, about 70% of the matrix passed to the probit function ( def_rates ) equals min = 1E-10, and there are other duplicated values as well.
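For reference, the fraction of cells equal to min can be checked directly with something like this (to be run before the QUIT statement above):
pct_min = ncol(loc(def_rates = min)) / (nrow(def_rates)*ncol(def_rates)); /* share of cells equal to min */
print pct_min;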
The following code shows a larger 10000x10000 example in which the execution time becomes relevant. For simplicity, here the matrix def_rates is not constructed as before, but the idea is the same: a lot of values are duplicated, so the probit function "wastes" a lot of time recomputing the same values. In the code I propose three alternative methods to compare against the benchmark. Method 2 initializes the output matrix by copying everywhere the probit of the most frequent value ( p_min ) and then replaces it with the correct values where def_rates differs from min. Method 3 computes the sparse representation of the pre-bounding input matrix ( def_rates0 , which is 70% zero), applies the probit to its stored nonzero values, converts back to the full matrix, and replaces the zeros with p_min. Method 4 finds the unique values of the input matrix and loops over them, computing the probit of each value and assigning it to the corresponding locations in the output matrix.
%macro tic;
%global t_start;
%let t_start = %sysfunc(datetime());
%mend;
%macro toc;
%PUT WARNING- %sysevalf(%sysfunc(datetime()) - &t_start);
%mend;
proc iml;
call randseed(42);
N = 10000;
b = j(N,N,.);
t = 5;
/*70% of values in b are 0*/
call randgen(b,'binomial',0.068,t);
def_rates0 = b/(t+1);
min = 1E-10;
max = 1-min;
def_rates = min <> def_rates0 >< max;
/*method 1 - benchmark (6.5 sec)*/
%tic;
p1 = probit(def_rates);
%toc;
/*method 2 (3.4 sec)*/
%tic;
p_min = probit(min);
p2 = j(N,N,p_min);
idx = loc(def_rates ^= min);
p2[idx] = probit(def_rates[idx]);
%toc;
/*method 3 (5.3 sec)*/
%tic;
s = sparse(def_rates0);
p3 = probit(s[,1]) || s[,2:3];
p3 = full(p3,N,N);
p3[loc(def_rates0 = 0)] = p_min;
%toc;
/*method 4 (8 sec)*/
%tic;
call tabulate(value, freq, def_rates);
p4 = def_rates;
do i=1 to ncol(value);
p4[loc(p4 = value[i])] = probit(min <> value[i]);
end;
%toc;
max_diff2 = max(abs(p2-p1));
max_diff3 = max(abs(p3-p1));
max_diff4 = max(abs(p4-p1));
print (value`)[l='value'] (freq`)[l='freq'];
if N < 20 then print def_rates, s, p1;
print max_diff2 max_diff3 max_diff4;
quit;
Method 2 is the best and takes almost half the time of the benchmark; however, since there are only 6 unique values in the input matrix, I dream of a much faster method to solve the problem. For example, probit(1E-10) is computed 70317776 times... I wonder if there is a way to let the probit function (and, more generally, any function) have a "memory" of the already computed values, so that it could simply recall them instead of computing them again and again. Actually, I don't know how vectorized operations work under the hood; I guess that multiple values are computed at the same time, so it may be that the concept of a "memory" doesn't make sense.
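To make the idea concrete, here is a minimal sketch of the kind of lookup table I have in mind, on a smaller N just to illustrate; the BIN-based mapping and the extra cutpoint are only one possible way I imagine the scatter step could be done:
proc iml;
call randseed(42);
N = 1000;                                  /* smaller N, just for the sketch */
b = j(N,N,.);
call randgen(b,'binomial',0.068,5);
min = 1E-10;
def_rates = min <> b/6 >< (1-min);
u   = unique(def_rates);                   /* sorted row vector of unique values         */
pu  = probit(u);                           /* probit evaluated once per unique value     */
cut = u || (u[ncol(u)] + 1);               /* extra cutpoint so each u[i] falls in bin i */
idx = bin(shape(def_rates, N*N, 1), cut);  /* bin number = position of each cell in u    */
p   = shape(pu[idx], N, N);                /* look up the precomputed probits            */
quit;
Here probit would be called on only 6 values and everything else would be indexing, but I don't know whether this kind of approach is actually faster than method 2 or whether there is a built-in way to do it.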
My setup:
SAS EG 8.3 Update 3 (8.3.3.181) (64-bit)
SAS/IML 15.2