I have a table sizing problem that I don't understand. doing a simple left join with a small table i added a variable to a table. The strange thing is that this new table with an extra variable (13 instead of 12) has gone from 8.54Gb to 21.17Gb! I can't figure it out, I think the physical memory should have increased only a little.
I report the complete results of the dictionary of the 2 tables in question (note the "filesize" variable)
Then yes, that is correct.
You have 67,298,471 observations X 200 characters/bytes = 13,459,694,200 bytes = 13.5 GB
200
Then yes, that is correct.
You have 67,298,471 observations X 200 characters/bytes = 13,459,694,200 bytes = 13.5 GB
Show the code for your "simple join". It is likely that you added records, at a guess about doubling the number of records, because of the particular join.
Consider this small example. A "simple join" of two data sets of 6 observations each but the result has 16 observations.
data one; input a b; datalines; 1 1 1 2 1 3 2 11 2 22 3 33 ; data two; input a z; datalines; 1 111 1 222 1 333 2 1111 2 2222 2 3333 ; proc sql; create table example as select one.a, one.b, two.z from one left join two on one.a = two.a ; quit;
Table size for many discussions is more about the number of observations and variables than bytes.
this is the code. I also report the data from the dictionary columns of the 2 tables in question
proc sql;
create table RUOLO_DANNI2 as
select RU.*
,C.DES_EDWH as RUOLO_DES
from RUOLO_DANNI1 RU
left join EDWH_ODS.CODICE_TIPO_RUOLO C
on
RU.COD_TIPO_RUOLO = C.COD_EDWH
and C.ts_inizio_validita <= ru.ts_inizio_validita <= C.ts_fine_validita
and compress(C.cod_source)='1';
quit;
You added 200 bytes to your obslen, indicating that the new variable is $200 in length. This usually happens when the new variable is assigned a value from a function without properly defining it beforehand with a LENGTH statement.
Please show your code.
I simply brought the variable forward from another table without specifying the length, thus inheriting the original length
proc sql;
create table RUOLO_DANNI2 as
select RU.*
,C.DES_EDWH as RUOLO_DES
from RUOLO_DANNI1 RU
left join EDWH_ODS.CODICE_TIPO_RUOLO C
on
RU.COD_TIPO_RUOLO = C.COD_EDWH
and C.ts_inizio_validita <= ru.ts_inizio_validita <= C.ts_fine_validita
and compress(C.cod_source)='1';
quit;
Well, an almost tripling of the observation size will also result in tripling the dataset size.
Use the COMPRESS=YES dataset option if the new variable is mostly empty.
Use the COMPRESS=YES system option (or COMPRESS=YES dataset option if you only want it to apply to one dataset) and it will take a lot less space on the disk.
Why attach a spreadsheet to share 8 numbers?
memname | nobs | obslen | nvar | compress | filesize |
RUOLO_DANNI1 | 67,298,471 | 136 | 12 | NO | 9,169,469,440 |
RUOLO_DANNI2 | 67,298,471 | 336 | 13 | NO | 22,734,503,936 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.