I have a table sizing problem that I don't understand. doing a simple left join with a small table i added a variable to a table. The strange thing is that this new table with an extra variable (13 instead of 12) has gone from 8.54Gb to 21.17Gb! I can't figure it out, I think the physical memory should have increased only a little.
I report the complete results of the dictionary of the 2 tables in question (note the "filesize" variable)
Then yes, that is correct.
You have 67,298,471 observations X 200 characters/bytes = 13,459,694,200 bytes = 13.5 GB
200
Then yes, that is correct.
You have 67,298,471 observations X 200 characters/bytes = 13,459,694,200 bytes = 13.5 GB
Show the code for your "simple join". It is likely that you added records, at a guess about doubling the number of records, because of the particular join.
Consider this small example. A "simple join" of two data sets of 6 observations each but the result has 16 observations.
data one; input a b; datalines; 1 1 1 2 1 3 2 11 2 22 3 33 ; data two; input a z; datalines; 1 111 1 222 1 333 2 1111 2 2222 2 3333 ; proc sql; create table example as select one.a, one.b, two.z from one left join two on one.a = two.a ; quit;
Table size for many discussions is more about the number of observations and variables than bytes.
this is the code. I also report the data from the dictionary columns of the 2 tables in question
proc sql;
create table RUOLO_DANNI2 as
select RU.*
,C.DES_EDWH as RUOLO_DES
from RUOLO_DANNI1 RU
left join EDWH_ODS.CODICE_TIPO_RUOLO C
on
RU.COD_TIPO_RUOLO = C.COD_EDWH
and C.ts_inizio_validita <= ru.ts_inizio_validita <= C.ts_fine_validita
and compress(C.cod_source)='1';
quit;
You added 200 bytes to your obslen, indicating that the new variable is $200 in length. This usually happens when the new variable is assigned a value from a function without properly defining it beforehand with a LENGTH statement.
Please show your code.
I simply brought the variable forward from another table without specifying the length, thus inheriting the original length
proc sql;
create table RUOLO_DANNI2 as
select RU.*
,C.DES_EDWH as RUOLO_DES
from RUOLO_DANNI1 RU
left join EDWH_ODS.CODICE_TIPO_RUOLO C
on
RU.COD_TIPO_RUOLO = C.COD_EDWH
and C.ts_inizio_validita <= ru.ts_inizio_validita <= C.ts_fine_validita
and compress(C.cod_source)='1';
quit;
Well, an almost tripling of the observation size will also result in tripling the dataset size.
Use the COMPRESS=YES dataset option if the new variable is mostly empty.
Use the COMPRESS=YES system option (or COMPRESS=YES dataset option if you only want it to apply to one dataset) and it will take a lot less space on the disk.
Why attach a spreadsheet to share 8 numbers?
memname | nobs | obslen | nvar | compress | filesize |
RUOLO_DANNI1 | 67,298,471 | 136 | 12 | NO | 9,169,469,440 |
RUOLO_DANNI2 | 67,298,471 | 336 | 13 | NO | 22,734,503,936 |
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.