Hello,
I have a data set that I needed to concatenate the rows for each ID. I had to use LENGTH statement to allocate enough bytes for the new variables created. Allocated 2000 bytes. However many of the text length doesn't even exceed 500-600 bytes. Because I allocated 2K bytes and even though I used a strip and trim functions later to get rid of the embedded blank spaces; to size of the dataset increased by 7 times the original dataset. from 12 GB (CUST_REMARK_TB_SAS) to 70GB (CUST_REMARK_CAT)
can anyone tell me why this is happening. Please help me fix this problem.
data bnknts.CUST_REMARK_CAT;
length CAT_REMK_TEXT $2000.;/*allocate 2K bytes for concatanation operation*/
do until (last.remk_placed_date);
set bnknts.CUST_REMARK_TB_SAS;
by cust_id remk_placed_date;
CAT_REMK_TEXT=catx(' ',CAT_REMK_TEXT,remark_text);
end;
cust_remk_text=lowcase(strip(trim(CAT_REMK_TEXT)));
LEN=length(cust_remk_text);
drop remark_text CAT_REMK_TEXT remk_occ_nbr;
run;
By default, SAS increases the dataset size by the size of your CAT_REMK_TEXT variable including blank space. To remove the blank space from your dataset use the COMPRESS = YES option:
data bnknts.CUST_REMARK_CAT (compress = yes);
By default, SAS increases the dataset size by the size of your CAT_REMK_TEXT variable including blank space. To remove the blank space from your dataset use the COMPRESS = YES option:
data bnknts.CUST_REMARK_CAT (compress = yes);
then what is the remedy to make the size smaller. As you can see in my code I am using cust_remk_text=lowcase(strip(trim(CAT_REMK_TEXT))) to eliminate it but not working. \thanks
@eserates wrote:
then what is the remedy to make the size smaller. As you can see in my code I am using cust_remk_text=lowcase(strip(trim(CAT_REMK_TEXT))) to eliminate it but not working. \thanks
How would that make the value stored any smaller? All you did was take of the trailing spaces, then remove the leading spaces (and the now already gone trailing spaces) and then stick it back into the same 2,000 byte variable where the spaces will be appended again.
SAS stores character variables as fixed length.
If you want to save disk space make sure to use the COMPRESS= dataset option. Otherwise compression is used when writing the data to disk and every observation will require 2000 bytes for just that variable.
thanks for your responses. I will try COMPRESS option.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.