BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
eserates
Fluorite | Level 6

Hello,

I have a data set that I needed to concatenate the rows for each ID. I had to use LENGTH statement to allocate enough bytes for the new variables created. Allocated 2000 bytes. However many of the text length doesn't even exceed 500-600 bytes. Because I allocated 2K bytes and even though I used a strip and trim functions later to get rid of the embedded blank spaces; to size of the dataset increased by 7 times the original dataset. from 12 GB (CUST_REMARK_TB_SAS)  to 70GB (CUST_REMARK_CAT)

can anyone tell me  why  this is happening. Please help me fix this problem.

 

data bnknts.CUST_REMARK_CAT;
	length CAT_REMK_TEXT $2000.;/*allocate 2K bytes for concatanation operation*/

	do until (last.remk_placed_date);
		set bnknts.CUST_REMARK_TB_SAS;
		by cust_id remk_placed_date;
		CAT_REMK_TEXT=catx(' ',CAT_REMK_TEXT,remark_text);
	end;

	cust_remk_text=lowcase(strip(trim(CAT_REMK_TEXT)));
	LEN=length(cust_remk_text);
	drop remark_text CAT_REMK_TEXT remk_occ_nbr;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

By default, SAS increases the dataset size by the size of your CAT_REMK_TEXT variable including blank space. To remove the blank space from your dataset use the COMPRESS = YES option:

data bnknts.CUST_REMARK_CAT (compress = yes);

View solution in original post

6 REPLIES 6
SASKiwi
PROC Star

By default, SAS increases the dataset size by the size of your CAT_REMK_TEXT variable including blank space. To remove the blank space from your dataset use the COMPRESS = YES option:

data bnknts.CUST_REMARK_CAT (compress = yes);
eserates
Fluorite | Level 6

then what is the remedy to make the size smaller. As you can see in my code I am using cust_remk_text=lowcase(strip(trim(CAT_REMK_TEXT))) to eliminate it but not working. \thanks

Tom
Super User Tom
Super User

@eserates wrote:

then what is the remedy to make the size smaller. As you can see in my code I am using cust_remk_text=lowcase(strip(trim(CAT_REMK_TEXT))) to eliminate it but not working. \thanks


How would that make the value stored any smaller? All you did was take of the trailing spaces, then remove the leading spaces (and the now already gone trailing spaces) and then stick it back into the same 2,000 byte variable where the spaces will be appended again.

Tom
Super User Tom
Super User

SAS stores character variables as fixed length.

 

If you want to save disk space make sure to use the COMPRESS= dataset option. Otherwise compression is used when writing the data to disk and every observation will require 2000 bytes for just that variable.

https://documentation.sas.com/?docsetId=ledsoptsref&docsetTarget=n014hy7167t2asn1j7qo99qv16wa.htm&do...

eserates
Fluorite | Level 6

thanks for your responses. I will try COMPRESS option. 

Reeza
Super User
If you know that the maximum length is 600, consider changing the length to 600 instead of 2000. It should reduce space as well. SAS has to allocate that much space regardless of it being used or not so that's what takes up the space.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1571 views
  • 2 likes
  • 4 in conversation