anming
Pyrite | Level 9
data scoredata1;
set scoredata0;
length phone_update $10; /*without this the length will be 200*/
phone_update = TRANWRD(phone,'000','408');
run;

When I use TRANWRD to replace '000' in phone with '408', I find that a LENGTH statement is needed for the new variable phone_update. Otherwise, the length of phone_update will be 200. Does every new variable created by a function need an explicit length?

I note that this is different for the TRIM and CATX functions:

data scoredata2;
set scoredata1;
/*the concatenation operator and Trim function*/
student_name1 = trim(last_name) || ', ' || trim(first_name);
/*the CATX function enables you to concatenate character strings,
remove leading and trailing blanks, and insert separators*/
length student_name2 $25;
student_name2 = catx(', ',last_name,first_name);
run;

The length of student_name1, built with TRIM and the concatenation operator, is determined by the input variables last_name and first_name, while the length of student_name2 is 200 if the LENGTH statement is removed.

I am a bit confused about when a length definition is needed. Will a length of 200 slow down processing and take up too much memory? The length of variables can easily be checked with:

proc contents data=xxx;
run;
1 ACCEPTED SOLUTION
ballardw
Super User

The documentation for each character function usually states the default length of the resulting variable when no length has been assigned; 200 is common. The TRIM, STRIP, COMPRESS, and COMPBL functions do not change the length, because all they do is shift or remove characters within the value. So if the input is 20 characters long, the result of any shifting or removing will fit inside 20.
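To see this for yourself, here is a minimal sketch using the VLENGTH function, which returns the compile-time length of a variable. The variable names and the sample value are made up for illustration; the lengths noted in the comments are what the function documentation describes when no LENGTH statement is present.

```sas
/* Hypothetical demo: name is declared with a length of 20 */
data _null_;
  length name $20;
  name = '  Ann   Lee  ';

  t = trim(name);                      /* documented: length of the argument (20) */
  s = strip(name);                     /* documented: length of the argument (20) */
  c = compress(name);                  /* documented: length of the first argument (20) */
  b = compbl(name);                    /* documented: length of the first argument (20) */
  w = tranwrd(name, 'Ann', 'Anna');    /* documented: 200 when no length was assigned */
  x = catx(' ', name, name);           /* documented: 200 when no length was assigned */

  /* VLENGTH reports the length the compiler gave each variable */
  len_t = vlength(t);
  len_s = vlength(s);
  len_c = vlength(c);
  len_b = vlength(b);
  len_w = vlength(w);
  len_x = vlength(x);
  put len_t= len_s= len_c= len_b= len_w= len_x=;
run;
```

The log line written by PUT lets you compare the lengths without running PROC CONTENTS on a saved data set.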

 

"Too much memory": possibly, depending on the number of variables, though disk space is more likely to be the issue.

Moving extra bytes around for read/write operations may slow things down. If you only have a few variables and data sets of fewer than 1,000 records, you may not notice the difference. When you get to thousands of variables and millions of records, the performance difference is more likely to be noticeable.

Part of the reason is that, depending on the functions used, the length of the result is unpredictable. TRANWRD, for example, could replace 3 characters with 100. Instead of spending a lot of time "guessing" the possible results from your code, SAS picks 200 (which used to be the maximum length of character variables). Another reason is that with simple assignment statements, the first use of the variable sets the length, so if your first result was small, later results might not fit.
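The "first use sets the length" point can be sketched like this. The variable name code and the literal values are hypothetical; the only thing being shown is that the compiler takes the length from the first assignment it sees.

```sas
data _null_;
  do i = 1 to 2;
    /* The compiler sees this assignment first, so code gets a length of 2 */
    if i = 1 then code = 'ab';
    /* The longer value does not fit and is stored truncated to 2 characters */
    else code = 'abcdef';
    len_code = vlength(code);
    put i= code= len_code=;
  end;
run;
```

Adding a LENGTH statement such as length code $6; before the first assignment avoids the truncation.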

 

This is another example of the maxim "Know thy data."


2 REPLIES 2
Kurt_Bremser
Super User
With TRANWRD, the result can be longer than the source. Therefore, if no length is specified, the function causes the DATA step compiler to use the default length of 200. See Maxim 47.
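A runnable sketch of that advice, assuming phone numbers are stored as 12-character strings. The sample rows and the $12 length are made up so the step can run on its own; use whatever length fits your real data.

```sas
/* Hypothetical sample data standing in for scoredata0 */
data scoredata0;
  length phone $12;
  input phone $;
  datalines;
000-555-0100
408-555-0199
;
run;

data scoredata1;
  set scoredata0;
  /* Declare the length before the first assignment;
     without this statement TRANWRD would give phone_update a length of 200 */
  length phone_update $12;
  phone_update = tranwrd(phone, '000', '408');
run;

/* Check the stored lengths of phone and phone_update */
proc contents data=scoredata1;
run;
```

Because the replacement '408' is the same size as '000', a length equal to that of phone is enough here. If the replacement could be longer than the target, the declared length needs room for the longest possible result.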