Hello
I am creating a data set.
I want to ask a few questions:
1-W is a character variable.
Is it mandatory to put $ after the variable name W in the INPUT statement?
I am using a LENGTH statement, so I think it is not essential to add $ in the INPUT statement.
2-In the LENGTH statement, should it be written as $4. or $4 or $ 4 (with a space between $ and 4)?
3-In which cases do we use a LENGTH statement when we create a character variable?
Is a LENGTH statement always essential, or only if the character variable is long? (What is the minimum length of a character variable that requires defining it in a LENGTH statement?)
4-How do you know what length is required?
Should I count the letters in the longest value and then put this number in the LENGTH statement?
What happens if, for example, the required length is 4 but I define a length of 100?
Is it just a waste of memory?
DATA tbl;
Length W $ 4;
input X W;
cards;
789 1234
009 0009
1 9999
;
Run;
Here are some answers:
Length and informat, while related, are not the same thing, especially in the world of fixed-column input (old, I know, but still valid).
There are times when you have to read a variable with a particular informat for one file, even while setting its length longer than the actual data.
Consider these three data steps that attempt to read fixed-column data, where the first three characters should be read into one variable and the next three into a different, numeric variable.
data example;
   length x $ 20;
   input x y;
datalines;
abc123
;

data example2;
   length x $ 20;
   input x @4 y;
datalines;
abc123
;

data example3;
   length x $ 20;
   input x $3. y;
datalines;
abc123
;
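As a rough sketch of the same idea, the step below writes the values straight to the log so the effect of the informat is visible. It is a variation on the third step above, not a replacement for it: the explicit 3. informat on y and the PUT statement are additions made here for illustration.

```sas
/* Sketch: LENGTH reserves 20 bytes of storage for x, while the $3.
   informat controls how many columns INPUT actually reads into it. */
data _null_;
   length x $ 20;
   input x $3. y 3.;   /* columns 1-3 into x, columns 4-6 into y */
   put x= y=;          /* log should show x=abc y=123 */
datalines;
abc123
;
run;
```

The point is that the length (20) and the number of columns read (3) are set independently.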
One issue with assigning a longer length than needed is that some functions will return a result padded with trailing blanks. Consider the following code:
data _null_;
   length x $ 20;
   x='abc';
   y = quote(x);
   put y=;
run;
When you run the above step the log will show a result of
y="abc                 "
Notice all of the spaces after the c before the closing quote character.
This will happen with a large number of character functions, which is why the Strip() function (or the older combination of trim(left(var))) is used.
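A minimal sketch of that fix, reusing the step above and removing the blanks before quoting. The variable z is added here only to show the older form side by side.

```sas
/* Sketch: strip the padding before passing the value to QUOTE(). */
data _null_;
   length x $ 20;
   x = 'abc';
   y = quote(strip(x));        /* STRIP removes leading and trailing blanks */
   z = quote(trim(left(x)));   /* older equivalent: left-align, then trim */
   put y= z=;                  /* log should show y="abc" z="abc" */
run;
```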
The final bit is "know thy data". If you have a document describing a data source that indicates the longest value that will be in a field then I would follow that document.
Sometimes the whole process is iterative because your data isn't documented and you have to make some guesses. Some of those guesses will be wrong and you may have to go back and change things to accommodate later knowledge.
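When there is no documentation, one way to make the first guess less of a guess is to read the variable with a generous length and then measure the longest value actually present. The sketch below uses the data from the question; the data set name raw, the column name max_len, and the starting length of 100 are arbitrary choices made for illustration.

```sas
/* Sketch: read with a deliberately generous length, then measure. */
data raw;
   length w $ 100;
   input x w;
datalines;
789 1234
009 0009
1 9999
;
run;

/* LENGTH() ignores trailing blanks, so this reports the longest
   value actually stored in w (4 for the data above). */
proc sql;
   select max(length(w)) as max_len
   from raw;
quit;
```

Once the longest value is known, the LENGTH statement in the original step can be set to that number, with some margin if later files might contain longer values.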