/*So neither version of code actually gives a length of 17 to Smoking_Status. It caps out at 14 in the output (I have noticed that 14 is the length of the string Smoking_Status). In either version of the code, if I move the length statement to the beginning of the data step the length is still incorrect. PLEASE HELP!*/
/*The key's solution:*/
libname certdata XLSX '/home/aaronbailey790/cert/heart.xlsx'; data work.heart;
set certdata.heart(drop=AgeAtDeath DeathCause); where Status='Alive';
if AgeCHDdiag=. then delete;
length Smoking_Status $17;
if 0<=Smoking<6 then Smoking_Status='Non-Smoker (0-5)';
else if 6<=Smoking<=15 then Smoking_Status='Moderate (6-15)';
else if 16<=Smoking<=25 then Smoking_Status='Heavy (16-25)';
else if Smoking>25 then Smoking_Status='Very Heavy (> 25)'; else Smoking_Status='Error';
run;
\*My solution*\
proc import datafile = '/home/aaronbailey790/cert/heart.xlsx'
dbms = xlsx
out = work.temp
replace;
run;
data work.heart (drop = AgeAtDeath DeathCause);
set work.temp;
length Smoking_Status $17;
if 0<= Smoking < 6 then Smoking_Status = 'None (0-5)';
else if 6 <= Smoking <=15 then Smoking_Status = 'Moderate (6-15)';
else if 16 <= Smoking <=25 then Smoking_Status = 'Very Heavy (>25)';
else Smoking_Status = 'Error';
if AgeCHDdiag = . then delete;
if Status = 'Alive' then output;
run;
@aaronbailey79 wrote:
/*So neither version of code actually gives a length of 17 to Smoking_Status. It caps out at 14 in the output (I have noticed that 14 is the length of the string Smoking_Status). In either version of the code, if I move the length statement to the beginning of the data step the length is still incorrect. PLEASE HELP!*/
/*The key's solution:*/
libname certdata XLSX '/home/aaronbailey790/cert/heart.xlsx'; data work.heart;
set certdata.heart(drop=AgeAtDeath DeathCause); where Status='Alive';
if AgeCHDdiag=. then delete;
length Smoking_Status $17;if 0<=Smoking<6 then Smoking_Status='Non-Smoker (0-5)';
else if 6<=Smoking<=15 then Smoking_Status='Moderate (6-15)';else if 16<=Smoking<=25 then Smoking_Status='Heavy (16-25)';
else if Smoking>25 then Smoking_Status='Very Heavy (> 25)'; else Smoking_Status='Error';
run;
\*My solution*\
proc import datafile = '/home/aaronbailey790/cert/heart.xlsx'
dbms = xlsx
out = work.temp
replace;
run;
data work.heart (drop = AgeAtDeath DeathCause);
set work.temp;
length Smoking_Status $17;
if 0<= Smoking < 6 then Smoking_Status = 'None (0-5)';
else if 6 <= Smoking <=15 then Smoking_Status = 'Moderate (6-15)';
else if 16 <= Smoking <=25 then Smoking_Status = 'Very Heavy (>25)';
else Smoking_Status = 'Error';
if AgeCHDdiag = . then delete;
if Status = 'Alive' then output;
run;
Please describe exactly how you have determined that the length is not 17.
If you are looking at a table viewer there will not be any indication of trailing blanks and the column boundaries typically adjust not to show such.
Same with proc print or other output.
data work.example; input smoking; length Smoking_Status $17; if 0<= Smoking < 6 then Smoking_Status = 'None (0-5)'; else if 6 <= Smoking <=15 then Smoking_Status = 'Moderate (6-15)'; else if 16 <= Smoking <=25 then Smoking_Status = 'Very Heavy (>25)'; else Smoking_Status = 'Error'; datalines; 0 1 2 3 6 7 10 15 16 20 25 26 ; Proc contents data=work.example; run;
And the variable descriptions from the above:
Alphabetic List of Variables and Attributes | |||
---|---|---|---|
# | Variable | Type | Len |
2 | Smoking_Status | Char | 17 |
1 | smoking | Num | 8 |
When I look at proc contents (after moving the length statement to the beginning of the data step),
I see the length is 17, but the informat is $14. I will change that directly.
If you do not specify INFORMAT then SAS will assign one based on how the values were assigned.
Informat, format and a length are related but depending on the actual values you may not see the possible differences. If you assign a Format much shorter than the length of a variable you might (not will only might) see truncated values. You only will see the truncation if there are actually more characters than the format is going to display. The other way, a long format and few characters may, depending on the display used, show a lot of space after the actually present characters.
It can depend on the ODS destination used whether you see any effect of different formats.
And the format is 14$.
Got it. I had to follow the length statement with a format statement:
length Smoking_Status $17;
format Smoking_Status $17.;
How I knew there was a problem: The value in a cell should have been
Moderate (6-15)
but it was only
Moderate (6-15
The same thing happened in other places.
@aaronbailey79 wrote:
How I knew there was a problem: The value in a cell should have been
Moderate (6-15)
but it was only
Moderate (6-15
The same thing happened in other places.
Yep. Format only displaying part of the value because of the assumed format assigned.
An exercise for the interested reader is don't use a length statement and see what happens with this block:
if smoking < 0 then Smoking_Status='Error'; else if 0<=Smoking<6 then Smoking_Status='Non-Smoker (0-5)'; else if 6<=Smoking<=15 then Smoking_Status='Moderate (6-15)'; else if 16<=Smoking<=25 then Smoking_Status='Heavy (16-25)'; else if Smoking>25 then Smoking_Status='Very Heavy (> 25)';
Make guess before running the code. Then check on the results and see what proc contents shows.
Note: this is an important point and will bite you at some point. Also one of the reasons to consider custom formats instead of creating text variables when a value only depends on a single variable.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.