Query 1:
For the below data set the value is coming as CAPON
data cha;
input name $6.;
datelines;
CAPONE
;
run;
but when I change the format to name $7. I am getting the value as CAPONE.
What is the reason for this truncation? Please explain.
Query 2.
data chg;
set cha;
y=length(CAPONE);
do i =1 to length(CAPONE);
z=substr("CAPONE",i,1);
output;
end;
run;
Here the value of y is 12. Why is it coming out as 12? Please explain.
Thanks!
Accepted Solutions
So your first data step is reading the first 6 bytes from the line into NAME. If you are getting 'CAPON' and not 'CAPONE' then the letter C is in the second column of the line and not the first. The $ informat removes leading spaces.
data cha;
input name $6.;
datelines;
CAPONE
;
635  data cha;
636  input name $6.;
637  datelines;
     ---------
     14
WARNING 14-169: Assuming the symbol DATALINES was misspelled as datelines.

NOTE: The data set WORK.CHA has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.04 seconds

639  ;
640
641  proc print;
642  run;

NOTE: There were 1 observations read from the data set WORK.CHA.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.10 seconds
      cpu time            0.03 seconds
Note that your first data step is not using any FORMATs. It is using the INFORMAT $6. If you change that to the $7. informat instead, then the first 7 characters of the line will be read. Also, since you did not tell SAS explicitly what type or length to use for NAME, it guesses that you want a type and length matching the informat used in the INPUT statement where you first referenced NAME.
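If it helps, here is a minimal sketch of the $6. versus $7. behaviour that does not depend on how the data lines were pasted. It assumes the line really starts with one blank, and it uses the INPUT function on a quoted string so the blank cannot be lost. The variable names LINE, NAME6, NAME7, LEN6 and LEN7 are made up for the illustration.

```sas
data _null_;
  line  = ' CAPONE';         /* assumption: one blank before the C      */
  name6 = input(line, $6.);  /* sees ' CAPON',  stores 'CAPON'          */
  name7 = input(line, $7.);  /* sees ' CAPONE', stores 'CAPONE'         */
  len6  = length(name6);     /* 5: the E was never read                 */
  len7  = length(name7);     /* 6: the wider informat reaches the E     */
  put name6= len6= name7= len7=;
run;
```

If the log shows NAME6 without the final E, then a leading blank in column 1 is the most likely explanation for what you are seeing.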
Your second data step is referencing a variable CAPONE that does not exist. SAS will default CAPONE to NUMERIC. You then use it as input to a function that expects a string so SAS will convert it to a string using the BEST12. format. So the LENGTH() of that string will be 12 since SAS right aligns the strings generated by numeric formats by default.
The DO loop is then pulling out the letters from the string constant 'CAPONE'. You will get invalid-argument notes when I is larger than 6, since that string is only 6 bytes long.
643
644  data chg;
645  set cha;
646  y=length(CAPONE);
647  do i =1 to length(CAPONE);
648  z=substr("CAPONE",i,1);
649  output;
650  end;
651  run;

NOTE: Numeric values have been converted to character values at the places given by:
      (Line):(Column).
      646:12   647:21
NOTE: Variable CAPONE is uninitialized.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
name=CAPON y=12 CAPONE=. i=13 z=  _ERROR_=1 _N_=1
NOTE: There were 1 observations read from the data set WORK.CHA.
NOTE: The data set WORK.CHG has 12 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

652
653  proc print;
654  run;

NOTE: There were 12 observations read from the data set WORK.CHG.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds

Obs    name
  1    CAPON

Obs    name      y    CAPONE     i    z
  1    CAPON    12       .       1    C
  2    CAPON    12       .       2    A
  3    CAPON    12       .       3    P
  4    CAPON    12       .       4    O
  5    CAPON    12       .       5    N
  6    CAPON    12       .       6    E
  7    CAPON    12       .       7
  8    CAPON    12       .       8
  9    CAPON    12       .       9
 10    CAPON    12       .      10
 11    CAPON    12       .      11
 12    CAPON    12       .      12
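To see where the 12 comes from, you can make the automatic conversion explicit. This is only a sketch: X stands in for the uninitialized numeric variable CAPONE, and S, LEN_S and LEN_LEFT are names invented for the example.

```sas
data _null_;
  x = .;                       /* numeric missing, like the uninitialized CAPONE */
  s = put(x, best12.);         /* same conversion SAS does implicitly            */
  len_s    = length(s);        /* position of the last non-blank character       */
  len_left = length(left(s));  /* after left-aligning, only the '.' counts       */
  put s= $quote14. len_s= len_left=;
run;
```

The quoted value in the log should show the period pushed to the right-hand end of a 12-character string, which is why LENGTH reports 12 even though only one character is not a blank.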
data chg;
set cha;
y=length(name);
do i =1 to length(name);
z=substr(name,i,1);
output;
end;
run;
Now the length is coming as 6 and the loop also runs to 6, so it is correct.
Can anyone please explain all three conditions?
Your CHA data set has a single variable - NAME.
It has one value, CAPONE.
Your code is referencing CAPONE as both a variable and as a string. I think you should be using NAME there (a very bad variable name that easily leads to confusion).
data chg;
set cha;
y=length(name);
do i =1 to length(name);
z=substr(name, i, 1);
output;
end;
run;
My doubt is about the two queries above: why is the length value coming out different?
Query 1:
For the below data set the value is coming as CAPON
data cha;
input name $6.;
datelines;
CAPONE
;
run;
but when I change the format to name $7. I am getting value as CAPONE
what is the reason here for this truncation. Please explain.
Query 2.
data chg;
set cha;
y=length(CAPONE);
do i =1 to length(CAPONE);
z=substr("CAPONE",i,1);
output;
end;
run;
Just exactly how are you looking at the data to determine the value is CAPON and not CAPONE?
It is not uncommon in a table viewer that the column width does not match the length of a variable, so a value can appear truncated when in fact it is just a viewer issue.
Since your data set does not have a variable named CAPONE (unless it is a different CHA data set), you created one with the statement
y=length(CAPONE);
and, without something defining it as character, it is a NUMERIC variable.
From the documentation for the LENGTH function:
If string is a numeric constant, variable, or expression (either initialized or uninitialized), SAS automatically converts the numeric value to a right-justified character string by using the BEST12. format. In this case, LENGTH returns a value of 12 and writes a note in the SAS log stating that the numeric values have been converted to character values.
If your log shows a note about numeric conversion to character on the line with the Y= statement, that is exactly what happened. When I run your code to create CHA and then CHG, this is the result from the LOG:
301  data cha;
302  input name $6.;
303  datalines;

NOTE: The data set WORK.CHA has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

305  ;
306  data chg;
307  set cha;
308  y=length(CAPONE);
309  do i =1 to length(CAPONE);
310  z=substr("CAPONE",i,1);
311  output;
312  end;
313  run;

NOTE: Numeric values have been converted to character values at the places given by:
      (Line):(Column).
      308:13   309:22
NOTE: Variable CAPONE is uninitialized.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
name=CAPONE y=12 CAPONE=. i=13 z=  _ERROR_=1 _N_=1
NOTE: There were 1 observations read from the data set WORK.CHA.
NOTE: The data set WORK.CHG has 12 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
I have highlighted the line numbers so you can see where the "conversion" took place, also that Capone is uninitialized (that means a value is never encountered) and all the invalid data comments when looping over the (empty) variable Capone.
Did you read the LOG? That is quite often the quickest way to determine why something is or is not happening.
For example, you have DATALINES misspelled in the first data step, which the log will flag.
You can check for yourself whether the data actually contains a leading blank as in the example below:
data cha;
input name $char6.;
do i=1 to 6;
letter = substr(name, i, 1);
put i= letter=;
end;
datalines;
CAPONE
;
The $CHAR informat will preserve any leading blanks that appear in the incoming data. That's likely the scenario, and you can run this type of program to easily confirm whether this is the problem. As @Tom mentioned, the $6. informat reads six characters only, but left-hand justifies whatever it finds within those six characters.
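Another way to make a leading blank visible, sketched here under the assumption that the line begins with one blank, is to print the $CHAR value in quotes and in hexadecimal. LINE and RAW are hypothetical names, and the hex code mentioned in the comment applies to ASCII systems.

```sas
data _null_;
  line = ' CAPONE';             /* assumption: one blank before the C    */
  raw  = input(line, $char6.);  /* $CHAR keeps the leading blank         */
  put raw= $quote8.;            /* the quotes make the blank visible     */
  put raw= $hex12.;             /* on ASCII systems a blank shows as 20  */
run;
```

If the hex output starts with 20, the first character stored is a blank and the sixth letter was pushed out of the six columns that were read.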
Also, the main message window on this forum seriously reformats text.
Consider this code, typed into my editor and pasted into a text box opened with the </> icon above the message window:
data cha;
input name $6.;
datalines;
CAPONE
 CAPONE
  CAPONE
   CAPONE
    CAPONE
;
run;
This will run with CAPONE losing characters after the first value, because the INPUT statement written as input name $6. tells SAS to read exactly 6 characters from the current position of the input pointer, which is the first column at each iteration of the INPUT statement.
This is what happens when the exact same code is pasted into the main message window:
data cha;
input name $6.;
datalines;
CAPONE
CAPONE
CAPONE
CAPONE
CAPONE
;
run;
Notice that all of the lines with spaces have been moved left. So what you showed for the first data step may not be what you ran at all.
You can modify the INPUT statement to tell SAS to skip the default space delimiters until the first non-space character is encountered by using the : modifier before the informat.
data cha;
input name :$6.;
datalines;
CAPONE
 CAPONE
  CAPONE
   CAPONE
    CAPONE
;
run;
which generates 5 values of CAPONE.
Thanks much for valuable inputs