Aexor
Lapis Lazuli | Level 10

Query 1:

For the below data set the value is coming as CAPON

data cha;
input name $6.;
datelines;
CAPONE
;
run;

But when I change the format to name $7. I get the value CAPONE.
What is the reason for this truncation? Please explain.

 

Query 2.

 

data chg;
set cha;
y=length(CAPONE);
do i =1 to length(CAPONE);
z=substr("CAPONE",i,1);
output;
end;
run;

Here the value of y is 12. Why is it coming out as 12? Please explain.

Thanks!

ACCEPTED SOLUTION
Tom
Super User

So your first data step is reading the first 6 bytes from the line into NAME.  If you are getting 'CAPON' and not 'CAPONE' then the letter C is in the second column of the line and not the first.  The $ informat removes leading spaces. 

data cha;
  input name $6.;
datelines;
 CAPONE
;
635   data cha;
636     input name $6.;
637   datelines;
      ---------
      14

WARNING 14-169: Assuming the symbol DATALINES was misspelled as datelines.

NOTE: The data set WORK.CHA has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.04 seconds


639   ;
640
641   proc print;
642   run;

NOTE: There were 1 observations read from the data set WORK.CHA.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.10 seconds
      cpu time            0.03 seconds

Note that your first data step is not using any FORMATs.  It is using the INFORMAT of $6.  If you change that to use the $7. informat instead then the first 7 characters of the line will be read.  Also, since you did not tell SAS explicitly what type or length to use to define NAME, it is guessing that you want it to have a type and length that match the informat you used in the INPUT statement where you first referenced NAME.
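A minimal sketch of that last point, assuming the data line really does start with one blank: a LENGTH statement defines NAME explicitly instead of leaving SAS to guess from the informat, and the $7. informat is wide enough to take in all six letters. The data set name cha_fixed is made up for illustration.

```sas
data cha_fixed;                 /* hypothetical data set name */
  length name $ 6;              /* define type and length explicitly */
  input name $7.;               /* read columns 1-7; $w. drops the leading blank */
datalines;
 CAPONE
;

proc contents data=cha_fixed;   /* check the type and length given to NAME */
run;
```

Without the LENGTH statement, NAME would take its length from the informat width instead.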

 

Your second data step is referencing a variable CAPONE that does not exist.  SAS will default CAPONE to NUMERIC.  You then use it as input to a function that expects a string so SAS will convert it to a string using the BEST12. format.  So the LENGTH() of that string will be 12 since SAS right aligns the strings generated by numeric formats by default.

 

The DO loop is then pulling out the letters from the string constant 'CAPONE'.  You will get errors when I is larger than 6 since that string only takes 6 bytes.

643
644   data chg;
645     set cha;
646     y=length(CAPONE);
647     do i =1 to length(CAPONE);
648       z=substr("CAPONE",i,1);
649     output;
650     end;
651   run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
      646:12   647:21
NOTE: Variable CAPONE is uninitialized.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
NOTE: Invalid second argument to function SUBSTR at line 648 column 7.
name=CAPON y=12 CAPONE=. i=13 z=  _ERROR_=1 _N_=1
NOTE: There were 1 observations read from the data set WORK.CHA.
NOTE: The data set WORK.CHG has 12 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


652
653   proc print;
654   run;

NOTE: There were 12 observations read from the data set WORK.CHG.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds
Obs    name

 1     CAPON

Obs    name      y    CAPONE     i    z

  1    CAPON    12       .       1    C
  2    CAPON    12       .       2    A
  3    CAPON    12       .       3    P
  4    CAPON    12       .       4    O
  5    CAPON    12       .       5    N
  6    CAPON    12       .       6    E
  7    CAPON    12       .       7
  8    CAPON    12       .       8
  9    CAPON    12       .       9
 10    CAPON    12       .      10
 11    CAPON    12       .      11
 12    CAPON    12       .      12
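
The implicit conversion described above can be reproduced explicitly. This is only a sketch: it assigns a numeric missing value by hand to stand in for the uninitialized variable, and the variable name as_text is made up for illustration.

```sas
data _null_;
  capone  = .;                      /* numeric missing, like the uninitialized CAPONE */
  as_text = put(capone, best12.);   /* the conversion SAS otherwise does implicitly */
  y       = length(as_text);        /* position of the last non-blank character */
  put as_text= $char12. y=;         /* $CHAR12. keeps the leading blanks visible in the log */
run;
```

Because the converted string is right-aligned in 12 bytes, the only non-blank character (the period for missing) sits in the last position, which is why LENGTH reports 12.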

 


8 REPLIES
Aexor
Lapis Lazuli | Level 10
In addition to this:

data chg;
set cha;
y=length(name);
do i =1 to length(name);
z=substr(name,i,1);
output;
end;
run;

Now the length comes out as 6 and the loop also runs to 6, so it is correct.

Can anyone please explain all three cases?
Reeza
Super User

Your CHA data set has a single variable - NAME. 

It has one value, CAPONE.

 

Your code is referencing CAPONE as both a variable and as a string. I think you should be using NAME there (a very bad variable name that easily leads to confusion).

 

data chg;
set cha;
y=length(name);
do i =1 to length(name);
z=substr(name, i, 1);
output;
end;
run;
Aexor
Lapis Lazuli | Level 10
Yes, I used NAME in my third query and the output came out perfect.

My doubt is about the two queries above: why does the length value come out different?
Query 1:

For the below data set the value is coming as CAPON

data cha;
input name $6.;
datelines;
CAPONE
;
run;

but when I change the format to name $7. I am getting value as CAPONE
what is the reason here for this truncation. Please explain.



Query 2.



data chg;
set cha;
y=length(CAPONE);
do i =1 to length(CAPONE);
z=substr("CAPONE",i,1);
output;
end;
run;
ballardw
Super User

Just exactly how are you looking at the data to determine the value is CAPON and not CAPONE?

 

It is not uncommon in a table viewer that the column width does not match the length of a variable, so a value can appear truncated when in fact it is just a viewer issue.

 

Since your data set does not have a variable named CAPONE (unless it is a different CHA data set), you created one with the statement below, and without something defining it as character it is a NUMERIC variable:

y=length(CAPONE);

From the documentation for the LENGTH function:

If string is a numeric constant, variable, or expression (either initialized or uninitialized), SAS automatically converts the numeric value to a right-justified character string by using the BEST12. format. In this case, LENGTH returns a value of 12 and writes a note in the SAS log stating that the numeric values have been converted to character values.

If your log shows a note about numeric conversion to character on the line with the Y= statement, that is exactly what happened. When I run your code to create CHA and then CHG, this is the result from the LOG:

301  data cha;
302  input name $6.;
303  datalines;

NOTE: The data set WORK.CHA has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


305  ;
306  data chg;
307     set cha;
308     y=length(CAPONE);
309     do i =1 to length(CAPONE);
310        z=substr("CAPONE",i,1);
311        output;
312     end;
313  run;

NOTE: Numeric values have been converted to character values at the places given by:
      (Line):(Column).
      308:13   309:22
NOTE: Variable CAPONE is uninitialized.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
NOTE: Invalid second argument to function SUBSTR at line 310 column 9.
name=CAPONE y=12 CAPONE=. i=13 z=  _ERROR_=1 _N_=1
NOTE: There were 1 observations read from the data set WORK.CHA.
NOTE: The data set WORK.CHG has 12 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Look at the line and column numbers in the conversion note to see where the "conversion" took place. The log also shows that Capone is uninitialized (that means a value is never assigned to it) and lists all the invalid argument notes written while the loop runs out to 12 because of the (empty) variable Capone.

 

Did you read the LOG? That is quite often the quickest way to determine why something is or is not happening.

For example, you have DATALINES misspelled in the first data step, and the log flags it (in Tom's run SAS issued a warning, assumed DATALINES was meant, and continued).

 

 

 

Astounding
PROC Star

You can check for yourself whether the data actually contains a leading blank as in the example below:

data cha;
input name $char6.;
do i=1 to 6;
   letter = substr(name, i, 1);
   put i= letter=;
end;
datalines;
 CAPONE
;

The $CHAR informat will preserve any leading blanks that appear in the incoming data.  That's likely the scenario, and you can run this type of program to easily confirm whether this is the problem.  As @Tom mentioned, the $6. informat reads six characters only, but left-hand justifies whatever it finds within those six characters.

ballardw
Super User

Also, the main message window on this forum seriously reformats text.

Consider this code, typed into my editor and pasted into a text box opened with the </> icon above the message window:

data cha;
input name $6.;
datalines;
CAPONE
 CAPONE
  CAPONE
   CAPONE
    CAPONE
;
run;

This will run, but CAPONE loses characters after the first value, because an INPUT statement written as Input Name $6. tells SAS to read exactly 6 characters from the current position of the input pointer, which is the first column at each iteration of the INPUT statement.

This is what happens when the exact same code is pasted into the main message window:

 

data cha;
input name $6.;
datalines;
CAPONE
CAPONE
CAPONE
CAPONE
CAPONE
;
run;

 

Notice that all of the lines with spaces have been moved left. So what you showed for the first data step may not be what you ran at all.

 

You can modify the INPUT statement to tell SAS to skip leading spaces (the default delimiter) until the first non-space character is encountered by using the : modifier before the informat.

data cha;
input name :$6.;
datalines;
CAPONE
 CAPONE
  CAPONE
   CAPONE
    CAPONE
;
run;

which generates 5 values of CAPONE.
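
To see the three ways of reading discussed in this thread side by side, here is a small sketch that reads each line three times from column 1. The data set name compare and the variable names std_val, char_val and list_val are made up for illustration.

```sas
data compare;                     /* hypothetical data set name */
  input @1 std_val  $6.           /* formatted: columns 1-6, leading blanks dropped */
        @1 char_val $char6.       /* formatted: columns 1-6, leading blanks kept */
        @1 list_val :$6.;         /* modified list: skip blanks, read to the next blank */
  put std_val= char_val= $char6. list_val=;   /* write the three values to the log */
datalines;
CAPONE
  CAPONE
;
```

For the second data line, which starts with two blanks, the $6. read keeps only CAPO, the $CHAR6. read keeps the two blanks followed by CAPO, and the :$6. read returns the full CAPONE.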

 

Aexor
Lapis Lazuli | Level 10

Thanks much for valuable inputs 
