Ronein
Meteorite | Level 14

Hello

Please see the code below, which creates a data set with long character variables.

In way 1 it is written: informat myline $200. lineTYpe $20.;

My question: how can I know that the right numbers to write are $200 and $20? Do you count the characters by eye to get these numbers?

In way 2 it is written: input myLine $74. lineType $20.;

My question: how can I know that the right numbers to write are $74 and $20? Do you count the characters by eye to get these numbers?

I also want to ask whether there are more ways to create this data set with long character variables.

 
/*****Create a dataset with long strings****/
/**Way1**/
/*The default length for a SAS character variable is 8*/
/*Use informats to specify the maximum length possible */
data have;
informat myline $200. lineTYpe $20.;
infile datalines delimiter='|';
input myLine lineType ;
datalines;
L1. This is my very long line of unpredicted length. | line_one
L2. This is my second very long line of unpredicted length.| line_simple
L3. This is my third, third, third very long line of unpredicted length.| line_another
;
run;
/**Way2**/
/*This is called formatted input*/
/*SAS will read exactly the number of characters into a column that are specified, regardless of delimiters*/
data have;
input myLine $74. lineType $20.;
datalines;
L1. This is my very long line of unpredicted length.                      line_one
L2. This is my second very long line of unpredicted length.               line_simple
L3. This is my third, third, third very long line of unpredicted length.  line_another
;
run;
Kurt_Bremser
Super User

Text editors will show you column numbers, so you can use them to help in determining string lengths.

If in doubt, read first with a clearly too long length, then determine max(length(var)) in SQL, so you can set the correct length in a second read.
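The two-pass idea above could look roughly like the sketch below. The first-pass lengths ($1000 and $100), the dataset name pass1, and the macro variable names len1 and len2 are arbitrary assumptions for illustration, not values from the original post.

```sas
/* First pass: read with deliberately generous lengths (assumed values) */
data pass1;
  length myLine $1000 lineType $100;
  infile datalines dsd dlm='|' truncover;
  input myLine lineType;
datalines;
L1. This is my very long line of unpredicted length. | line_one
L2. This is my second very long line of unpredicted length.| line_simple
L3. This is my third, third, third very long line of unpredicted length.| line_another
;

/* Find the longest value actually stored in each variable */
proc sql noprint;
  select max(length(myLine)), max(length(lineType))
    into :len1 trimmed, :len2 trimmed
    from pass1;
quit;

/* Write the measured lengths to the log */
%put myLine needs &len1 bytes, lineType needs &len2 bytes.;
```

In a second read, the LENGTH statement could then use the measured values, for example length myLine $&len1 lineType $&len2; or slightly larger numbers if later data may be longer.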

ballardw
Super User

 

 

Your data source or sharing agreement should specify the characteristics of the data files such as variable order, type, length, file type. So ask the source to provide documentation.

 

Relying on the content to set variable properties can cause serious problems with fields that are only occasionally populated. This applies to Proc Import for delimited text files, even with the option Guessingrows=max, which makes it examine the entire file before setting lengths. You might conclude that a field is 1 character wide because the first file you read is missing data in that column. But when a later version of the file arrives with 500 characters in that field, a program built on that length would read only the first character.
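A small sketch of that failure mode, using made-up data: the dataset name later_file, the variables id and comment, and the values are all hypothetical. The length $1 stands in for a length that was guessed from an earlier file in which the column was empty.

```sas
/* Hypothetical: comment was given length 1 based on an earlier, empty column */
data later_file;
  length comment $1;
  infile datalines dsd dlm='|' truncover;
  input id comment;
datalines;
1|Y
2|This comment is much longer than one character
;

/* The second row keeps only the first character of the comment */
proc print data=later_file;
run;
```

Defining the length from the documented file layout, rather than from whatever the first file happened to contain, avoids this kind of silent truncation.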

Tom
Super User

If you want to set the length for an unknown variable then set it to the maximum it will need to be to be useful for you.  What does it contain?  Is it a name? It is very unlikely a person will have a name that is longer than 200 bytes.  But it is very likely someone in your dataset will have a name that needs more than 10 bytes.

 

Your first data step should be setting the LENGTH of the variable, not attaching an INFORMAT to it. If the first place that SAS sees the variable is an INFORMAT statement, the data step compiler is forced to GUESS what length to define for the variable. There is no need to attach the $ informat to character variables. SAS already knows how to read character variables, so it does not need special instructions for reading them.

 

Here is how you could read your pipe delimited text.

data have;
  length myline $200 lineTYpe $20 ;
  infile datalines dsd dlm='|' truncover ;
  input myLine lineType ;
datalines;
L1. This is my very long line of unpredicted length. | line_one
L2. This is my second very long line of unpredicted length.| line_simple
L3. This is my third, third, third very long line of unpredicted length.| line_another
;

If you know your data is in fixed columns then either turn on the RULER line in your editor, or use the LIST statement to dump a few lines to the SAS log and look at the RULER line it generates. If it is just to check where field 2 starts in your example, you might want to dump only a few of the lines.

data _null_;
  input ;
  list;
  if _n_ > 3 then stop;
datalines;
....
;

If you are reading variable length records with fixed format input then you should use the TRUNCOVER option on the infile statement. If the data is in-line (DATALINES or CARDS) then the records will be padded to a multiple of 80 bytes long. But if some of the lines are less than 80 and some are more than 80 then you still might need to use an INFILE statement so that you can add the TRUNCOVER option. If the variable needs to use a different length than the INPUT statement needs to use then just define the variables before referencing them.

data have;
  infile datalines truncover;
  length myLine $200 lineType $20 ;
  input myLine $74. lineType $20.;
datalines;
L1. This is my very long line of unpredicted length.                      line_one
L2. This is my second very long line of unpredicted length.               line_simple
L3. This is my third, third, third very long line of unpredicted length.  line_another
;

 

Ronein
Meteorite | Level 14
Thank you
In the code that you sent, the LENGTH and INPUT statements use different numbers. Why? How did you choose to write 200, 20 and 74?
data have;
infile datalines truncover;
length myLine $200 lineType $20 ;
input myLine $74. lineType $20.;
datalines;
Tom
Super User

@Ronein wrote:
Thank you
In the code that you sent, the LENGTH and INPUT statements use different numbers. Why? How did you choose to write 200, 20 and 74?
data have;
infile datalines truncover;
length myLine $200 lineType $20 ;
input myLine $74. lineType $20.;
datalines;

Re-read the original post:

If the variable needs to use a different length than the INPUT statement needs to use then just define the variables before referencing them.

You are the one creating the code to make the dataset. So it is up to you to decide what you need. 

 

For example, you might have multiple data steps to read in multiple files (from different sources, subjects, time periods, etc.) and want to standardize the dataset structure so that you can combine the resulting SAS datasets.
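A minimal sketch of that situation, assuming two hypothetical sources: file_a in fixed columns and file_b pipe delimited. The dataset names and the L4 record are invented for illustration. Both steps use the same LENGTH statement, so the variables have identical definitions no matter how each INPUT statement reads them.

```sas
/* Source A: fixed columns, myLine in columns 1-74 */
data file_a;
  infile datalines truncover;
  length myLine $200 lineType $20;
  input myLine $74. lineType $20.;
datalines;
L1. This is my very long line of unpredicted length.                      line_one
;

/* Source B (hypothetical): pipe delimited, same variable definitions */
data file_b;
  infile datalines dsd dlm='|' truncover;
  length myLine $200 lineType $20;
  input myLine lineType;
datalines;
L4. A line from a second, pipe-delimited source.| line_extra
;

/* Matching lengths mean nothing is truncated when the datasets are stacked */
data combined;
  set file_a file_b;
run;
```

If the lengths differed between the two steps, the SET statement would take each variable's length from the first dataset listed, and longer values from the second could be cut off.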
