michokwu
Quartz | Level 8

Hello,

I read a CSV file into SAS, and one of the variables appears to have two lengths. I tried to fix it by adding a LENGTH statement, but I got the same result. Is there a way to fix it? It is a character variable and its length should be 6. Thank you.

 

data new; /*without the length statement*/
        infile 'path/dataset.csv';
        informat variable1 $6.;
        input variable1 $;
run;

data new; /*with the length statement*/
        infile 'path/dataset.csv';
        informat variable1 $6.;
        length variable1 $6.
        input variable1 $;
        distlen=length(variable1);
run;

 

[Image: frequency distribution of length]

 

Tom
Super User

The LENGTH() function returns the length of the value, not how the variable is defined.

SAS stores all character variables as fixed length, so the LENGTH() function ignores any trailing spaces.

Is there some issue you are actually having with your data?  
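
To illustrate the difference between the length of a value and the defined length of a variable, here is a minimal sketch. The dataset name `demo`, the variables `value_len` and `defined_len`, and the two sample values are made up for illustration; they are not from the original CSV file. VLENGTH() reports the storage length of the variable, while LENGTH() reports the length of the current value.

```sas
data demo;
    length variable1 $6;                 /* defined (storage) length is 6 */
    input variable1;
    value_len   = length(variable1);     /* length of the value, trailing blanks ignored */
    defined_len = vlength(variable1);    /* storage length of the variable */
    datalines;
ABCDEF
ABC
;
run;

proc print data=demo;
run;
```

In this sketch `value_len` is 6 for the first row and 3 for the second, while `defined_len` is 6 for both rows, so a frequency table of `value_len` shows two "lengths" even though the variable is defined once as $6.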

michokwu
Quartz | Level 8
Thank you. I wouldn't say issues, just that some variables were stored in a wrong format.
I only used the length() function to create a variable that I could run a proc freq on. I tried to attach a pic of the output
ballardw
Super User

@michokwu wrote:
Thank you. I wouldn't say issues, just that some variables were stored in a wrong format.
I only used the length() function to create a variable that I could run a proc freq on. I tried to attach a pic of the output

The number one cause of "stored in a wrong format" is using Proc Import, or one of the widgets that call the procedure, to read the data. The procedure examines only a few rows of data. If all of the values for a variable in those few rows contain only digits, then the variable will almost certainly be numeric, even if you think it should be character, for example to preserve leading zeroes in ZIP codes or account numbers.

Character variable lengths are also set using only the first few rows. Depending on the file type involved, that could be as few as 8 rows, which often leads to values being shorter than needed or expected.

 

Does this sound like a possible cause?
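
If a guessing procedure does turn out to be involved, one option is to ask it to examine more rows before deciding types and lengths. The following is only a sketch: it reuses the placeholder path and dataset name from the original post, assumes the file has a header row, and assumes a SAS release where GUESSINGROWS accepts MAX.

```sas
proc import datafile='path/dataset.csv'   /* placeholder path from the original post */
            out=new
            dbms=csv
            replace;
    getnames=yes;        /* assumption: first row holds the variable names */
    guessingrows=max;    /* scan all rows before choosing types and lengths */
run;
```

Scanning every row can be slow on large files, so a specific number of rows may be a reasonable compromise.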

andreas_lds
Jade | Level 19

The length of an alphanumeric variable is the number of characters it can contain (assuming a single-byte encoding). The LENGTH function returns the number of characters actually stored; trailing blanks aren't counted.

Right now both steps you have posted create the same dataset, so the second step always overwrites the result of the first step.

michokwu
Quartz | Level 8
Thanks. Yes, both steps create the same dataset; the second step was an attempt to fix the issue with a variable having two lengths.
Tom
Super User

Why do you have the INFORMAT statement?  An informat is instructions for how to read text into values. SAS does not need any special instructions for reading text into text variables.  If you want to tell SAS how much storage it should reserve to store the variable use the LENGTH statement.

 

Your second data step is missing a semi-colon.  So you are never reading any data from the CSV file. Instead you are attaching the $ informat to the variables named INPUT and VARIABLE1 ;
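
A corrected version of the second step might look like the sketch below. It drops the INFORMAT statement, ends the LENGTH statement with a semicolon, and keeps the placeholder path from the original post. The DSD and TRUNCOVER options and the PROC FREQ step are additions, and the sketch assumes the file has no header row and that variable1 is the first field on each line.

```sas
data new;
    infile 'path/dataset.csv' dsd truncover;  /* DSD: comma-delimited; TRUNCOVER: tolerate short lines */
    length variable1 $6;                      /* defines variable1 as character, 6 bytes */
    input variable1;                          /* read as character because of the LENGTH statement */
    distlen = length(variable1);              /* length of each value, trailing blanks ignored */
run;

proc freq data=new;
    tables distlen;                           /* distribution of value lengths */
run;
```

If the file does have a header row, adding FIRSTOBS=2 to the INFILE statement would skip it.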

michokwu
Quartz | Level 8
Thanks for pointing out the missing semi-colon. It's an error from copying and pasting it here.
