BookmarkSubscribeRSS Feed
evp000
Quartz | Level 8

Hi all,

 

Say I have this data in a csv file (C:\Users\xxxx\Desktop\test.csv):

 

/*
co11, longstring, col3
a,one two three four,b
c,five six seven eight nine,d
*/

 

When I read it with an input statement, I have to specify a length (or informat) but I want it to take the length of the longest value. 

If I use proc import the length is longer than the longest value.  Why?  

 

proc import datafile = 'C:\Users\xxxx\Desktop\test.csv'
out = test3
dbms = csv
replace;
getnames = yes;
run;


proc print; run;
proc contents; run;            /* length = 15 */ 

 

proc sql;
select max(length(longstring))             /* = 14 */ 
from test3;
quit;

 

How would I read it so that the length is 14?  I've tested this on other data too, where the longest value is 24 and the length created is 26. 

 

Thanks. 

 

16 REPLIES 16
FreelanceReinh
Jade | Level 19

Hi, this is an interesting question. However, I was not able to replicate your result. With your sample .csv data I obtain length 25 from both PROC CONTENTS and PROC SQL -- in perfect agreement with the fact that "five six seven eight nine" is a string consisting of 25 characters. (Actually, the names of the second and third variable were prefixed by an underscore due to the leading blanks in the first line of your data.)

evp000
Quartz | Level 8

Thanks for testing it.  I had shortened the text when I ran the test, but I put it back to the example and now I get: 

 

longstring Char 29 $29. $29.

 

longest value:
25

 

I checked for leading or trailing blanks (in notepad) and don't see any.

FreelanceReinh
Jade | Level 19

Can you perhaps attach the small sample .csv file, so that we can check if the same discrepancies occur using our SAS installations?

evp000
Quartz | Level 8

Here it is.  change the 'txt' back to 'csv'. 

 

Thanks again for checking.

 

 

FreelanceReinh
Jade | Level 19

Thanks for providing the file. Using your PROC IMPORT code I still get length 25 in both cases:

longstring Char 25 $25. $25.

 

I have also opened the file with a hex editor and haven't found anything suspicious.

 

Finally, I retyped the file with Notepad and compared it to your file with a comparison tool and found no difference.

 

So, it's still a mystery to me what happened on your side.

 

 

FreelanceReinh
Jade | Level 19

Just an idea: Could it be a DBCS issue?

What do you get from %put %sysfunc(getoption(dbcs)); ?

evp000
Quartz | Level 8

NODBCS

FreelanceReinh
Jade | Level 19

Ok, thanks.

 

When you provided the sample file you wrote "change the 'txt' back to 'csv'". This sounds to me as if the file you attached was not the original .csv file from your PROC IMPORT experiment. And this, in turn, would obviously open the possibility that some strange characterisic of the .csv file got lost in the process of "converting" it to a .txt file. How did you create the .txt from the .csv file?

evp000
Quartz | Level 8

I just copied the file and renamed it with the txt extension, since you can't attach csv files. 

evp000
Quartz | Level 8

Ok, thanks for trying it out. 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Why do you need to make it length of the longest variable out of interest, it really doesn't matter that much.

 

As for length of the variable this would of course be defined up front in you Data Transfer Specification agreement which will detail the strcuture of the data being sent, you do have one yes?  Take the length from there.

 

If not, then you will need to read the file, and then post process it with some logic, howevedo suggest you look at transfer agreements as a better, more robust solution:

 

proc sql;
  create table LENGTHS as
  select "NAME" as NAME,max(length(NAME)) as M from SASHELP.CLASS;
quit;

data _null_;
  set lengths;
  call execute('data want; length '||strip(name)||' $'||strip(put(m,best.))||'.; set sashelp.class; run;');
run;
evp000
Quartz | Level 8

It's really just a QC issue.  We are to keep the "original" length, but of course that length is going to be different depending on how the data was read.  In this case the production version and the QC version don't match.  

 

Yes, thanks, I know I can do post-processing to change it.

 

For our data transfer specs, length is not specified if the data comes in a csv file.  

 

Thanks for your INPUT.

 

 

Reeza
Super User
If this is a file that is going to be read across a testing and prod environment it seems worth it to me to write a data step to import the data, rather than rely on Proc Import.

The data step will allow you to specify the lengths as desired.

I think the expectation that PROC IMPORT will produce the same results across different environments is a big assumption and can't be trusted.
evp000
Quartz | Level 8

Yes, specifying the length is what I'm pushing for.  And it would be consistent across projects. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1757 views
  • 0 likes
  • 5 in conversation