12-04-2015 09:58 AM
Say I have this data in a csv file (C:\Users\xxxx\Desktop\test.csv):
co11, longstring, col3
a,one two three four,b
c,five six seven eight nine,d
When I read it with an input statement, I have to specify a length (or informat) but I want it to take the length of the longest value.
If I use proc import the length is longer than the longest value. Why?
proc import datafile = 'C:\Users\xxxx\Desktop\test.csv'
out = test3
dbms = csv
getnames = yes;
proc print; run;
proc contents; run; /* length = 15 */
select max(length(longstring)) /* = 14 */
How would I read it so that the length is 14? I've tested this on other data too, where the longest value is 24 and the length created is 26.
12-04-2015 10:20 AM
Hi, this is an interesting question. However, I was not able to replicate your result. With your sample .csv data I obtain length 25 from both PROC CONTENTS and PROC SQL -- in perfect agreement with the fact that "five six seven eight nine" is a string consisting of 25 characters. (Actually, the names of the second and third variable were prefixed by an underscore due to the leading blanks in the first line of your data.)
12-04-2015 10:27 AM
Thanks for testing it. I had shortened the text when I ran the test, but I put it back to the example and now I get:
longstring Char 29 $29. $29.
I checked for leading or trailing blanks (in notepad) and don't see any.
12-04-2015 11:02 AM
Thanks for providing the file. Using your PROC IMPORT code I still get length 25 in both cases:
longstring Char 25 $25. $25.
I have also opened the file with a hex editor and haven't found anything suspicious.
Finally, I retyped the file with Notepad and compared it to your file with a comparison tool and found no difference.
So, it's still a mystery to me what happened on your side.
12-04-2015 11:28 AM
When you provided the sample file you wrote "change the 'txt' back to 'csv'". This sounds to me as if the file you attached was not the original .csv file from your PROC IMPORT experiment. And this, in turn, would obviously open the possibility that some strange characterisic of the .csv file got lost in the process of "converting" it to a .txt file. How did you create the .txt from the .csv file?
12-04-2015 10:25 AM
Why do you need to make it length of the longest variable out of interest, it really doesn't matter that much.
As for length of the variable this would of course be defined up front in you Data Transfer Specification agreement which will detail the strcuture of the data being sent, you do have one yes? Take the length from there.
If not, then you will need to read the file, and then post process it with some logic, howevedo suggest you look at transfer agreements as a better, more robust solution:
proc sql; create table LENGTHS as select "NAME" as NAME,max(length(NAME)) as M from SASHELP.CLASS; quit; data _null_; set lengths; call execute('data want; length '||strip(name)||' $'||strip(put(m,best.))||'.; set sashelp.class; run;'); run;
12-04-2015 10:30 AM
It's really just a QC issue. We are to keep the "original" length, but of course that length is going to be different depending on how the data was read. In this case the production version and the QC version don't match.
Yes, thanks, I know I can do post-processing to change it.
For our data transfer specs, length is not specified if the data comes in a csv file.
Thanks for your INPUT.
12-04-2015 11:34 AM