DATA Step, Macro, Functions and more

Reading csv data, var length

Reply
Contributor
Posts: 74

Reading csv data, var length

Hi all,

 

Say I have this data in a csv file (C:\Users\xxxx\Desktop\test.csv):

 

/*
co11, longstring, col3
a,one two three four,b
c,five six seven eight nine,d
*/

 

When I read it with an input statement, I have to specify a length (or informat) but I want it to take the length of the longest value. 

If I use proc import the length is longer than the longest value.  Why?  

 

proc import datafile = 'C:\Users\xxxx\Desktop\test.csv'
out = test3
dbms = csv
replace;
getnames = yes;
run;


proc print; run;
proc contents; run;            /* length = 15 */ 

 

proc sql;
select max(length(longstring))             /* = 14 */ 
from test3;
quit;

 

How would I read it so that the length is 14?  I've tested this on other data too, where the longest value is 24 and the length created is 26. 

 

Thanks. 

 

Trusted Advisor
Posts: 1,115

Re: Reading csv data, var length

Hi, this is an interesting question. However, I was not able to replicate your result. With your sample .csv data I obtain length 25 from both PROC CONTENTS and PROC SQL -- in perfect agreement with the fact that "five six seven eight nine" is a string consisting of 25 characters. (Actually, the names of the second and third variable were prefixed by an underscore due to the leading blanks in the first line of your data.)

Contributor
Posts: 74

Re: Reading csv data, var length

Thanks for testing it.  I had shortened the text when I ran the test, but I put it back to the example and now I get: 

 

longstring Char 29 $29. $29.

 

longest value:
25

 

I checked for leading or trailing blanks (in notepad) and don't see any.

Trusted Advisor
Posts: 1,115

Re: Reading csv data, var length

Can you perhaps attach the small sample .csv file, so that we can check if the same discrepancies occur using our SAS installations?

Contributor
Posts: 74

Re: Reading csv data, var length

[ Edited ]

Here it is.  change the 'txt' back to 'csv'. 

 

Thanks again for checking.

 

 

Trusted Advisor
Posts: 1,115

Re: Reading csv data, var length

Thanks for providing the file. Using your PROC IMPORT code I still get length 25 in both cases:

longstring Char 25 $25. $25.

 

I have also opened the file with a hex editor and haven't found anything suspicious.

 

Finally, I retyped the file with Notepad and compared it to your file with a comparison tool and found no difference.

 

So, it's still a mystery to me what happened on your side.

 

 

Trusted Advisor
Posts: 1,115

Re: Reading csv data, var length

Just an idea: Could it be a DBCS issue?

What do you get from %put %sysfunc(getoption(dbcs)); ?

Contributor
Posts: 74

Re: Reading csv data, var length

NODBCS

Trusted Advisor
Posts: 1,115

Re: Reading csv data, var length

Ok, thanks.

 

When you provided the sample file you wrote "change the 'txt' back to 'csv'". This sounds to me as if the file you attached was not the original .csv file from your PROC IMPORT experiment. And this, in turn, would obviously open the possibility that some strange characterisic of the .csv file got lost in the process of "converting" it to a .txt file. How did you create the .txt from the .csv file?

Contributor
Posts: 74

Re: Reading csv data, var length

I just copied the file and renamed it with the txt extension, since you can't attach csv files. 

Contributor
Posts: 74

Re: Reading csv data, var length

Ok, thanks for trying it out. 

Super User
Super User
Posts: 7,392

Re: Reading csv data, var length

Why do you need to make it length of the longest variable out of interest, it really doesn't matter that much.

 

As for length of the variable this would of course be defined up front in you Data Transfer Specification agreement which will detail the strcuture of the data being sent, you do have one yes?  Take the length from there.

 

If not, then you will need to read the file, and then post process it with some logic, howevedo suggest you look at transfer agreements as a better, more robust solution:

 

proc sql;
  create table LENGTHS as
  select "NAME" as NAME,max(length(NAME)) as M from SASHELP.CLASS;
quit;

data _null_;
  set lengths;
  call execute('data want; length '||strip(name)||' $'||strip(put(m,best.))||'.; set sashelp.class; run;');
run;
Contributor
Posts: 74

Re: Reading csv data, var length

It's really just a QC issue.  We are to keep the "original" length, but of course that length is going to be different depending on how the data was read.  In this case the production version and the QC version don't match.  

 

Yes, thanks, I know I can do post-processing to change it.

 

For our data transfer specs, length is not specified if the data comes in a csv file.  

 

Thanks for your INPUT.

 

 

Super User
Posts: 17,724

Re: Reading csv data, var length

If this is a file that is going to be read across a testing and prod environment it seems worth it to me to write a data step to import the data, rather than rely on Proc Import.

The data step will allow you to specify the lengths as desired.

I think the expectation that PROC IMPORT will produce the same results across different environments is a big assumption and can't be trusted.
Contributor
Posts: 74

Re: Reading csv data, var length

Yes, specifying the length is what I'm pushing for.  And it would be consistent across projects. 

Ask a Question
Discussion stats
  • 16 replies
  • 491 views
  • 0 likes
  • 5 in conversation