Bug in Code Related to CSV file

Super Contributor
Posts: 418

Hello everyone. I have the following piece of code, which I use to read in the FIRST ROW of any given CSV file and then create a data set containing the column names of that file, in the order they are found (so column A=1, B=2, etc.).

The problem I am having is that on some csv files, it will run for an extremely long time, and the log file will show the following 'notes' when running.

_N_=9656 _ERROR_=0 varnumcsv=. name=
_N_=9657 _ERROR_=0 varnumcsv=. name=
_N_=9658 _ERROR_=0 varnumcsv=. name=
_N_=9659 _ERROR_=0 varnumcsv=. name=
_N_=9660 _ERROR_=0 varnumcsv=. name=

It almost looks like the process is attempting to read every single row of the CSV file rather than only the first row (as if it were ignoring the OBS=1 option in the code).

Would anyone happen to know what is going wrong with this piece of code? I see no logical reason for it to do this!

Thanks!

data testme;
   infile "G:\myfolder\Mymadeupfile.csv"
      dlm=','  lrecl=32000 dsd obs=1;
   length varnumcsv 8. name $8000.;
   input name $ @@;
   put _all_;
   ;
   varnumCSV=_n_;
run;

If there is a logical problem in this code, could someone please show an alternative method to grab the names in the first row of a CSV file (without reading in the entire file)? I tried some other methods, but they run into problems with special characters like ', commas, etc.

Thanks!


Accepted Solutions
Solution
‎05-07-2014 12:30 AM
Respected Advisor
Posts: 4,934

Re: Bug in Code Related to CSV file

Posted in reply to Anotherdream

Wouldn't this be safer:

data names;
   length name $32;
   infile "G:\myfolder\Mymadeupfile.csv"
      lrecl=32000 dsd obs=1 length=L column=C;
   do until(C>L);
      input name @;
      output;
   end;
run;

PG
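For readers who, like the original poster, have not met these INFILE options before: LENGTH=L stores the length of the current input record in L, and COLUMN=C stores the column where the input pointer currently sits. Because `input name @;` uses a single trailing @, the record is held only within the current DATA step iteration, so the DO UNTIL loop reads every field of the first record in a single iteration and exits once the pointer moves past the last column. The sketch below is a commented variant of the code above; the temporary sample file, its column names, and the added `varnumcsv` counter are illustrative assumptions, not part of the original answer.

```sas
/* Hypothetical sample file; point the INFILE at the real CSV in practice */
filename csv temp;
data _null_;
   file csv;
   put 'FakeColumn,"Name, with comma",Third';
   put '1,2,3';
run;

data names;
   length varnumcsv 8 name $32;
   /* L = length of the current record, C = column of the input pointer */
   infile csv dsd lrecl=32000 obs=1 length=L column=C;
   /* Read one field per pass; the single trailing @ holds the record   */
   /* for this iteration only, and the loop ends once C passes column L */
   do varnumcsv = 1 by 1 until (C > L);
      input name @;
      output;
   end;
run;
```

On the next DATA step iteration the held record is released and INPUT finds no further records (OBS=1), so the step ends. Note that `name $32` truncates any header longer than 32 characters.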



All Replies
Respected Advisor
Posts: 3,799

Re: Bug in Code Related to CSV file

Posted in reply to Anotherdream

How many commas do you have on that first row? I think some of them may be null fields.

Move the PUT _ALL_; statement to after the assignment of varnumCSV.

Super User
Posts: 11,343

Re: Bug in Code Related to CSV file

Posted in reply to Anotherdream

Any real reason you are using @@? If you only want to read the first line, why hold the input pointer?

Respected Advisor
Posts: 3,799

Re: Bug in Code Related to CSV file

Because each name is read into an observation using the data step loop.  Without @@ there would only be one name read and one observation output.
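A minimal sketch of the difference being described, assuming a small temporary file with hypothetical column names: without @@ the first record is released at the end of the first DATA step iteration, so OBS=1 ends the step after one name; with @@ the record stays held and each iteration reads the next field.

```sas
/* Hypothetical two-line sample file */
filename csv temp;
data _null_;
   file csv;
   put 'ColA,ColB,ColC';
   put '1,2,3';
run;

/* Without @@: one name read, one observation output */
data one_name;
   infile csv dsd obs=1;
   length name $32;
   input name $;
run;

/* With @@: the record is held across iterations, one observation per field */
data all_names;
   infile csv dsd obs=1;
   length name $32;
   input name $ @@;
run;
```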

Respected Advisor
Posts: 3,799

Re: Bug in Code Related to CSV file

There is no need to complicate the program with options that don't address the issue. Using @@ in this way does work as expected; I expect the file has a whole bunch of commas on line one of "Mymadeupfile.csv". Or the OP may not be showing the real code or telling the whole story.

As written, the program reads every comma-delimited field up to the end of the line, which is also the end of the file because of the OBS=1 INFILE statement option.

Adding code to stop, or to not output the name when it is missing, may or may not be helpful, but it makes no difference to the use of @@.

filename FT15F001 temp;
data testme;
   infile FT15F001 dlm=',' dsd obs=1 lrecl=32000;
   length varnumcsv 8 name $32;
   input name $ @@;
   varnumCSV=_n_;
   put _all_;
   parmcards;
name1,name,name3,,,,,,,,,,,,,
name1,name,name3
name1,name,name3
name1,name,name3
name1,name,name3
name1,name,name3
;;;;
run;
Super Contributor
Posts: 418

Re: Bug in Code Related to CSV file

Posted in reply to data_null__

Wow, hello everyone, thanks for all the replies! PGStats, I will try your methodology, but first I want to address some of the points here!

_null_, as always you are extremely helpful. However, I am 100% showing the entire code, and the file does not have trailing commas (I don't think it does; when I open it in a text editor it doesn't show any). I am going to include the file below so you can try to run it and see if you run into the same problem!

Peter.C, I actually tried this EXACT IF clause, and it made no difference; the file still has the same problem. I was confused as to why this would be, as well.

PGStats, your methodology does seem to work! However, I don't actually understand it at all, so I will do some research into the LENGTH= and COLUMN= options (first time I've ever seen them!). If you want to expand on what it's doing, feel free, but I will mark your answer as the correct one in the future!

Although, _null_, I agree with you that your methodology should ALSO work, so I'm curious to see if you get the same bug on this file!

For some background, I've been using this exact code to read the column names of files for about a year, and this file is the first time I've run into this problem.

Respected Advisor
Posts: 3,799

Re: Bug in Code Related to CSV file

Posted in reply to Anotherdream

I copied your file to my /home directory and ran it, and it worked fine. It ended when it was supposed to. I'm out of ideas (didn't take long).


options generic=1;

filename FT15F001 '~/MyMadeUpFile.csv';
data testme;
   infile FT15F001 dlm=',' dsd obs=1 lrecl=32000;
   length varnumcsv 8 name $8000;
   input name $ @@;
   varnumCSV=_n_;
   put _all_;
run;
options generic=0;

19         filename FT15F001 '~/MyMadeUpFile.csv';
20         data testme;
21            infile FT15F001 dlm=',' dsd obs=1 lrecl=32000;
22            length varnumcsv 8 name $8000;
23            input name $ @@;
24            varnumCSV=_n_;
25            put _all_;
26            run;

NOTE: The infile FT15F001 is:
      (system-specific pathname),
      (system-specific file attributes)

varnumcsv=1 name=FakeColumn _ERROR_=0 _N_=1
varnumcsv=2 name=FakeColumn1 _ERROR_=0 _N_=2
varnumcsv=3 name=FakeColumn3 _ERROR_=0 _N_=3
varnumcsv=4 name=FakeColumn4 _ERROR_=0 _N_=4
varnumcsv=5 name=FakeColumn5 _ERROR_=0 _N_=5
varnumcsv=6 name=FakeColumn6 _ERROR_=0 _N_=6
varnumcsv=7 name=FakeColumn7 _ERROR_=0 _N_=7
varnumcsv=8 name=FakeColumn8 _ERROR_=0 _N_=8
varnumcsv=9 name=FakeColumn9 _ERROR_=0 _N_=9
varnumcsv=10 name=FakeColumn10 _ERROR_=0 _N_=10
NOTE: 1 record was read from the infile (system-specific pathname).
      The minimum record length was 120.
      The maximum record length was 120.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.TESTME has 10 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.02 seconds
Super Contributor
Posts: 418

Re: Bug in Code Related to CSV file

Posted in reply to data_null__

Weird... I wonder if there is some kind of global system setting that I messed up somehow?

I tried running it on my co-worker's computer and it seemed to work... So something is clearly different between the machines, not in the code.

I will use PGStats' methodology until we can look into my install or system option differences (since that method works on both machines)!

Thanks so much everyone for your help!

Respected Advisor
Posts: 3,799

Re: Bug in Code Related to CSV file

Posted in reply to Anotherdream

Can you show the full log where the program doesn't work as expected?

Trusted Advisor
Posts: 3,215

Re: Bug in Code Related to CSV file

Posted in reply to Anotherdream

I can understand Anotherdream's question after reading: http://support.sas.com/documentation/cdl/en/lestmtsref/63323/HTML/default/viewer.htm#n1rill4udj0tfun...

OBS= is documented in terms of input records, as if the input record count were the same as the automatic observation count in the resulting data set. But those are not necessarily equal.

One input record can result in many observations in a SAS data set, or many input records can result in one observation. That is confusing, because OBS= on the INFILE statement appears to be defined differently.


The behavior Anotherdream observed is not a logically desirable situation. SAS appears to have lost count of the input records it has processed.

It could be bypassed by using the DROPOVER or MISSOVER option, if the problem is caused by SAS automatically proceeding to the next input record.

According to the documentation, the OBS= option should stop reading after the given number of input records, and that stopping condition is not working. It really is a bug.

---->-- ja karman --<-----
Valued Guide
Posts: 2,177

Re: Bug in Code Related to CSV file

Posted in reply to Anotherdream

provides a solution, so all I can offer is that your code provides no way to stop the DATA step, because that trailing @@ holds the input buffer from iteration to iteration of the DATA step.

Without using the LENGTH= and COLUMN= options on that INFILE statement, your original code might have been effective if you had added the line

IF NAME =' ' THEN STOP ;

instead of that PUT statement.
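Applied to the original program, that suggestion might look roughly like the sketch below. It assumes a hypothetical sample file whose header row ends in empty fields; the STOP ends the step at the first empty field, so those trailing blanks never become observations. One caveat: a legitimately blank column header in the middle of the row would also end the step at that point.

```sas
/* Hypothetical sample file whose header row has trailing empty fields */
filename csv temp;
data _null_;
   file csv;
   put 'ColA,ColB,ColC,,,';
   put '1,2,3,,,';
run;

data testme;
   infile csv dlm=',' dsd obs=1 lrecl=32000;
   length varnumcsv 8 name $8000;
   input name $ @@;
   /* End the step at the first empty field instead of outputting it */
   if name = ' ' then stop;
   varnumcsv = _n_;
run;
```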

🔒 This topic is solved and locked.


Discussion stats
  • 11 replies
  • 311 views
  • 0 likes
  • 6 in conversation