BookmarkSubscribeRSS Feed
Naksh
Calcite | Level 5

Hi There,

 

I have data in csv, few observations data of ID variable got comma within it and double quoted. I tried to read data using Proc Import code and facing issue:

 

proc import datafile="~sample.csv" out=temp dbms=csv;
run;

 

data sample:

Start date, ID, name

2013-07-01,12984,"mat"

2013-07-01,"0428204,2916796","tod"

2013-07-01,"0428204,2916796","steve"

2013-07-01,98432,"mar"

 

Log:

 

ERROR: Import unsuccessful. See SAS Log for details.

NOTE: Invalid data for 'ID'n in line 2 12-27.

 

when I check the output dataset, I see missing value assigned for ID variable.

 

Is anyone facing the similar issue ? how to read numeric variable data with double quotes and comma with in the data range ?

 

Thanks,

Naksh

7 REPLIES 7
ballardw
Super User

From the appearance that should not be treated as numeric. If it were to be a number and have commas in the value I would expect to see something like:

"04,282,042,916,796"

Also leading zeroes are not typical of numeric values for most things. What actual number would you expect with reading "0428204,2916796"?

 

With Proc import I would suggest using the option: Guessingrows=max;

 

proc import datafile="~sample.csv" out=temp dbms=csv;
   guessingrows=max;
run;

that will have the procedure examine more rows of the data before assigning properties to variables and then your ID will almost certainly be character.

 

 

Or write a data step to read the data.

 

Question: why do you want Id to be numeric? Will you ever do arithmetic with the Id value (add, subtract, multiply, divide, calculate mean, sum or similar)?

Naksh
Calcite | Level 5

HI Ballardw,

 

thank you for your reply. I tried to run the code using guessingrows=any. the proc import processed and read the all records but the double quoted records read as follow:

 

2013-07-01 12,984 mat
2013-07-01 4.2820429E12 tod
2013-07-01 4.2820429E12 steve
2013-07-01 98,432 mar

 

No specific reason to read that ID column as Numeric, mostly I wont be doing any calculations on it. Proc Import it self trying it as numeric variable.

 

Thanks,

 

 

 

Tom
Super User Tom
Super User

The quotes are there because of the comma.  Without them those lines would look like they had 5 values instead of 4.  The quotes around tod and steve are not needed, but cause no harm.

 

PROC IMPORT should handle that file fine.  Did you forget to tell it to check the whole file before finalizing its guesses about what type of variable to create for each column in the CSV file?

 

So assuming you want to read a file named sample.csv that is in your home directory (instead of trying to read the home directory of a user named sample.csv) your PROC IMPORT code should look like:

proc import datafile="~/sample.csv" dbms=csv out=temp replace;
  guessingrows=max;
run;

Let's try your example lines (removing the spurious spaces before the last two column headers).

filename csv temp;
options parmcards=csv;
parmcards;
Start date,ID,name
2013-07-01,12984,"mat"
2013-07-01,"0428204,2916796","tod"
2013-07-01,"0428204,2916796","steve"
2013-07-01,98432,"mar"
;

proc import file=csv dbms=csv out=test replace ;
  guessingrows=max;
run;

Results:

Tom_0-1713986136536.png

Tom_1-1713986159563.png

Looking at the example values in the ID column of the CSV file it looks like perhaps something has removed the leading zero from the values that did not have commas.  Perhaps somewhere along the way you read the file with EXCEL or something else that misunderstood the type of values the column was supposed to contain?  You might want to add a test to add them back, otherwise you might have trouble when trying to match those ID values to others you might have in other datasets.

data test_fixed;
  set test;
  if 1 <= lengthn(id) < 7 then id=put(input(id,7.),z7.);
run;

Tom_0-1713986593840.png

 

 

 

Naksh
Calcite | Level 5

HI Tom,

 

Thank you for your reply. 

looks like the proc import with guessingrows=max is working able to read all records without missing those 20 records. but those 20 records stored with some scientific number like 4.2820429E12.

 

 

Thanks,

ballardw
Super User

@Naksh wrote:

HI Tom,

 

Thank you for your reply. 

looks like the proc import with guessingrows=max is working able to read all records without missing those 20 records. but those 20 records stored with some scientific number like 4.2820429E12.

 

 

Thanks,


The LOG from Proc Import should show a data step created to read the file.

I would:

1) copy the code from the log

2) paste into the editor

3) clean up by removing line numbers and making the code look nice

4) change the informat and format for that variable to $15. or similar

5) save that as a code file to read the data

6) run the new program.

You have now "written" a data step to read the CSV.

Some folks don't like the use of the INFORMAT to set variable properties and many of the formats aren't needed but it works (or SAS wouldn't have the Proc Import create such).

Tom
Super User Tom
Super User

Those values with commas in them are definitely NOT numeric.  

Since you are reading the CSV file into a DATASET you cannot have numeric and character values in the same variable (unlike in a spreadsheet program like EXCEL where anything goes).  So the variable will definitely need to be CHARACTER.

 

If you know the comma separated values are valid representations of numbers then you could possible post process the data to convert those comma separated values into two (or more?) numeric values.  But you will need to decide if you want to do that by making extra observations 

data test_fixed;
  set test(rename=(id=idlist));
  do index=1 to max(1,countw(idlist,','));
     id = input(scan(idlist,index,','),32.);
    output;
  end;
run;

or making extra variables.

data test_fixed;
  set test(rename=(id=idlist));
  array id [10];
  do index=1 to min(dim(id),countw(idlist,','));
     id[index] = input(scan(idlist,index,','),32.);
  end;
run;

 

Naksh
Calcite | Level 5

Hi All,

 

Thank you for your reply and suggestion. looks like the file got corrupted while downloading hence the data populated wrongly for few columns. after getting the right file, able to read all data without issues.

 

Thank you,

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 387 views
  • 2 likes
  • 3 in conversation