Solved: Problem to import several CSV files

christinagting0 · Posted 11-28-2016 02:52 PM

I'm trying to proc import a csv file but for some reason I'm getting an error in the log. I've imported CSV files before so I'm not sure what the problem is.

If I convert my csv file to an excel file, I can successfully import the file for some reason.

When I try to import as a csv this is the code I am using:

proc import out=BN_CSV
datafile='xxxx'
dbms=csv replace;
getnames=yes; 
guessingrows=200;
run;

When I run this code I get this in the log:

10699  proc import out=BN_CSV
10700      datafile='xxxxx'
10701      dbms=csv replace;
10702      *Optional statements are below;
10703      getnames=yes;
10704      guessingrows=200;
10705  run;


ERROR: Import unsuccessful.  See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           1.47 seconds
      cpu time            1.45 seconds

Sorry for the long code...I don't understand what the problem is! Can someone tell me??

ballardw · Posted 11-28-2016 06:44 PM

@christinagting0 wrote:

Hey Balardw-thanks for replying and going through my giant log!

So I guess I didn't realize that there are multiple issues in the first place. I'm very new to SAS and it's diffifult for me to parse out things in the log into identifyable issues.

So it seems from your post there are 2 issues?

1. Duplicate variables-this is fine as I understand why there might be duplicate variables and it's ok that SAS is assigned 2, 3, 4 after the variable name-but will SAS add these suffixes consistently for my other datasets that I am trying to import? If SAS does it consistently then when I try to merge I can be confident that the correct duplicate variables are matched up with the corresponding duplicate variables in the other dataset

The variable names will be the same. However the variable types may not and the lengths of character variables will almost always mismatch. Note these informat statements from your code:

informat MED_REC_PMED1 $37. ;

informat MED_REC_PMED2 $20. ;

That says the first medication was set to a length of 37 characters and the second to 20. If in the other file the maximum lengths of the recorded values are different then when you combine them there will be problems with cutting one to fit with the other (truncation).

However the fix is very easy. the code part that you pasted can be pasted into your editor, remove the line numbers, and edit the like variables to have the same length. You actually should have a document describing your source file but you can get by examing the lengths and then adding a little fudge factor. I would likely make all of the MED_REC_PMED variable $50. but if you suspect longer you can set that to 60 or what ever. Identify all of the similar variables and have the same lengths assigned. It is just text in the editor so is easy. SAVE the file. Rerun the program with your new lengths. Then to read the other file(s) change the INFILE statement to the file you want to read and the data set name to reflect what you want.

Note that all of the $1. likely indicate blank data and you may have to make a good guess.

Your varaibles coming in as VAR157 most likely means that there were no column headers for those at all.

BEST are used to read numerics are very generic. You may want to double check any of the date informats MONYY7. to be sure that you expect data with like JAN2016.

The data issue usually means you need to look at your own data, usually with a Text editor such as Notepad.

I suspect that the record you are reading with problems is misbehaving due to this bit of your data :

g,Od,5 years ,11,Vit e ,1000,Od, ',,Vit b 12,,Od, ',,Magnesium,,Od

Note the single quote marks. The parser is treating all fo the text between them as a single value (look in the documentation about DSD on the infile statement). You can see that in the part that shows this:

MED_REC_OTHERMEDS_START02=,,Vit b 12,,Od,

This likely need to be corrected on the input file. Find out why there are fields with a single quote in them. If you cannot find values to replace them with then easiest would be to delete the ' in those fields.

The actual error shows up because that bit inside the quotes with the commas means that 5 variables worth of fields were read into that one and so the values assigned to every varaible after that is offset (and wrong). Since most of them were character they take the characters but the specific errors expected numbers and ran into Erythromycin, and BN.

View solution in original post

ballardw · Posted 11-28-2016 03:44 PM

Which issue are you asking about? The error is only related to the content of one variable for one record where the program is attempting to read Erythromycin,BN 651 as the numeric variables indicated by:

NOTE: Invalid data for MED_REC_ALCOHOL_QUALIFYING in line 163 637-648.
NOTE: Invalid data for MED_REC_ALCOHOL_FREQUENCY in line 163 650-651.

Possibly that line is missing a couple of delimiters or has a few where it shouldn't.

The comment about the duplicates is telling you that your original column headings have duplicate text. So SAS added 2, 3, 4 etc to some of them to create a valid non-duplicated column heading.

At a guess I would say that your original data has the column heading: "MED REC PMED DOSE" nine times in the data. Note the pattern of repeated variables. You have a PMED, DOSE, FREQ, START and Reason. I would guess that PMED is a medication and the others are the related fields. And may for some of your further analysis benefit from transforming into a single record per medication set (with sufficient identification information maintainted).

christinagting0 · Posted 11-28-2016 03:54 PM

Hey Balardw-thanks for replying and going through my giant log!

So I guess I didn't realize that there are multiple issues in the first place. I'm very new to SAS and it's diffifult for me to parse out things in the log into identifyable issues.

So it seems from your post there are 2 issues?

1. Duplicate variables-this is fine as I understand why there might be duplicate variables and it's ok that SAS is assigned 2, 3, 4 after the variable name-but will SAS add these suffixes consistently for my other datasets that I am trying to import? If SAS does it consistently then when I try to merge I can be confident that the correct duplicate variables are matched up with the corresponding duplicate variables in the other dataset

2. As for the issue: NOTE: Invalid data for MED_REC_ALCOHOL_QUALIFYING in line 163 637-648. I'm not sure how to solve this?

ballardw · Posted 11-28-2016 06:44 PM