BookmarkSubscribeRSS Feed
Wken1122
Obsidian | Level 7

Hi,
I am importing a csv. file to SAS studio, but unsure what is the reason, it gains extra rows.
Note: on the csv file there are 5000 rows but after the import it became 5288 rows.
The following is the log;

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
NOTE: ODS statements in the SAS Studio environment may disable some output features.
62
63 /* Generated Code (IMPORT) */
64 /* Source File: ODS_CLAIM.csv */
65 /* Source Path: /folders/myfolders/Folder1 */
66 /* Code generated on: 10/9/17, 2:15 PM */
67
68 %web_drop_table(WORK.ODS_CLAIM);
69
70
71 FILENAME REFFILE '/folders/myfolders/Folder1/ODS_CLAIM.csv';
72
73 PROC IMPORT DATAFILE=REFFILE
74 DBMS=CSV
75 OUT=WORK.ODS_CLAIM;
76 GETNAMES=YES;
77 RUN;
 
NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to
WORK.PARMS.PARMS.SLIST.
78 /**********************************************************************
79 * PRODUCT: SAS
80 * VERSION: 9.4
81 * CREATOR: External File Interface
82 * DATE: 09OCT17
83 * DESC: Generated SAS Datastep Code
84 * TEMPLATE SOURCE: (None Specified.)
85 ***********************************************************************/
86 data WORK.ODS_CLAIM ;
87 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
88 infile REFFILE delimiter = ',' MISSOVER DSD firstobs=2 ;
89 informat ID $19. ;
90 informat SOURCEMODIFIED anydtdtm40. ;
91 informat SOURCECREATED anydtdtm40. ;
92 informat MODIFIEDBY $5. ;
93 informat MODIFIED anydtdtm40. ;
94 informat CREATEDBY $5. ;
95 informat CREATED anydtdtm40. ;
96 informat COVERAGEID $17. ;
97 informat SOURCESYSTEMID $5. ;
98 informat POLICYNUMBER $11. ;
99 informat PRODUCTC $5. ;
100 informat CLAIMNUMBER $12. ;
101 informat CLAIMNOTIFIEDDATE anydtdtm40. ;
102 informat STATUS $12. ;
103 informat LOSSDATE anydtdtm40. ;
104 informat OCCURENCEDATE $1. ;
105 informat LOSSTIME $2. ;
106 informat LOSSRECOVERYTYPE $2. ;
107 informat LOSSTYPE $32. ;
108 informat CAUSEOFEVENT $8. ;
109 informat TOTALPAIDAMOUNT best32. ;
110 informat TOTALOUTSTANDINGAMOUNT $1. ;
111 informat CURRENCY $2. ;
112 informat LOSSDESCRIPTION $2. ;
113 informat INTIMATEDDATE $1. ;
114 informat CLOSEDATE $1. ;
115 informat SUMMARY $2. ;
116 informat CLAIMHANDLERUSRID $2. ;
117 informat CLAIMHANDLERUSRNUM $2. ;
118 informat CLAIMHANDLERUSRNAME $2. ;
119 informat NCDAFFECTEDFLAG $2. ;
120 informat PROCESSTRANSACTIONDATE $1. ;
121 informat TRANSACTIONNUMBER $2. ;
122 informat DATEADMITTED anydtdtm40. ;
123 informat DATEOFDEATH $1. ;
124 informat PENDINGCLAIMREASON $2. ;
125 informat ADJUSTMENTCODE $2. ;
126 informat ADJUSTMENTREASON $2. ;
127 informat ADJUSTMENTAMOUNT $1. ;
128 informat DATEAPPROVED anydtdtm40. ;
129 informat DATEDISCHARGED anydtdtm40. ;
130 informat CAUSEOFDEATH $2. ;
131 informat APPROVEDAMOUNT best32. ;
132 informat REJECTEDFLG $2. ;
133 informat REJECTIONREASON $2. ;
134 informat LOSSCAUSE $2. ;
135 informat ILLNESSCODE $5. ;
136 informat ILLNESS $39. ;
137 informat PRODUCTCODE $8. ;
138 informat PRODUCT $41. ;
139 informat COVERAGENUMBER best32. ;
140 informat REPORTEDBY $2. ;
141 informat INTEGRATION_ID $23. ;
142 informat X_RECORD_CODE $6. ;
143 informat NATUREOFACCIDENT $2. ;
144 informat ACCIDENTPLACE $2. ;
145 informat ACCIDENTPLACEZIPCODE $2. ;
146 informat LOSSDESCRIPTIONCODE $2. ;
147 informat LOSSRECOVERYTYPECODE $2. ;
148 informat LOSSTYPECODE $5. ;
149 informat CAUSEOFEVENTCODE $5. ;
150 informat COVERAGEGROUPNUMBER $1. ;
151 informat REPLCLAIMNUM $2. ;
152 informat VALIDFLAG $2. ;
153 informat AGENTCOMPANYTYPE $2. ;
154 informat AGENTCODE $2. ;
155 informat AGENTID $2. ;
156 informat STATUSCODE $3. ;
157 informat SERVBRANCHID $2. ;
158 informat SERVBRANCH $2. ;
159 informat CLAIMENTRYDATE $1. ;
160 informat REGNUMBER $2. ;
161 informat CONTRACTTYPECODE $2. ;
162 informat CONTRACTTYPE $2. ;
163 informat RIMETHODCODE $2. ;
164 informat RIMETHOD $2. ;
165 informat MAJOREVENTCODE $2. ;
166 informat MAJOREVENT $2. ;
167 informat CLASSCODE $2. ;
168 informat CLASS $2. ;
169 informat LOCALITY $2. ;
170 informat REMARK01 $54. ;
171 informat REMARK02 $52. ;
172 informat SRCFILE $8. ;
173 informat REJECTIONREASONCODE $2. ;
174 informat VERSION $1. ;
175 informat HQRECEIVEDATE $1. ;
176 informat INCURREDAMOUNT best32. ;
177 informat TPACLAIMNUMBER $2. ;
178 informat TPANAME $2. ;
179 informat TPACGLNUMBER $2. ;
180 informat ANNLMTMAX $1. ;
181 informat AVAILABLEANNLMT $1. ;
182 informat LIFETIMELMT $1. ;
183 informat AVAILABLELIFETIMELMT $1. ;
184 informat ILLNESSCODE2 $2. ;
185 informat ILLNESS2 $2. ;
186 informat ILLNESSCODE3 $2. ;
187 informat ILLNESS3 $2. ;
188 informat CLAIMTYPE $2. ;
189 informat SURGICALCODE $2. ;
190 informat SURGICAL $2. ;
191 informat EXGRATIA $2. ;
192 informat APPROVEDBYUSERCODE $2. ;
193 informat APPROVEDBYUSER $2. ;
194 informat DOCTORNAME $2. ;
195 informat CHECKDIGIT $1. ;
196 format ID $19. ;
197 format SOURCEMODIFIED datetime. ;
198 format SOURCECREATED datetime. ;
199 format MODIFIEDBY $5. ;
200 format MODIFIED datetime. ;
201 format CREATEDBY $5. ;
202 format CREATED datetime. ;
203 format COVERAGEID $17. ;
204 format SOURCESYSTEMID $5. ;
205 format POLICYNUMBER $11. ;
206 format PRODUCTC $5. ;
207 format CLAIMNUMBER $12. ;
208 format CLAIMNOTIFIEDDATE datetime. ;
209 format STATUS $12. ;
210 format LOSSDATE datetime. ;
211 format OCCURENCEDATE $1. ;
212 format LOSSTIME $2. ;
213 format LOSSRECOVERYTYPE $2. ;
214 format LOSSTYPE $32. ;
215 format CAUSEOFEVENT $8. ;
216 format TOTALPAIDAMOUNT best12. ;
217 format TOTALOUTSTANDINGAMOUNT $1. ;
218 format CURRENCY $2. ;
219 format LOSSDESCRIPTION $2. ;
220 format INTIMATEDDATE $1. ;
221 format CLOSEDATE $1. ;
222 format SUMMARY $2. ;
223 format CLAIMHANDLERUSRID $2. ;
224 format CLAIMHANDLERUSRNUM $2. ;
225 format CLAIMHANDLERUSRNAME $2. ;
226 format NCDAFFECTEDFLAG $2. ;
227 format PROCESSTRANSACTIONDATE $1. ;
228 format TRANSACTIONNUMBER $2. ;
229 format DATEADMITTED datetime. ;
230 format DATEOFDEATH $1. ;
231 format PENDINGCLAIMREASON $2. ;
232 format ADJUSTMENTCODE $2. ;
233 format ADJUSTMENTREASON $2. ;
234 format ADJUSTMENTAMOUNT $1. ;
235 format DATEAPPROVED datetime. ;
236 format DATEDISCHARGED datetime. ;
237 format CAUSEOFDEATH $2. ;
238 format APPROVEDAMOUNT best12. ;
239 format REJECTEDFLG $2. ;
240 format REJECTIONREASON $2. ;
241 format LOSSCAUSE $2. ;
242 format ILLNESSCODE $5. ;
243 format ILLNESS $39. ;
244 format PRODUCTCODE $8. ;
245 format PRODUCT $41. ;
246 format COVERAGENUMBER best12. ;
247 format REPORTEDBY $2. ;
248 format INTEGRATION_ID $23. ;
249 format X_RECORD_CODE $6. ;
250 format NATUREOFACCIDENT $2. ;
251 format ACCIDENTPLACE $2. ;
252 format ACCIDENTPLACEZIPCODE $2. ;
253 format LOSSDESCRIPTIONCODE $2. ;
254 format LOSSRECOVERYTYPECODE $2. ;
255 format LOSSTYPECODE $5. ;
256 format CAUSEOFEVENTCODE $5. ;
257 format COVERAGEGROUPNUMBER $1. ;
258 format REPLCLAIMNUM $2. ;
259 format VALIDFLAG $2. ;
260 format AGENTCOMPANYTYPE $2. ;
261 format AGENTCODE $2. ;
262 format AGENTID $2. ;
263 format STATUSCODE $3. ;
264 format SERVBRANCHID $2. ;
265 format SERVBRANCH $2. ;
266 format CLAIMENTRYDATE $1. ;
267 format REGNUMBER $2. ;
268 format CONTRACTTYPECODE $2. ;
269 format CONTRACTTYPE $2. ;
270 format RIMETHODCODE $2. ;
271 format RIMETHOD $2. ;
272 format MAJOREVENTCODE $2. ;
273 format MAJOREVENT $2. ;
274 format CLASSCODE $2. ;
275 format CLASS $2. ;
276 format LOCALITY $2. ;
277 format REMARK01 $54. ;
278 format REMARK02 $52. ;
279 format SRCFILE $8. ;
280 format REJECTIONREASONCODE $2. ;
281 format VERSION $1. ;
282 format HQRECEIVEDATE $1. ;
283 format INCURREDAMOUNT best12. ;
284 format TPACLAIMNUMBER $2. ;
285 format TPANAME $2. ;
286 format TPACGLNUMBER $2. ;
287 format ANNLMTMAX $1. ;
288 format AVAILABLEANNLMT $1. ;
289 format LIFETIMELMT $1. ;
290 format AVAILABLELIFETIMELMT $1. ;
291 format ILLNESSCODE2 $2. ;
292 format ILLNESS2 $2. ;
293 format ILLNESSCODE3 $2. ;
294 format ILLNESS3 $2. ;
295 format CLAIMTYPE $2. ;
296 format SURGICALCODE $2. ;
297 format SURGICAL $2. ;
298 format EXGRATIA $2. ;
299 format APPROVEDBYUSERCODE $2. ;
300 format APPROVEDBYUSER $2. ;
301 format DOCTORNAME $2. ;
302 format CHECKDIGIT $1. ;
303 input
304 ID $
305 SOURCEMODIFIED
306 SOURCECREATED
307 MODIFIEDBY $
308 MODIFIED
309 CREATEDBY $
310 CREATED
311 COVERAGEID $
312 SOURCESYSTEMID $
313 POLICYNUMBER $
314 PRODUCTC $
315 CLAIMNUMBER $
316 CLAIMNOTIFIEDDATE
317 STATUS $
318 LOSSDATE
319 OCCURENCEDATE $
320 LOSSTIME $
321 LOSSRECOVERYTYPE $
322 LOSSTYPE $
323 CAUSEOFEVENT $
324 TOTALPAIDAMOUNT
325 TOTALOUTSTANDINGAMOUNT $
326 CURRENCY $
327 LOSSDESCRIPTION $
328 INTIMATEDDATE $
329 CLOSEDATE $
330 SUMMARY $
331 CLAIMHANDLERUSRID $
332 CLAIMHANDLERUSRNUM $
333 CLAIMHANDLERUSRNAME $
334 NCDAFFECTEDFLAG $
335 PROCESSTRANSACTIONDATE $
336 TRANSACTIONNUMBER $
337 DATEADMITTED
338 DATEOFDEATH $
339 PENDINGCLAIMREASON $
340 ADJUSTMENTCODE $
341 ADJUSTMENTREASON $
342 ADJUSTMENTAMOUNT $
343 DATEAPPROVED
344 DATEDISCHARGED
345 CAUSEOFDEATH $
346 APPROVEDAMOUNT
347 REJECTEDFLG $
348 REJECTIONREASON $
349 LOSSCAUSE $
350 ILLNESSCODE $
351 ILLNESS $
352 PRODUCTCODE $
353 PRODUCT $
354 COVERAGENUMBER
355 REPORTEDBY $
356 INTEGRATION_ID $
357 X_RECORD_CODE $
358 NATUREOFACCIDENT $
359 ACCIDENTPLACE $
360 ACCIDENTPLACEZIPCODE $
361 LOSSDESCRIPTIONCODE $
362 LOSSRECOVERYTYPECODE $
363 LOSSTYPECODE $
364 CAUSEOFEVENTCODE $
365 COVERAGEGROUPNUMBER $
366 REPLCLAIMNUM $
367 VALIDFLAG $
368 AGENTCOMPANYTYPE $
369 AGENTCODE $
370 AGENTID $
371 STATUSCODE $
372 SERVBRANCHID $
373 SERVBRANCH $
374 CLAIMENTRYDATE $
375 REGNUMBER $
376 CONTRACTTYPECODE $
377 CONTRACTTYPE $
378 RIMETHODCODE $
379 RIMETHOD $
380 MAJOREVENTCODE $
381 MAJOREVENT $
382 CLASSCODE $
383 CLASS $
384 LOCALITY $
385 REMARK01 $
386 REMARK02 $
387 SRCFILE $
388 REJECTIONREASONCODE $
389 VERSION $
390 HQRECEIVEDATE $
391 INCURREDAMOUNT
392 TPACLAIMNUMBER $
393 TPANAME $
394 TPACGLNUMBER $
395 ANNLMTMAX $
396 AVAILABLEANNLMT $
397 LIFETIMELMT $
398 AVAILABLELIFETIMELMT $
399 ILLNESSCODE2 $
400 ILLNESS2 $
401 ILLNESSCODE3 $
402 ILLNESS3 $
403 CLAIMTYPE $
404 SURGICALCODE $
405 SURGICAL $
406 EXGRATIA $
407 APPROVEDBYUSERCODE $
408 APPROVEDBYUSER $
409 DOCTORNAME $
410 CHECKDIGIT $
411 ;
412 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
413 run;
 
NOTE: The infile REFFILE is:
Filename=/folders/myfolders/Folder1/ODS_CLAIM.csv,
Owner Name=root,Group Name=vboxsf,
Access Permission=-rwxrwx---,
Last Modified=09Oct2017:14:14:26,
File Size (bytes)=3283116
 
NOTE: Invalid data for APPROVEDAMOUNT in line 3814 335-353.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
3814 mc on: 14/6/14-27/6/14,28/6/14-2/7/14,3/7/14-6/8/14,7/8/14-3/9/14,4/9/14-1/10/14,2/10/14-4/11/14,5/1
101 1/14-4/12/14, 5/12/14-8/12/14","","",,,,"",0,"Y","Selected client in claim on whom does not support
201 the selected claim type (event type).","","","","","",,"","CLS_WI201500061","1000","","","","11","",
301 "16","",,"","","","","","6","","",05/03/2015 00:00:00,"","","","","","","","","","","mva on 10/6/14 400
ID=mc on: 14/6/14-27/6 SOURCEMODIFIED=14JUN28:02:07:14 SOURCECREATED=07MAR14:06:08:14 MODIFIEDBY=7/8/1 MODIFIED=09APR14:01:10:14
CREATEDBY=2/10/ CREATED=11MAY14:04:12:14 COVERAGEID=5/12/14-8/12/14" SOURCESYSTEMID= POLICYNUMBER= PRODUCTC= CLAIMNUMBER=
CLAIMNOTIFIEDDATE=. STATUS= LOSSDATE=. OCCURENCEDATE=Y LOSSTIME=Se LOSSRECOVERYTYPE= LOSSTYPE= CAUSEOFEVENT= TOTALPAIDAMOUNT=.
TOTALOUTSTANDINGAMOUNT= CURRENCY= LOSSDESCRIPTION= INTIMATEDDATE=C CLOSEDATE=1 SUMMARY= CLAIMHANDLERUSRID= CLAIMHANDLERUSRNUM=
CLAIMHANDLERUSRNAME=11 NCDAFFECTEDFLAG= PROCESSTRANSACTIONDATE=1 TRANSACTIONNUMBER= DATEADMITTED=. DATEOFDEATH=
PENDINGCLAIMREASON= ADJUSTMENTCODE= ADJUSTMENTREASON= ADJUSTMENTAMOUNT= DATEAPPROVED=. DATEDISCHARGED=. CAUSEOFDEATH=
APPROVEDAMOUNT=. REJECTEDFLG= REJECTIONREASON= LOSSCAUSE= ILLNESSCODE= ILLNESS= PRODUCTCODE= PRODUCT= COVERAGENUMBER=.
REPORTEDBY= INTEGRATION_ID= X_RECORD_CODE="mva o NATUREOFACCIDENT= ACCIDENTPLACE= ACCIDENTPLACEZIPCODE= LOSSDESCRIPTIONCODE=
LOSSRECOVERYTYPECODE= LOSSTYPECODE= CAUSEOFEVENTCODE= COVERAGEGROUPNUMBER= REPLCLAIMNUM= VALIDFLAG= AGENTCOMPANYTYPE=
AGENTCODE= AGENTID= STATUSCODE= SERVBRANCHID= SERVBRANCH= CLAIMENTRYDATE= REGNUMBER= CONTRACTTYPECODE= CONTRACTTYPE=
RIMETHODCODE= RIMETHOD= MAJOREVENTCODE= MAJOREVENT= CLASSCODE= CLASS= LOCALITY= REMARK01= REMARK02= SRCFILE=
REJECTIONREASONCODE= VERSION= HQRECEIVEDATE= INCURREDAMOUNT=. TPACLAIMNUMBER= TPANAME= TPACGLNUMBER= ANNLMTMAX=
AVAILABLEANNLMT= LIFETIMELMT= AVAILABLELIFETIMELMT= ILLNESSCODE2= ILLNESS2= ILLNESSCODE3= ILLNESS3= CLAIMTYPE=
SURGICALCODE= SURGICAL= EXGRATIA= APPROVEDBYUSERCODE= APPROVEDBYUSER= DOCTORNAME= CHECKDIGIT= _ERROR_=1 _N_=3813
NOTE: 5288 records were read from the infile REFFILE.
The minimum record length was 1.
The maximum record length was 1148.
NOTE: The data set WORK.ODS_CLAIM has 5288 observations and 107 variables.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.08 seconds
 
 
Errors detected in submitted DATA step. Examine log.
5288 rows created in WORK.ODS_CLAIM from REFFILE.
 
 
 
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.25 seconds
cpu time 0.23 seconds
 
414
 
 
415 PROC CONTENTS DATA=WORK.ODS_CLAIM; RUN;
 
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.17 seconds
cpu time 0.16 seconds
 
 
416
417
418 %web_open_table(WORK.ODS_CLAIM);
419
420 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
433
9 REPLIES 9
Wken1122
Obsidian | Level 7

Oh, and after checking the csv file, I don't see any error in the mentioned rows or columns in the log( for APPROVEDAMOUNT).

ChrisBrooks
Ammonite | Level 13

Things seem to be going wrong before SAS reaches APPROVEDAMOUNT - the VALUE for LOSSTIME is truncated to two characters and I would expect a variable with the word "time" in its name to be a numeric variable anyway, not the character variable indicated by the input statement. There are a few other variables which look odd to me too in terms of their name and data type so I'm wondering if there was some pre-processing done which has caused the variable names and data to get misaligned...

Wken1122
Obsidian | Level 7

The data are exported out from Oracle SQL Developer and no preprocessing are done.

ballardw
Super User

@Wken1122 wrote:

Oh, and after checking the csv file, I don't see any error in the mentioned rows or columns in the log( for APPROVEDAMOUNT).


How did you look at the file? NEVER verify csv data only with a spreadsheet program (Excel?) because they will assume lots of things about your data and sometimes hide things depending on the source.

Look at the data in a plain text editor like Notepad.

 

A common thing that Excel will hide are embedded linefeed characters that when you look at them in a text editor will look something like:

"this value",1234,567,"another value
with linefeeds",10/10/27

Excel will likely show that as one line but a basic proc import or data step generated is likely to see that as two lines and the numeric value runs into the / in the date and SAS reports Invalid data.

 

 

I might try modifying your proc import generated data step to incorporate the TERMSTR option, either TRMSTR=CRLF or TRMSTR=LF, one of which may ignore the extra line feed or carriage return if present as above.

 

Or have the ORACLE process strip CR or LF characters that are part of text fields and replace them with simple blanks.

Kurt_Bremser
Super User

Do NOT rely on proc import for data like this.

See that it tries to read

28/6/14-2/7/14

as a single datetime value with format anydtdtm40.

Such a field needs to be read as string, and then dissected into a start and end date value.

Either have such (not very clever) values fixed in the infile, or write a data step yourself that deals with it.

Wken1122
Obsidian | Level 7

But that line is a comment line(explanation of reasons) in the excel file.

Patrick
Opal | Level 21

@Wken1122

Use guessingrows=max for Proc Import to first analyze all your data before defining the column types and lengths.

PROC IMPORT DATAFILE=REFFILE
  DBMS=CSV
  OUT=WORK.ODS_CLAIM;
  GETNAMES=YES;
  GUESSINGROWS=MAX;
RUN;

IF you've got SAS/Access to Oracle or SAS/Access to ODBC licensed then consider to directly and programmatically access the table in Oracle and not to manually copy/paste some text via Oracle Developer to SAS.

Wken1122
Obsidian | Level 7

Unfortunately I don't have the license for it. And guessingrows only works after I removed all the columns that are not included into the next step.
Anyway thank you for your kind reply!

Wken1122
Obsidian | Level 7

At the end, after i cleaned the data and try to import it again, it works fine.
Thank you for all your replies!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1066 views
  • 1 like
  • 5 in conversation