Dear community,
Part of the CSV file looks like below:
9001 | no | 2018-01-23 | 18:18 | multiphase or dynamic |
9002 | no | 2018-01-31 | 16:49 | multiphase or dynamic |
9003 | no | 2018-02-06 | 11:03 | multiphase or dynamic |
9004 | no | 2018-03-20 | 17:58 | multiphase or dynamic |
9005 | no | 2018-04-28 | 10:44 | multiphase or dynamic |
9006 | no | 2018-05-02 | * | single phase |
9007 | no | 2018-07-13 | 10:28 | multiphase or dynamic |
10001 | no | 2018-01-12 | 13:04 | multiphase or dynamic |
10002 | no | 2018-05-17 | 17:09 | multiphase or dynamic |
10003 | no | * | * | |
10004 | no | 2018-06-26 | 9:49 | multiphase or dynamic |
11001 | no | 2018-02-23 | 16:38 | multiphase or dynamic |
11002 | no | 2018-03-06 | 9:43 | multiphase or dynamic |
11003 | no | 2018-04-19 | 8:28 | multiphase or dynamic |
11004 | no | 2018-04-19 | 13:23 | multiphase or dynamic |
proc import datafile="C:\Users\sasuser\sam.csv"
out=aolc dbms=csv replace ; getnames=yes;run;
Log show the message below:
NOTE: Invalid data for prstdtc_cta_time in line 169 56-56.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+
169 incomplete,2,1824R0089007,30,8,2,09006,no,,,2018/05/02,*,single phase,left,Unchecked,
86 Unchecked,Unchecked,Checked,Unchecked,Unchecked,Unchecked,Unchecked,Unchecked,Uncheck
171 ed,Unchecked,Unchecked,good,Help,incomplete,18/06/13 13:30:56,18/07/12 16:09:41, 250
DFSTATUS=incom DFVALID=2 DFRASTER=1824R0089007 DFSTUDY=30 DFPLATE=8 DFSEQ=2 usubjid=9006
proccur_mrascan=no prstdtc_dt= prstdtc_time= prstdtc_cta_dt=2018-05-02 prstdtc_cta_time=.
prtrt_cta=single phase prlat_occluson=left faloc_icat=Unchecked faloc_ical=Unchecked
faloc_m3distal=Unchecked faloc_proximal=Checked faloc_bothm2=Unchecked faloc_vrtbrslr=Unchecked
faloc_distalm1=Unchecked faloc_oneofm2=Unchecked faloc_aca=Unchecked faloc_midm1=Unchecked
faloc_trifrctn=Unchecked faloc_pca=Unchecked faorres_colatrls=good Help=Help DFSCREEN=incom
DFCREATE=18JUN13:13:30:56 DFMODIFY=18JUL12:16:09:41 VAR32= _ERROR_=1 _N_=168
NOTE: Invalid data for prstdtc_cta_dt in line 173 45-45.
173 incomplete,1,1821R006R007,30,8,2,10003,no,,,*,,*,*,Unchecked,Unchecked,Unchecked,Unch
86 ecked,Unchecked,Unchecked,Unchecked,Unchecked,Unchecked,Unchecked,Unchecked,Unchecked
171 ,*,Help,incomplete,18/05/26 15:14:59,18/07/19 07:51:55, 225
DFSTATUS=incom DFVALID=1 DFRASTER=1821R006R007 DFSTUDY=30 DFPLATE=8 DFSEQ=2 usubjid=10003
proccur_mrascan=no prstdtc_dt= prstdtc_time= prstdtc_cta_dt=. prstdtc_cta_time=. prtrt_cta=*
prlat_occluson=* faloc_icat=Unchecked faloc_ical=Unchecked faloc_m3distal=Unchecked
faloc_proximal=Unchecked faloc_bothm2=Unchecked faloc_vrtbrslr=Unchecked
faloc_distalm1=Unchecked faloc_oneofm2=Unchecked faloc_aca=Unchecked faloc_midm1=Unchecked
faloc_trifrctn=Unchecked faloc_pca=Unchecked faorres_colatrls=* Help=Help DFSCREEN=incom
DFCREATE=18MAY26:15:14:59 DFMODIFY=18JUL19:07:51:55 VAR32= _ERROR_=1 _N_=172
ERROR: Import unsuccessful. See SAS Log for details.
IS there a way to fix this?
Thanks!
Nothing you show would actually cause the import to "fail".
Since you don't show data with commas, you are not actually displaying CSV for your example but almost certainly a spreadsheet interpretation of your source. That really isn't that helpful.
Did the log contain the data step that proc import should create to read a CSV file?
You might try copying that data step into the editor and examining the code closely for details or posting it here, preferably in a code box opened using the {i} or "running man" icons.
It is extremely likely that the informat assigned for your date variable was a date informat and * is invalid data for dates or numeric values, which is what generates the invalid data message. That is normal and expected for such.
You could modify the data step code to use a custom informat that you would have to create to explicity assign * to missing to avoid those messages.
Something like:
proc format library=work; invalue dateasterisk '*'= . other= [yymmdd10.] ; invalue timeasterisk '*'=. other = [time6.] ; run; data example; informat id $5. a $3. date dateasterisk. time timeasterisk.; format date yymmdd10. time time6.; input id a date time; datalines; 9001 no 2018-01-23 * 9002 yes * 16:49 ; run;
Actually every time I run into data with values like no/yes, Y/N, T/F or similar I create a custom format to read that into numeric 1/0 where 1 is Yes, Y, T, True etc. as analysis is much cleaner in general.
There aren't a lot of options in PROC IMPORT. I think the easiest fix is to use a text editor and change the CSV file so that the asterisk is changed to a space. Or write your own data step to read the file instead of PROC IMPORT (which is more complicated, but is also more general if you are going to be getting a lot of these CSV files with asterisks)
What does your import agreement state? If col4 is supposed to be a time variable and your getting * in there, then that is a data fail and the data should be returned to the sender. The same for date.
Avoid using proc import, it is a guessing procedure - i.e. it tries to guess your data. In this case it is assuming that those variables are dates, or times respectively, and thus when it encounters a * it falls over - correctly as dates/times cannot be *. You could write a datastep to import this data, setting the formats of those two variables to character, but then you would either have them as character or need to convert to numeric data or time.
And as you will respond with "I have no data agreement", trying to fix it will lead you in circles, maybe next time it will be . rather than *, or maybe something else - this is why an agreement is so important.
Thank you both!
In the database whenever the value was removed/deleted, the value changed to ' * ', which I don't have much control over it.
then you really need to write your own DATA step import code to handle asterisks and other types of invalid data
Nothing you show would actually cause the import to "fail".
Since you don't show data with commas, you are not actually displaying CSV for your example but almost certainly a spreadsheet interpretation of your source. That really isn't that helpful.
Did the log contain the data step that proc import should create to read a CSV file?
You might try copying that data step into the editor and examining the code closely for details or posting it here, preferably in a code box opened using the {i} or "running man" icons.
It is extremely likely that the informat assigned for your date variable was a date informat and * is invalid data for dates or numeric values, which is what generates the invalid data message. That is normal and expected for such.
You could modify the data step code to use a custom informat that you would have to create to explicity assign * to missing to avoid those messages.
Something like:
proc format library=work; invalue dateasterisk '*'= . other= [yymmdd10.] ; invalue timeasterisk '*'=. other = [time6.] ; run; data example; informat id $5. a $3. date dateasterisk. time timeasterisk.; format date yymmdd10. time time6.; input id a date time; datalines; 9001 no 2018-01-23 * 9002 yes * 16:49 ; run;
Actually every time I run into data with values like no/yes, Y/N, T/F or similar I create a custom format to read that into numeric 1/0 where 1 is Yes, Y, T, True etc. as analysis is much cleaner in general.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.