Hello Everyone,
I was trying to import a .csv file which had almost 37000 records.
When i tried to read them using proc import it is giving me the below note.
NOTE: Invalid data for provider_id in line 5641 36-43.
4061 proc import out = diagnosis
4062 datafile = "/Projects/data/diagnosis.csv"
4063 dbms = csv replace;
4064 delimiter=",";
4065
4066 run;
4067 /*******************************************************************
4067 ! ***
4068 * PRODUCT: SAS
4069 * VERSION: 9.4
4070 * CREATOR: External File Interface
4071 * DATE: 05AUG21
4072 * DESC: Generated SAS Datastep Code
4073 * TEMPLATE SOURCE: (None Specified.)
4074 ********************************************************************
4074 ! ***/
4075 data DIAGNOSIS ;
4076 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
4077 infile Projects/
4077 ! data/diagnosis.csv' delimiter = ',' MISSOVER DSD
4077 ! lrecl=32767 firstobs=2 ;
4078 informat patient_mrn best32. ;
4079 informat encounter_id best32. ;
4080 informat enc_type $2. ;
4081 informat dx_date $21. ;
4082 informat provider_id best32. ;
4083 informat provider_name $30. ;
4084 informat provider_title $4. ;
4085 informat dx_name $101. ;
4086 informat dx_code $8. ;
4087 informat dx_type best32. ;
4088 informat dx_source $2. ;
4089 informat dx_origin $2. ;
4090 informat pdx $1. ;
4091 informat raw_pdx $24. ;
4092 informat sourcesystem_cd $11. ;
4093 format patient_mrn best12. ;
4094 format encounter_id best12. ;
4095 format enc_type $2. ;
4096 format dx_date $21. ;
9 The SAS System
4097 format provider_id best12. ;
4098 format provider_name $30. ;
4099 format provider_title $4. ;
4100 format dx_name $101. ;
4101 format dx_code $8. ;
4102 format dx_type best12. ;
4103 format dx_source $2. ;
4104 format dx_origin $2. ;
4105 format pdx $1. ;
4106 format raw_pdx $24. ;
4107 format sourcesystem_cd $11. ;
4108 input
4109 patient_mrn
4110 encounter_id
4111 enc_type $
4112 dx_date $
4113 provider_id
4114 provider_name $
4115 provider_title $
4116 dx_name $
4117 dx_code $
4118 dx_type
4119 dx_source $
4120 dx_origin $
4121 pdx $
4122 raw_pdx $
4123 sourcesystem_cd $
4124 ;
4125 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR
4125 ! detection macro variable */
4126 run;
NOTE: The infile '/Projects/
data/diagnosis.csv' is:
Filename=/Projects/
data/diagnosis.csv,
Owner Name=s_ravi,Group Name=BIOSTATS-USERS,
Access Permission=-rwxr--r--,
Last Modified=30Jul2021:07:57:16,
File Size (bytes)=605197879
NOTE: Invalid data for provider_id in line 5641 36-43.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
5641 00365298,,NI,"2010-03-04 00:00:00",TC110397,"STEPIEN, KIMBERLY E",MD,"
71 360.21 Progressive high (degenerative) myopia",360.21,09,FI,BI,P,"IDX
141 DX_1",DX|BI|IDX 155
patient_mrn=365298 encounter_id= enc_type=NI dx_date=2010-03-04 00:00:00
provider_id= provider_name=STEPIEN, KIMBERLY E provider_title=MD
dx_name=360.21 Progressive high (degenerative) myopia dx_code=360.21 dx_type=9
dx_source=FI dx_origin=BI pdx=P raw_pdx=IDX DX_1 sourcesystem_cd=DX|BI|IDX
_ERROR_=1 _N_=5640
NOTE: Invalid data for provider_id in line 6377 36-43.
6377 00011177,,NI,"2010-04-27 00:00:00",TC100135,"SHERMAN, TRACY L",MD,"290
71 .40 Vascular dementia, uncomplicated",290.40,09,FI,BI,P,"IDX DX_1",DX|
141 BI|IDX 146
patient_mrn=11177 encounter_id= enc_type=NI dx_date=2010-04-27 00:00:00
provider_id= provider_name=SHERMAN, TRACY L provider_title=MD
dx_name=290.40 Vascular dementia, uncomplicated dx_code=290.40 dx_type=9
dx_source=FI dx_origin=BI pdx=P raw_pdx=IDX DX_1 sourcesystem_cd=DX|BI|IDX
_ERROR_=1 _N_=6376
and the log goes on with those above notes for so many subjects.
The provider_id has values like below:
Patient_mrn Provider_id
00365298 TC110397,
00011177 TC100135,
00365298 TC110397
while other provider_id for other subjects as
provider_id
41357
15089
7682748
247974928
so the provider_id has both char and numeric values. How can I read that variable into my data?
Thanks
... View more