Why are you using PROC IMPORT to read a delimited text file? One with only 14 variables.
Just write your own data step to read the file.
Then you won't define the variables, like PATIENT_MRN and PROVIDER_ID, using the wrong type. MRN is an identifier, so make it a character variable. You will never do arithmetic on it; the average medical record number has no meaning. Plus the values you are showing have leading zeros that you probably want to preserve.
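To see why the type matters, here is a minimal sketch. The value 0001234 is made up for illustration and is not taken from your file.

```sas
data _null_;
  mrn_text = '0001234';             /* hypothetical MRN kept as character */
  mrn_num  = input(mrn_text, 32.);  /* the same value converted to a number */
  put mrn_text= mrn_num=;           /* write both to the log for comparison */
run;
```

The log should show the character value intact, while the numeric copy comes out as 1234 with the leading zeros gone.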
To read the file, something like this:
data DIAGNOSIS ;
  /* DSD handles commas and quoted values, TRUNCOVER tolerates short lines, FIRSTOBS=2 skips the header row */
  infile "/Projects/data/diagnosis.csv" dsd truncover firstobs=2;
  length
    patient_mrn $10
    encounter_id $20
    enc_type $2
    dx_date 8
    provider_id $20
    provider_name $30
    provider_title $10
    dx_name $200
    dx_code $8
    dx_type $2
    dx_source $2
    dx_origin $2
    pdx $2
    raw_pdx $30
    sourcesystem_cd $20
  ;
  informat dx_date anydtdte.;
  format dx_date yymmdd10.;
  /* The variables are read in the order they were defined in the LENGTH statement */
  input patient_mrn -- sourcesystem_cd ;
run;
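If you want to try the same options without touching the real file, here is a small sketch using in-line data. The dataset name DEMO, the reduced variable list, and the two sample rows are all made up for illustration.

```sas
data demo;
  infile datalines dsd truncover;
  length patient_mrn $10 dx_date 8 dx_name $40 dx_code $8;
  informat dx_date anydtdte.;
  format dx_date yymmdd10.;
  input patient_mrn -- dx_code;
datalines;
0001234,2023-01-15,"Diabetes, type 2",E11.9
0098765,,Hypertension
;

proc print data=demo;
run;
```

The first row shows DSD keeping the quoted comma inside DX_NAME. The second row shows an empty field read as a missing DX_DATE, and TRUNCOVER leaving DX_CODE missing instead of jumping to the next line to look for it.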
You should define the lengths of the character variables based on the documentation of the maximum length they need. But if you don't have such documentation, then you could analyze the whole file yourself to figure out the longest string in each field.
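data _null_ ;
  infile "/Projects/data/diagnosis.csv" dsd truncover firstobs=2 end=eof;
  /* Initial values make the array elements retained from one line to the next */
  array lengths[14] (14*0);
  do col=1 to 14;
    /* Read each field as text and keep the longest length seen so far */
    input string :$32767. @;
    lengths[col]=max(lengths[col],lengthn(string));
  end;
  /* After the last line, write the maximum length of each column to the log */
  if eof then do col=1 to 14 ;
    put col= lengths[col] ;
  end;
run;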