Hi @sasuser123
When dealing with "dirty" files is is a good idea to read data as text variables, so everything is read in and nothing lost in automatic conversion to numeric. If the demimiters are in place with no extraneus delimiters in fields, and all variables are supposed to be numeric, I will recommend that all variables are read with informat $char32. and converted to numeric in a following step. That gives full control over the cleaning process.
Step for reading your example data:
data work.patients;
infile datalines dlm=",";
informat patient_id 8 cResult_1-cResult_3 $char32.;
input patient_id cResult_1-cResult_3 ;
datalines;
1, =70, 65, 82
2, 88, =75, 79ug/dl
3, 64, 72,=80
;
run;
When data are read in, all variables can be processed with an array. I have made 3 examples. The first uses the same cleaning algoritm as your code, or rather an algoritm that gives the same result without having to specify the diferent types of non-numeric data. The idea is that if notdigit(variable) is 0, the variable has a contaet that can be converted to numerig, otherwise it is recoded to 1.So four values out of nine in your example data is recoded to 1, while the rest are converted to a numeric variable.
data work.result1;
set work.patients;
drop cResult_1-cResult_3;
array cres $ cResult_1-cResult_3;
array res Result_1-Result_3;
do i = 1 to 3;
if notdigit(compress(cres{i},' ')) then res{i} = 1;
else res{i} = input(left(cres{i}),best32.);
end;
run;
The two other eaxmples are included to demonstrate how a more complicated cleaning algoritm could be applied with a few lines of code. Number two tries to preserve as many values as possible by ignoring extraneus information (i.e. all characters that are not digits) and converting the digits in all nine values to numeric variables. That works only if all digits represent valid values, so e.g. 79ug/dl contains the valid value 79.
Version 3 preserves all values with digits only and an optional wqual sign (=), assuming that the values with the equal sign removed represent valid values, while values with other extraneus information should be recoded to 1. This could be refines further if needed.
data work.result2;
set work.patients;
drop cResult_1-cResult_3;
array cres $ cResult_1-cResult_3;
array res Result_1-Result_3;
do i = 1 to 3;
res{i} = input(left(compress(cres{i},,'dk')),best32.);
end;
run;
data work.result3;
set work.patients;
drop cResult_1-cResult_3;
array cres $ cResult_1-cResult_3;
array res Result_1-Result_3;
do i = 1 to 3;
if notdigit(compress(cres{i},' ')) then do;
if notdigit(compress(cres{i},'= ')) = 0 then res{i} = input(left(compress(cres{i},'=')),best32.);
else res{i} = 1;
end;
else res{i} = input(left(cres{i}),best32.);
end;
run;
... View more