BookmarkSubscribeRSS Feed
sasuser123
Calcite | Level 5

Hello,

 

I have a huge data with lab results (25 fields) and it is dirty.  I will need to categorize the final (clean) results for my analysis but before that step I need to clean the unwanted characters such as "=","ug/dl", "<" or ">" comes along with the numerical result.

 

I want to use array and do loops because I have 25 variables.  I want to set anything below 5 to "1".

Here is my code:


data wanted;
set have;

array AResult(25)  Result_1-Result_25;

do i=1 to 25;

if AResult(25) in ('<0.5','3.8ug/dl','4.0ug/dl','4.1ug/dl','4.2ug/dl','<.3.3','<5','LOW','LOW <3','=1.3')
then AResult{25}=1;
else AResult{25}=AResult{25};
END;

RUN;

 

I am getting this error:ERROR: All variables in array list must be the same type, i.e., all numeric or character.

 

How can I handle this error?

 

Example Data:

data work.patients;
infile datalines dlm=",";
input pateint_id Result_1 Result_2 Result_3;
datalines;
1, =70, 65, 82
2, 88, =75, 79ug/dl
3, 64, 72,=80;
run;

 

Thanks.

8 REPLIES 8
PaigeMiller
Diamond | Level 26

Somewhere in your variables Result_1 to Result_25, you have a mix of numeric and character variables. They must be either all numeric, or all character. So you have to find out which variables are which, and then make them all the same type (all numeric or all character).

 

Also, this will cause an error:

 

if AResult(25) in ('<0.5','3.8ug/dl','4.0ug/dl','4.1ug/dl','4.2ug/dl','<.3.3','<5','LOW','LOW <3','=1.3')
then AResult{25}=1;

 

For AResult(25) to be one of those character strings would mean AResult(25) is character. So you cannot assign a numeric 1 to this variable.

--
Paige Miller
sasuser123
Calcite | Level 5
Yes, I have mix of numeric and character variables because of the data entry. That is why I am trying to clean it using array and do loop.
Tom
Super User Tom
Super User

@sasuser123 wrote:
Yes, I have mix of numeric and character variables because of the data entry. That is why I am trying to clean it using array and do loop.

Data entry by itself cannot cause variables to change type.

Are you saying you used PROC IMPORT to create the data set?

 

What was the source of the data?  What type of file is it?

If it is a text file, like a CSV file, then use your own data step to read the text into data and you will have full control over how the variables are defined.

sasuser123
Calcite | Level 5

I used data step to import.  It is txt file.  When I specify the variables to be numeric in the data step, then the cells with characters become blank.  

PaigeMiller
Diamond | Level 26

Show us a portion of the .txt file

--
Paige Miller
Tom
Super User Tom
Super User

@sasuser123 wrote:

I used data step to import.  It is txt file.  When I specify the variables to be numeric in the data step, then the cells with characters become blank.  


Of course that is why you need to first read them into character variables.

 if you want to use them in the same ARRAY then you need to read all of them as character variables, even the ones you think don't have any issues.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @sasuser123 

 

When dealing with "dirty" files is is a good idea to read data as text variables, so everything is read in and nothing lost in automatic conversion to numeric. If the demimiters are in place with no extraneus delimiters in fields, and all variables are supposed to be numeric, I will recommend that all variables are read with informat $char32. and converted to numeric in a following step. That gives full control over the cleaning process.

 

Step for reading your example data:

 

data work.patients;
infile datalines dlm=",";
informat patient_id 8 cResult_1-cResult_3 $char32.;
input patient_id cResult_1-cResult_3 ;
datalines;
1, =70, 65, 82
2, 88, =75, 79ug/dl
3, 64, 72,=80
;
run;

 

 

When data are read in, all variables can be processed with an array. I have made 3 examples. The first uses the same cleaning algoritm as your code, or rather an algoritm that gives the same result without having to specify the diferent types of non-numeric data. The idea is that if notdigit(variable) is 0, the variable has a contaet that can be converted to numerig, otherwise it is recoded to 1.So four values out of nine in your example data is recoded to 1, while the rest are converted to a numeric variable.

data work.result1;
  set work.patients;
  drop cResult_1-cResult_3;
  array cres $ cResult_1-cResult_3;
  array res Result_1-Result_3;
  do i = 1 to 3;
    if notdigit(compress(cres{i},' ')) then res{i} = 1;
    else res{i} = input(left(cres{i}),best32.);
  end;
run;

 

 

The two other eaxmples are included to demonstrate how a more complicated cleaning algoritm could be applied with a few lines of code. Number two tries to preserve as many values as possible by ignoring extraneus information (i.e. all characters that are not digits) and converting the digits in all nine values to numeric variables. That works only if all digits represent valid values, so e.g. 79ug/dl contains the valid value 79.

 

Version 3 preserves all values with digits only and an optional wqual sign (=), assuming that the values with the equal sign removed represent valid values, while values with other extraneus information should be recoded to 1. This could be refines further if needed.

 

data work.result2;
  set work.patients;
  drop cResult_1-cResult_3;
  array cres $ cResult_1-cResult_3;
  array res Result_1-Result_3;
  do i = 1 to 3;
    res{i} = input(left(compress(cres{i},,'dk')),best32.);
  end;
run;

data work.result3;
  set work.patients;
  drop cResult_1-cResult_3;
  array cres $ cResult_1-cResult_3;
  array res Result_1-Result_3;
  do i = 1 to 3;
    if notdigit(compress(cres{i},' ')) then do;
      if notdigit(compress(cres{i},'= ')) = 0 then res{i} = input(left(compress(cres{i},'=')),best32.);
      else res{i} = 1;
    end;
    else res{i} = input(left(cres{i}),best32.);
  end;
run;

 

 

pink_poodle
Barite | Level 11

The comparisons are treating the variables as strings, but some of the variables are numeric, so that produces an error. Start with a dataset that includes only variables Result_1 - Result 25. f you specify a _character_ keyword in the array statement like "array c _character_; " the comparisons will work. The numeric variables are already clean, they cannot contain any special characters. If you would like to go over the numeric variables and set anything below 5 to 1, you can specify "array n _numeric_; do over n; if n < 5 then n = 1; end;".

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 4162 views
  • 0 likes
  • 5 in conversation