Solved: Array recoding incorrect value

atbeczkiewicz · Posted 02-07-2021 10:27 PM

Hi all,

So I am working with a dataset that includes 169 numeric variables which can take the values of . (not observed), 0 (no outcome), or 1+ (number of outcomes). I am using an array to recode these variables to use as predictors in PROC GENMOD and want the variables take the values of 1 (outcome present), 2 (outcome absent), and . (not observed). However, I noticed that variables which had no outcome observed were showing an outcome after using the following code:

data insp_test;
merge data.data_micro data.data_insp_hist_w12;
by ESTID COLLECTDT;
ARRAY dcharr (*) R301_002--R550_001;
do i = 1 to dim(dcharr);
	if dcharr(i) > 0 then dcharr(i) = 1; else 
	if dcharr(i) = 0 then dcharr(i) = 2; else
	dcharr(i) = .;
end;
run;

I have checked the input and output datasets multiple times and I know the error results during this stage of processing. I am confused because the error does not carry over within the variable, but all variable values of 0 within an observation (8 of 40,497 obs) are recoded to have a value of 1 in the output dataset (instead of 2).

I would greatly appreciate any input on whether this is an issue stemming from my code or if there is some underlying feature of SAS that I do not know about at play here.

FreelanceReinh · Posted 02-08-2021 03:48 AM

Hi @atbeczkiewicz,

There's another risk in code like yours which combines merging and recoding: If the merge is not 1:1, observations with duplicate key values might be recoded twice, leading to exactly the effect you've observed, i.e., the recoding changes 0 → 2 → 1.

Simplified example (based on your code):

data micro;
input estid collectdt r301 r400 r550;
cards;
1 20000 . 0 5
;

data hist;
input estid collectdt histvar $;
cards;
1 20000 A
1 20000 B
;

data insp_wrong;
merge micro hist;
by estid collectdt;
array dcharr(*) r301--r550;
do i = 1 to dim(dcharr);
  if dcharr(i) > 0 then dcharr(i) = 1; else 
  if dcharr(i) = 0 then dcharr(i) = 2; else
  dcharr(i) = .;
end;
drop i;
run;

Result:

Obs    estid    collectdt    r301    r400    r550    histvar

 1       1        20000        .       2       1        A
 2       1        20000        .       1       1        B

Safest solution in this situation: Separate the MERGE step from the recoding step, i.e., the recoding step will use a SET statement reading the dataset created in the preceding MERGE step. Of course, you need to make sure that observations with duplicate key values (estid collectdt) are handled correctly according to the requirements.

View solution in original post

Reeza · Posted 02-07-2021 11:28 PM

1. Show the log and the output from a proc freq for the before/after values. Use a second array to remap to ensure that your code is clean.

2. Try using the ROUND() function to round your values to the nearest integer before comparison, you may be having floating point issues.

3. I'm assuming the original variable was numeric, but if it was character you need to modify your code to correctly account for that. You say 1+ below, but not sure if that's the actual value or if that indicates a number greater than 1.

@atbeczkiewicz wrote:

Hi all,

So I am working with a dataset that includes 169 numeric variables which can take the values of . (not observed), 0 (no outcome), or 1+ (number of outcomes).

ballardw · Posted 02-08-2021 02:01 AM

One might also suggest providing some details on exactly how you " checked the input and output datasets".

What are the current formats assigned to the variables? Sometimes "looking" at a table or printing the values, both of which will use the current default format will hide the fact that there are decimal values involved that "look" like 0 when they are are actually .009 or such.

FreelanceReinh · Posted 02-08-2021 03:48 AM

Hi @atbeczkiewicz,

There's another risk in code like yours which combines merging and recoding: If the merge is not 1:1, observations with duplicate key values might be recoded twice, leading to exactly the effect you've observed, i.e., the recoding changes 0 → 2 → 1.

Simplified example (based on your code):

data micro;
input estid collectdt r301 r400 r550;
cards;
1 20000 . 0 5
;

data hist;
input estid collectdt histvar $;
cards;
1 20000 A
1 20000 B
;

data insp_wrong;
merge micro hist;
by estid collectdt;
array dcharr(*) r301--r550;
do i = 1 to dim(dcharr);
  if dcharr(i) > 0 then dcharr(i) = 1; else 
  if dcharr(i) = 0 then dcharr(i) = 2; else
  dcharr(i) = .;
end;
drop i;
run;

Result:

Obs    estid    collectdt    r301    r400    r550    histvar

 1       1        20000        .       2       1        A
 2       1        20000        .       1       1        B

Safest solution in this situation: Separate the MERGE step from the recoding step, i.e., the recoding step will use a SET statement reading the dataset created in the preceding MERGE step. Of course, you need to make sure that observations with duplicate key values (estid collectdt) are handled correctly according to the requirements.

Array recoding incorrect value

Re: Array recoding incorrect value

Re: Array recoding incorrect value

Re: Array recoding incorrect value

Re: Array recoding incorrect value

Array recoding incorrect value

Re: Array recoding incorrect value

Re: Array recoding incorrect value

Re: Array recoding incorrect value

Re: Array recoding incorrect value

Click image to register for webinar

Classroom Training Available!