Hi all,
So I am working with a dataset that includes 169 numeric variables which can take the values of . (not observed), 0 (no outcome), or 1+ (number of outcomes). I am using an array to recode these variables to use as predictors in PROC GENMOD and want the variables take the values of 1 (outcome present), 2 (outcome absent), and . (not observed). However, I noticed that variables which had no outcome observed were showing an outcome after using the following code:
data insp_test;
merge data.data_micro data.data_insp_hist_w12;
by ESTID COLLECTDT;
ARRAY dcharr (*) R301_002--R550_001;
do i = 1 to dim(dcharr);
if dcharr(i) > 0 then dcharr(i) = 1; else
if dcharr(i) = 0 then dcharr(i) = 2; else
dcharr(i) = .;
end;
run;
I have checked the input and output datasets multiple times and I know the error results during this stage of processing. I am confused because the error does not carry over within the variable, but all variable values of 0 within an observation (8 of 40,497 obs) are recoded to have a value of 1 in the output dataset (instead of 2).
I would greatly appreciate any input on whether this is an issue stemming from my code or if there is some underlying feature of SAS that I do not know about at play here.
Hi @atbeczkiewicz,
There's another risk in code like yours which combines merging and recoding: If the merge is not 1:1, observations with duplicate key values might be recoded twice, leading to exactly the effect you've observed, i.e., the recoding changes 0 → 2 → 1.
Simplified example (based on your code):
data micro;
input estid collectdt r301 r400 r550;
cards;
1 20000 . 0 5
;
data hist;
input estid collectdt histvar $;
cards;
1 20000 A
1 20000 B
;
data insp_wrong;
merge micro hist;
by estid collectdt;
array dcharr(*) r301--r550;
do i = 1 to dim(dcharr);
if dcharr(i) > 0 then dcharr(i) = 1; else
if dcharr(i) = 0 then dcharr(i) = 2; else
dcharr(i) = .;
end;
drop i;
run;
Result:
Obs estid collectdt r301 r400 r550 histvar 1 1 20000 . 2 1 A 2 1 20000 . 1 1 B
Safest solution in this situation: Separate the MERGE step from the recoding step, i.e., the recoding step will use a SET statement reading the dataset created in the preceding MERGE step. Of course, you need to make sure that observations with duplicate key values (estid collectdt) are handled correctly according to the requirements.
1. Show the log and the output from a proc freq for the before/after values. Use a second array to remap to ensure that your code is clean.
2. Try using the ROUND() function to round your values to the nearest integer before comparison, you may be having floating point issues.
3. I'm assuming the original variable was numeric, but if it was character you need to modify your code to correctly account for that. You say 1+ below, but not sure if that's the actual value or if that indicates a number greater than 1.
@atbeczkiewicz wrote:
Hi all,
So I am working with a dataset that includes 169 numeric variables which can take the values of . (not observed), 0 (no outcome), or 1+ (number of outcomes).
One might also suggest providing some details on exactly how you " checked the input and output datasets".
What are the current formats assigned to the variables? Sometimes "looking" at a table or printing the values, both of which will use the current default format will hide the fact that there are decimal values involved that "look" like 0 when they are are actually .009 or such.
Hi @atbeczkiewicz,
There's another risk in code like yours which combines merging and recoding: If the merge is not 1:1, observations with duplicate key values might be recoded twice, leading to exactly the effect you've observed, i.e., the recoding changes 0 → 2 → 1.
Simplified example (based on your code):
data micro;
input estid collectdt r301 r400 r550;
cards;
1 20000 . 0 5
;
data hist;
input estid collectdt histvar $;
cards;
1 20000 A
1 20000 B
;
data insp_wrong;
merge micro hist;
by estid collectdt;
array dcharr(*) r301--r550;
do i = 1 to dim(dcharr);
if dcharr(i) > 0 then dcharr(i) = 1; else
if dcharr(i) = 0 then dcharr(i) = 2; else
dcharr(i) = .;
end;
drop i;
run;
Result:
Obs estid collectdt r301 r400 r550 histvar 1 1 20000 . 2 1 A 2 1 20000 . 1 1 B
Safest solution in this situation: Separate the MERGE step from the recoding step, i.e., the recoding step will use a SET statement reading the dataset created in the preceding MERGE step. Of course, you need to make sure that observations with duplicate key values (estid collectdt) are handled correctly according to the requirements.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.