BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
atbeczkiewicz
Calcite | Level 5

Hi all,

So I am working with a dataset that includes 169 numeric variables which can take the values of . (not observed), 0 (no outcome), or 1+ (number of outcomes). I am using an array to recode these variables to use as predictors in PROC GENMOD and want the variables take the values of 1 (outcome present), 2 (outcome absent), and . (not observed). However, I noticed that variables which had no outcome observed were showing an outcome after using the following code:

data insp_test;
merge data.data_micro data.data_insp_hist_w12;
by ESTID COLLECTDT; ARRAY dcharr (*) R301_002--R550_001; do i = 1 to dim(dcharr); if dcharr(i) > 0 then dcharr(i) = 1; else if dcharr(i) = 0 then dcharr(i) = 2; else dcharr(i) = .; end; run;

I have checked the input and output datasets multiple times and I know the error results during this stage of processing. I am confused because the error does not carry over within the variable, but all variable values of 0 within an observation (8 of 40,497 obs) are recoded to have a value of 1 in the output dataset (instead of 2).

I would greatly appreciate any input on whether this is an issue stemming from my code or if there is some underlying feature of SAS that I do not know about at play here.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @atbeczkiewicz,

 

There's another risk in code like yours which combines merging and recoding: If the merge is not 1:1, observations with duplicate key values might be recoded twice, leading to exactly the effect you've observed, i.e., the recoding changes 0 → 2 → 1.

 

Simplified example (based on your code):

data micro;
input estid collectdt r301 r400 r550;
cards;
1 20000 . 0 5
;

data hist;
input estid collectdt histvar $;
cards;
1 20000 A
1 20000 B
;

data insp_wrong;
merge micro hist;
by estid collectdt;
array dcharr(*) r301--r550;
do i = 1 to dim(dcharr);
  if dcharr(i) > 0 then dcharr(i) = 1; else 
  if dcharr(i) = 0 then dcharr(i) = 2; else
  dcharr(i) = .;
end;
drop i;
run;

Result:

Obs    estid    collectdt    r301    r400    r550    histvar

 1       1        20000        .       2       1        A
 2       1        20000        .       1       1        B

Safest solution in this situation: Separate the MERGE step from the recoding step, i.e., the recoding step will use a SET statement reading the dataset created in the preceding MERGE step. Of course, you need to make sure that observations with duplicate key values (estid collectdt) are handled correctly according to the requirements.

View solution in original post

3 REPLIES 3
Reeza
Super User

1. Show the log and the output from a proc freq for the before/after values. Use a second array to remap to ensure that your code is clean. 

2. Try using the ROUND() function to round your values to the nearest integer before comparison, you may be having floating point issues. 

3. I'm assuming the original variable was numeric, but if it was character you need to modify your code to correctly account for that. You say 1+ below, but not sure if that's the actual value or if that indicates a number greater than 1. 

 


@atbeczkiewicz wrote:

Hi all,

So I am working with a dataset that includes 169 numeric variables which can take the values of . (not observed), 0 (no outcome), or 1+ (number of outcomes).


 

 

ballardw
Super User

One might also suggest providing some details on exactly how you " checked the input and output datasets".

 

What are the current formats assigned to the variables? Sometimes "looking" at a table or printing the values, both of which will use the current default format will hide the fact that there are decimal values involved that "look" like 0 when they are are actually .009 or such.

 

 

FreelanceReinh
Jade | Level 19

Hi @atbeczkiewicz,

 

There's another risk in code like yours which combines merging and recoding: If the merge is not 1:1, observations with duplicate key values might be recoded twice, leading to exactly the effect you've observed, i.e., the recoding changes 0 → 2 → 1.

 

Simplified example (based on your code):

data micro;
input estid collectdt r301 r400 r550;
cards;
1 20000 . 0 5
;

data hist;
input estid collectdt histvar $;
cards;
1 20000 A
1 20000 B
;

data insp_wrong;
merge micro hist;
by estid collectdt;
array dcharr(*) r301--r550;
do i = 1 to dim(dcharr);
  if dcharr(i) > 0 then dcharr(i) = 1; else 
  if dcharr(i) = 0 then dcharr(i) = 2; else
  dcharr(i) = .;
end;
drop i;
run;

Result:

Obs    estid    collectdt    r301    r400    r550    histvar

 1       1        20000        .       2       1        A
 2       1        20000        .       1       1        B

Safest solution in this situation: Separate the MERGE step from the recoding step, i.e., the recoding step will use a SET statement reading the dataset created in the preceding MERGE step. Of course, you need to make sure that observations with duplicate key values (estid collectdt) are handled correctly according to the requirements.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 535 views
  • 4 likes
  • 4 in conversation