Solved: Array and Do Loop "Skipping" - skips first observation line

rrevans · Posted 01-22-2014 10:35 AM

Hello,

Rationale: I'm searching patient records for specific ICD-9 codes (character variables) using prxmatch() and flagging disease conditions. The data looks like this:

id	icd9_1	icd9_2	icd9_3
1	153.4	285.1	427.31
2	578.1	584.5	570

Each observation has a different number of icd9_{i} variables; the maximum is icd9_39. What I want is the following:

id	icd9_1...	...icd9_39	Cancer	AIDS
1	153.4		1	0
2	578.1	276.2	0	0

My Code:

data want;
    array icd9_{39} $ icd9_1-icd9_39;

    cancer = 0;
    do i = 1 to 39;
        if prxmatch("/140/", icd9_{i}) or prxmatch("/141/", icd9_{i}) or ...
            then cancer = 1;
    end;

    aids = 0;
    do i = 1 to 39;
        if prxmatch("/042/", icd9_{i}) or prxmatch("/043/", icd9_{i}) or ...
            then aids = 1;
    end;

    set have;
run;

This works perfectly for me, except that it skips the first observation: the flags computed from the first observation's icd9 codes end up in the second observation's disease variables, the second observation's in the third, the third's in the fourth, and so on.

Question is: How do I fix this? I tried messing with my "i" value and haven't found a solution. I'll keep tinkering.

Reeza · Posted 01-22-2014 10:45 AM

Have you tried moving your set statement to the top, right after the data statement?
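
For reference, here is a minimal sketch of what that reordering might look like. It assumes the same data set names (have, want) as the question and uses only the four patterns that were actually posted; the remaining patterns were elided in the original and are not filled in here. Merging the two loops into one and adding the drop statement are illustrative choices, not part of the original code.

```sas
data want;
    /* Read the next observation BEFORE computing anything from it */
    set have;

    array icd9_{39} $ icd9_1-icd9_39;

    cancer = 0;
    aids = 0;

    do i = 1 to 39;
        /* Only the patterns shown in the question; add the rest as needed */
        if prxmatch("/140/", icd9_{i}) or prxmatch("/141/", icd9_{i})
            then cancer = 1;
        if prxmatch("/042/", icd9_{i}) or prxmatch("/043/", icd9_{i})
            then aids = 1;
    end;

    drop i;  /* optional: keep the loop counter out of the output */
run;
```

With set as the first executable statement, the flags are calculated from the observation that was just read, so each row's cancer and aids values describe that row's own codes.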

rrevans · Posted 01-22-2014 10:46 AM

Never mind, everyone, I figured it out. Apparently, if you put your set statement at the end instead of right below the data statement, you get the problem I've described. If anyone can explain why that is, I'd be grateful.

Astounding · Posted 01-22-2014 12:05 PM

You'll have to switch gears related to how you think about the SET statement. It is not just a label that tells you where the data comes from. It is an executable statement. During the course of a DATA step, it executes many times and each time it reads in the next observation from the incoming SAS data set.

In that light, consider what happens on the first pass through your original, uncorrected DATA step. You calculate AIDS and CANCER, then read in the first observation from the incoming data, and finally output the result. Because the ICD-9 variables are still blank when the flags are calculated, the calculated values will be 0. Then the DATA step continues. It calculates AIDS and CANCER based on the current data values (which came from the first observation, and are retained in memory). Then it reads in the second observation, and outputs the final result. So AIDS and CANCER are, as you observed, based on the first observation but the final data values come from the second observation.
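
The sequence described above can be reproduced with a small toy example. This is only a sketch: the data set names (have, set_last, set_first) and the variables x and flag are made up for illustration and stand in for the ICD-9 codes and disease flags.

```sas
/* Hypothetical toy data: x takes the values 1, 2, 3 */
data have;
    do x = 1 to 3;
        output;
    end;
run;

/* SET at the end: flag is computed from the x retained from the
   previous iteration (missing on the first pass), then SET overwrites x */
data set_last;
    flag = (x = 1);
    set have;
run;
/* set_last: x=1 flag=0, x=2 flag=1, x=3 flag=0 */

/* SET at the top: flag is computed from the x that was just read */
data set_first;
    set have;
    flag = (x = 1);
run;
/* set_first: x=1 flag=1, x=2 flag=0, x=3 flag=0 */

proc print data=set_last;  run;
proc print data=set_first; run;
```

In set_last the flag for x = 1 shows up on the row where x = 2, which is the same one-row shift seen with the cancer and aids variables. The fourth iteration computes a flag but stops at the SET statement when it runs out of observations, so nothing is written for it.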

It's a medium-complex process, and there are a host of related topics. For example, you could scour the documentation to study the difference between the compilation phase vs. the execution phase of the DATA step. But the description above is probably the most relevant to understanding the results you saw.

Good luck.

rrevans · Posted 01-22-2014 01:49 PM

Thank you for answering. I forget these details at times.
