Solved
Contributor
Posts: 20

# Array and Do Loop "Skipping" - skips first observation line

Hello,

Rationale: searching through patient records for specific ICD-9 codes (char) using prxmatch() and flagging for disease condition. Data looks like this:

| id | icd9_1 | icd9_2 | icd9_3 |
|----|--------|--------|--------|
| 1 | 153.4 | 285.1 | 427.31 |
| 2 | 578.1 | 584.5 | 570 |

Each observation has a different number of populated code variables (icd9_{i}); the maximum is icd9_39. What I want is the following:

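| id | icd9_1 | ...... | icd9_39 | Cancer | AIDS |
|----|--------|--------|---------|--------|------|
| 1 | 153.4 | | | 1 | 0 |
| 2 | 578.1 | | 276.2 | 0 | 0 |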

My Code:

data want;

array icd9_{39} $ icd9_1-icd9_39;

cancer = 0;

do i = 1 to 39;

if prxmatch("/140/", icd9_{i}) or prxmatch("/141/", icd9_{i}) or ...

then cancer = 1;

end;

aids = 0;

do i = 1 to 39;

if prxmatch("/042/", icd9_{i}) or prxmatch("/043/", icd9_{i}) or ...

then aids = 1;

end;

set have;

run;

This works perfectly for me except that it skips the first observation: the flags computed from the first observation's icd9 codes end up on the second observation, those from the 2nd on the 3rd, those from the 3rd on the 4th, and so on.

Question is: How do I fix this? I tried messing with my "i" value and haven't found a solution. I'll keep tinkering.

Accepted Solutions
Solution
‎01-22-2014 10:45 AM
Super User
Posts: 23,682

## Re: Array and Do Loop "Skipping" - skips first observation line

Have you tried moving your SET statement to the top, right after the DATA statement?
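
A minimal sketch of that change, assuming the input data set is named `have` as in the question. Only the two patterns visible in the original post are shown; the rest of the code list, which the post elides, is not filled in here. The single loop and the `drop i;` statement are optional tidying, not part of the original code.

```sas
data want;
    set have;                          /* read the observation first */
    array icd9_{39} $ icd9_1-icd9_39;  /* character ICD-9 code variables */

    cancer = 0;
    aids = 0;
    do i = 1 to 39;
        /* placeholder patterns only; extend with the full code list */
        if prxmatch("/140/", icd9_{i}) then cancer = 1;
        if prxmatch("/042/", icd9_{i}) then aids = 1;
    end;
    drop i;                            /* assumption: loop index not wanted in output */
run;
```

With SET executing before the loop, the flags are computed from the observation that was just read, so they land on the same output row as the codes they describe.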

All Replies
Contributor
Posts: 20

## Re: Array and Do Loop "Skipping" - skips first observation line

Never mind, everyone, I figured it out. Apparently if you put your SET statement at the end, as opposed to right below the DATA statement, you get the problem I've described. If anyone can explain why that is, I'd be grateful.

Super User
Posts: 6,754

## Re: Array and Do Loop "Skipping" - skips first observation line

You'll have to switch gears in how you think about the SET statement. It is not just a label that tells you where the data comes from. It is an executable statement. During the course of a DATA step, it executes many times, and each time it reads in the next observation from the incoming SAS data set.

In that light, consider what happens on your first observation (in your original, uncorrected DATA step).  You calculate AIDS and CANCER, then read in the first observation from the incoming data, and finally output the result.  So the calculated values will be 0.  Then the DATA step continues.  It calculates AIDS and CANCER based on the current data values (which came from the first observation, and are retained in memory).  Then it reads in the second observation, and outputs the final result.  So AIDS and CANCER are, as you observed, based on the first observation but the final data values come from the second observation.
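
The sequence described above can be traced in the log with a small experiment. This is only a sketch: the two-row `have` data set, the single code variable, the `/153/` pattern, and the data set name `demo` are all made up for illustration. The LENGTH statement stands in for the ARRAY statement in the original code, which is what defines the code variables as character before SET runs.

```sas
/* Hypothetical two-row input, one code variable for brevity */
data have;
    input id icd9_1 $;
    datalines;
1 153.4
2 578.1
;
run;

data demo;
    length icd9_1 $ 8;   /* define the variable as character up front */

    /* The flag is computed BEFORE SET executes, so it sees whatever
       icd9_1 held at the end of the previous iteration */
    cancer = 0;
    if prxmatch("/153/", icd9_1) then cancer = 1;
    put "Before SET: " _n_= id= icd9_1= cancer=;

    set have;            /* only now is the next observation read */
    put "After SET:  " _n_= id= icd9_1= cancer=;
run;
```

On the first iteration the "Before SET" line should show a blank `icd9_1` and `cancer=0`. On the second, it should show the first observation's code with `cancer=1`, and the "After SET" line then shows the second observation's `id` and code alongside that same flag. A third iteration begins, but the step ends when SET finds no more observations, so the flag computed from the last observation is never written out.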

It's a medium-complex process, and there are a host of related topics.  For example, you could scour the documentation to study the difference between the compilation phase vs. the execution phase of the DATA step.  But the description above is probably the most relevant to understanding the results you saw.

Good luck.
