rrevans
Calcite | Level 5

Hello,

Goal: search patient records for specific ICD-9 codes (stored as character values) using PRXMATCH() and flag each record for disease conditions. The data look like this:

id  icd9_1  icd9_2  icd9_3
1   153.4   285.1   427.31
2   578.1   584.5   570

Each observation has a different number of ICD-9 codes filled in (icd9_1, icd9_2, and so on); the maximum is icd9_39. What I want is the following:

id  icd9_1  ...  icd9_39  Cancer  AIDS
1   153.4   ...           1       0
2   578.1   ...  276.2    0       0

My Code:

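data want;

     array icd9_{39} $ icd9_1-icd9_39;   /* the 39 ICD-9 code variables */

     /* flag CANCER if any code matches one of the cancer patterns */
     cancer = 0;
          do i = 1 to 39;
               if prxmatch("/140/", icd9_{i}) or prxmatch("/141/", icd9_{i}) or ...
               then cancer = 1;
          end;

     /* flag AIDS if any code matches one of the AIDS patterns */
     aids = 0;
          do i = 1 to 39;
               if prxmatch("/042/", icd9_{i}) or prxmatch("/043/", icd9_{i}) or ...
               then aids = 1;
          end;

     /* SET is the last executable statement in this step */
     set have;

run;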

This works except that the flags lag by one observation: the first observation gets CANCER = 0 and AIDS = 0, the flags computed from the first observation's ICD-9 codes end up on the second observation, the second observation's flags end up on the third, the third's on the fourth, and so on.

My question: how do I fix this? I tried adjusting my loop index i but haven't found a solution. I'll keep tinkering.

Accepted Solution
Reeza
Super User

Have you tried moving your SET statement to the top, right after the DATA statement?
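A minimal sketch of what that looks like, assuming a small hypothetical HAVE data set in which only icd9_1-icd9_3 are filled in. Only the codes that appear in the question (140 and 141 for cancer, 042 and 043 for AIDS) are used; the rest of each list is elided in the original post and would have to be added. The sketch also merges the two loops into one and drops the loop counter i, neither of which is needed for the fix itself.

```sas
/* Hypothetical sample data: icd9_1-icd9_39 are all created by LENGTH, */
/* but only the first three are populated; the rest stay blank.       */
data have;
  length id 8 icd9_1-icd9_39 $ 6;
  input id icd9_1 $ icd9_2 $ icd9_3 $;
  datalines;
1 140.1 285.1 427.31
2 042 584.5 570
3 578.1 276.2 401.9
;
run;

data want;
  set have;                          /* read the next observation FIRST ...     */
  array icd9_{39} $ icd9_1-icd9_39;  /* ... then work on its 39 code variables  */
  cancer = 0;
  aids   = 0;
  do i = 1 to 39;
    /* Only 140/141 and 042/043 are shown here; extend each condition */
    /* with the remaining codes from your own definitions.             */
    if prxmatch("/140/", icd9_{i}) or prxmatch("/141/", icd9_{i}) then cancer = 1;
    if prxmatch("/042/", icd9_{i}) or prxmatch("/043/", icd9_{i}) then aids = 1;
  end;
  drop i;
run;
```

With this ordering, id 1 should get CANCER = 1, id 2 should get AIDS = 1, and id 3 should get neither flag.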


rrevans
Calcite | Level 5

Never mind, everyone, I figured it out. Apparently, if you put your SET statement at the end instead of right below the DATA statement, you get the problem I described. If anyone can explain why that is, I'd be grateful.

Astounding
PROC Star

You'll have to switch gears in how you think about the SET statement. It is not just a label that tells you where the data come from. It is an executable statement. During a DATA step, it executes many times, and each time it reads the next observation from the incoming SAS data set.
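One way to see this is to print the program data vector before and after SET on each iteration. A minimal sketch, using a hypothetical three-row data set DEMO with a single numeric variable X:

```sas
data demo;                      /* hypothetical three-row input */
  input x;
  datalines;
10
20
30
;
run;

data _null_;
  put 'Before SET: ' _n_= x=;   /* PDV contents at the top of the iteration  */
  set demo;                     /* executes once per iteration, reads a row  */
  put 'After SET:  ' _n_= x=;   /* PDV contents right after the read         */
run;
```

The log should show four "Before SET" lines but only three "After SET" lines: on the fourth iteration SET finds no more observations and the step stops. At the top of iterations 2 to 4, X still holds the previous row's value, because variables read with SET are retained rather than reset to missing.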

In that light, consider what happens on your first observation (in your original, uncorrected DATA step). You calculate AIDS and CANCER, then read in the first observation from the incoming data, and finally output the result. Because nothing has been read yet when the flags are calculated, the ICD-9 variables are still blank, so the calculated values will be 0. Then the DATA step continues. On the next iteration it calculates AIDS and CANCER based on the current data values (which came from the first observation and are retained in memory). Then it reads in the second observation and outputs the result. So AIDS and CANCER are, as you observed, based on the first observation, but the ICD-9 values in that output row come from the second observation.
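The same one-row lag can be reproduced with a toy example. A minimal sketch, with a hypothetical data set DEMO and a hypothetical FLAG variable standing in for CANCER and AIDS:

```sas
data demo;                 /* hypothetical three-row input */
  input x;
  datalines;
10
20
30
;
run;

/* SET last, as in the original question: FLAG is computed from   */
/* the previous row's X (missing on the very first iteration).     */
data set_last;
  flag = (x > 15);
  set demo;
run;

/* SET first, as in the accepted solution: FLAG uses the current row. */
data set_first;
  set demo;
  flag = (x > 15);
run;
```

SET_LAST should come out with X = 10, 20, 30 and FLAG = 0, 0, 1 (the flags shifted down one row; a missing X compares as less than 15), while SET_FIRST gives FLAG = 0, 1, 1.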

It's a moderately complex process, and there are a host of related topics. For example, you could study the documentation on the difference between the compilation phase and the execution phase of the DATA step. But the description above is probably the most relevant to understanding the results you saw.

Good luck.

rrevans
Calcite | Level 5

Thank you for answering. I forget these details at times.
