I have a list of diagnostic codes in variables dx1-dx9 for each observation. I'm trying to extract the first diagnostic code that matches a specified subset for each observation. Here's the basic structure of my data:
data test;
input dx1 $ dx2 $ dx3 $ dx4 $ dx5 $ dx6 $ dx7 $ dx8 $ dx9 $;
datalines;
D2002 O2829 V1002 T2000 W2018 . . . .
B1080 V1001 S400X R2910 44323 I1088 FB372 X3007 A5850
M5992 R6602 U2710 S400X D0412 V1002 C4010 . .
;
run;
This is the code I wrote:
data pwid.test;
length first_dx $6;
set pwid.test;
array dxnum[9] dx1-dx9;
do i = 1 to 9;
if dxnum[i] not in ("V1001", "V1002", "S400X") then continue;
else first_dx=dxnum[i] and leave;
end;
run;
I'm getting a message for my "else" line that character values have been converted to numeric values and numeric values have been converted to character values, then I get an error message saying "invalid numeric data" (see screenshot). As far as I can tell everything should be a character variable, so I'm not sure why anything is getting converted to numeric. What's causing this issue? If there's a better way to write this code I'd appreciate that as well. Thank you!
I'm not sure of your exact objective. Can I assume this is what you want?
data test1;
length first_dx $6;
set test;
array dxnum[9] dx1-dx9;
do i = 1 to 9;
if dxnum[i] not in ("V1001", "V1002", "S400X") then continue;
else first_dx=dxnum[i];
end;
run;
I'm trying to set first_dx to the first matching diagnostic code listed in dx1-dx9. In the above example, that would be V1002 for obs 1, V1001 for obs 2, and S400X for obs 3. I think your code keeps running the do loop even after it finds a match, so first_dx is set to the last matching code.
I would recommend this variation:
do i = 1 to 9 until (first_dx > ' ');
   if dxnum[i] in ("V1001", "V1002", "S400X") then
      first_dx = dxnum[i];
end;
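Please read the entire log. Your problem is at line 189, column 23. What is there? At that location is a NUMERIC variable named LEAVE. Because LEAVE appears after AND inside the assignment, SAS treats it as a variable name, not as the LEAVE statement. You can't assign the value DXNUM[i] AND LEAVE to FIRST_DX as written: DXNUM[i] is character and LEAVE is numeric, so SAS tries to convert DXNUM[i] to a number in order to evaluate DXNUM[i] AND LEAVE, and a code such as V1002 is not a valid number. That is where the conversion notes and the "invalid numeric data" message come from.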
Thanks, I figured that might be my problem. How can I exit the do loop at that point since leave doesn't work?
Forget about LEAVE and CONTINUE.
Tell the DO statement what criteria to use to end the looping.
data pwid.test;
   set pwid.test;
   array dxnum[9] dx1-dx9;
   length first_dx $6;
   do i = 1 to 9 until(not missing(first_dx));
      if dxnum[i] in ("V1001", "V1002", "S400X") then first_dx=dxnum[i];
   end;
run;
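For completeness, LEAVE does work when it is written as a statement of its own inside a DO group, not joined to the assignment with AND. The following is only a sketch of that alternative, not a replacement for the UNTIL approach above. It assumes the sample data set TEST from the question; the output name TEST_LEAVE and the DROP statement are illustrative additions.

```sas
data test_leave;                    /* hypothetical output name */
   set test;                        /* sample data set from the question */
   length first_dx $6;
   array dxnum[9] dx1-dx9;
   do i = 1 to 9;
      if dxnum[i] in ("V1001", "V1002", "S400X") then do;
         first_dx = dxnum[i];       /* keep the first matching code */
         leave;                     /* separate statement: exits the DO loop */
      end;
   end;
   drop i;                          /* loop counter is not needed in the output */
run;
```

With the sample data this should give the values described earlier in the thread: V1002, V1001, and S400X for observations 1 to 3.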
Thanks for the help, this syntax is much more straightforward.