In the program below, it works fine when I use just one variable (Diag_cd1) but when I try 5 variables in an array (Diag_cd1-Diag_cd5) it does not work giving me an error message about parsing WHERER clause with an underline under $DIAGL. Can an array be used for what I want to do? The Proc Format has been reduced here for posting. thanks.
PROC FORMAT;
VALUE $DIAGL
"39891",
"4280 "-"4289 " = "CHF" /* Congestive heart failure */
"09320"-"09324",
"3940 "-"3971 ",
"3979 ",
"4240 "-"42499",
"7463 "-"7466 ",
"V422 ",
"V433 " = "VALVE" /* Valvular disease */
"41511"-"41519",
"4160 "-"4169 ",
"4179 " = "PULMCIRC" /* Pulmonary circ disorder;*/
"042 "-"0449 " = "AIDS" /* HIV and AIDS */
;
data monday3;
set monday2;
where put (diag_cd1,$DIAGL.) eq 'AIDS';
run;
data
monday3;
set
monday2;array
diagg(5) $ diag_cd1-diag_cd5;
DO
i = 1 TO 5;
where
putc(diagg(i),$DIAGL.) eq 'AIDS';
end
;
run
;
Your format only appears to contain the values for aids .. none of the other values appear to match your example data. If you need to capture all of the records that contain any of the aids values as shown in your format, then you might be able to use something like:
data monday3;
set monday2;
array diagg(5) $ diag_cd1-diag_cd5;
keep=0;
do i = 1 TO 5;
if put (diagg(i),$DIAGL5.) eq 'AIDS' then do;
keep=1;
leave;
end;
end;
if keep;
run;
An example dataset, in the form of a datastep, would help everyone to test your code.
Can I assume you have you double tested your format code? I'm not sure about character ranges and how they map codes, specifically with the spaces as well.
I think there might be a bit of a logic flaw though trying to delete/keep records in the middle of a do loop without having iterated through the whole loop.
I'm guessing you want only patients who have a diag_cd that maps to AIDS. You could try using an if statement instead and creating a new variable keep that flags whether or not the record should be kept.
data
monday3;
set
monday2;
array
diagg(5) $ diag_cd1-diag_cd5;
keep_record=0;
DO
i = 1 TO 5;
if 
putc(diagg(i),$DIAGL.) eq 'AIDS' then keep_record=1;
end;
if keep_record=1;
run;
Did you notice that changed from a PUT to a PUTC?
Also are you wanting to include the record if any of the variables meet the condition or all of the variables meet the condition?
If so the loop approach you are using probably isn't going to work.
You may need to be a bit more explicit instead of using the array:
where [put(diag_cd1,$diagl.) eq 'AIDS'] OR [put(diag_cd2,$diagl.) eq 'AIDS'] or <continue through all variables>;
if ALL of the codes then replace the OR with AND.
Hello. Below I've included test data. The DIAG_CD1-DIAG_CD5 are all character 5 length. I am using PUTC because I understood that was for character. My intention with the coding is if any of the five DIAG_CD are AIDS, I want to keep the observation. AIDS code is 04200. I will try the KEEP_RECORD suggestion. thanks.
DIAG_CD1 DIAG_CD2 DIAG_CD3 DIAG_CD4 DIAG_CD5
78210 70800 38290
E9390 78050 78050 25000 31100
04200 78020 78020 30000 92000
64683 64683 79650 04200
04200 72879 72879
78900 33819 78900
V2210 V2389
78060 78060 07999 78620
E9270 72450 84720
E8842 72450 92231
72950 72950
78703 40190 78703 V4365
59970 59320 59970 59900 33829
E9160 95940 92320 92310 30510
E8842 95970 92420 98780
78620 47819
84790 84790
V0190 53081 V0190 V0190 53081
E8160 95920 84090 84700 30510
72380 72380 72310
78096 78096 72310 30000 49600
78096 78096
78330 78330 V4410 34390
34390 34390 34320
V6709 04200 59200 59390
E8788 70780 70780 70920 74190
52250 52250 25000 04200 V5867
data have;
infile cards missover;
input (DIAG_CD1 DIAG_CD2 DIAG_CD3 DIAG_CD4 DIAG_CD5) ($);
cards;
78210 70800 38290
E9390 78050 78050 25000 31100
04200 78020 78020 30000 92000
64683 64683 79650 04200
04200 72879 72879
78900 33819 78900
V2210 V2389
78060 78060 07999 78620
E9270 72450 84720
E8842 72450 92231
72950 72950
78703 40190 78703 V4365
59970 59320 59970 59900 33829
E9160 95940 92320 92310 30510
E8842 95970 92420 98780
78620 47819
84790 84790
V0190 53081 V0190 V0190 53081
E8160 95920 84090 84700 30510
72380 72380 72310
78096 78096 72310 30000 49600
78096 78096
78330 78330 V4410 34390
34390 34390 34320
V6709 04200 59200 59390
E8788 70780 70780 70920 74190
52250 52250 25000 04200 V5867
;
run;
data want;
set have;
if whichc('04200',of DIAG_CD1-DIAG_CD5) then output;
run;
Your format only appears to contain the values for aids .. none of the other values appear to match your example data. If you need to capture all of the records that contain any of the aids values as shown in your format, then you might be able to use something like:
data monday3;
set monday2;
array diagg(5) $ diag_cd1-diag_cd5;
keep=0;
do i = 1 TO 5;
if put (diagg(i),$DIAGL5.) eq 'AIDS' then do;
keep=1;
leave;
end;
end;
if keep;
run;
The coding from art297 works for me with the focus on Proc Format "AIDS" text. Thanks to all replying.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
