I have a dataset of hospital discharges for 57 months (hd1-hd57) with character values (up to 20 in length) if there was a discharge. I want to select the records that have values. I wrote this code but it is not selecting all the records:
data kids_case;
set casemix_cancer;
array hd(57) cancer_HD1-cancer_HD57;
do i=1 to dim(hd);
date_hd=hd(i);
if missing(date_hd) then cancer_hd=0;
else cancer_hd=1;
end;
if cancer_hd=1;
run;
the data is in this format
id $20, cancer_HD1-cancer_HD57 $20
So, one record has id=XXHGG and only has 2 discharges: cancer_HD1=2345 and cancer_HD45=58414528. The rest of cancer_HD are blanks.
Just to select the entire record, you could use:
data kids_case;
set casemix_cancer;
if cat(of cancer_hd1 - cancer_hd57) > ' ';
run;
If you are trying to take the existing records and find only the nonblank discharges, that would take a different approach.
Post test data in the form of a datastep. This really helps illustrate your problem. At a guess, you could just concat the array and check for length?
if lengthn(cats(of hd_;)) > 0 then output;
I.e. if they are all blanks lengthn=0 so no output.
The code you have tests all values in the array. The value of cancer_hd is always going to be the result of the last comparison after the do loop.
If you want to set cancer_hd=1 when any of the values are set then perhaps this is what you want;
data kids_case; set casemix_cancer; array hd(57) cancer_HD1-cancer_HD57; do i=1 to dim(hd); date_hd=hd(i); if missing(date_hd) then cancer_hd=0; else do; cancer_hd=1; leave; end; end; if cancer_hd=1; run;
The LEAVE statement says to quit the do loop when encountered. So this will interupt the loop the first time if finds a non-missing value.
Your logic is flawed. You should set the flag to false before the loop and then in the loop set it to true if any non missing value is found. Your current code is effectly only testing the last value in the array.
But why not just use the CMISS() function to count how many variables have a missing value.
cancer_hd = not (57 = cmiss(of cancerHD1-cancer_HD57));
Just to select the entire record, you could use:
data kids_case;
set casemix_cancer;
if cat(of cancer_hd1 - cancer_hd57) > ' ';
run;
If you are trying to take the existing records and find only the nonblank discharges, that would take a different approach.
thank you for all your suggestions! really appreciate it all your help
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.