I'm working with Medicaid data, and I have a series of variables that represent all of the diagnosis categories that the doctor billed for during each visit. There are about 250 variables, each with a sequentially numbered name: DX_cat_1, DX_cat_2 ... DX_cat_250. If any of these variables has a value of 5, the visit is considered to have involved a mental health diagnosis. Here is the code I attempted, which doesn't work:
DATA medicaid.&filename._DXcount;
/*Initialize variable to represent if the visit included a mental health diagnosis*/
MHvisit = 0;
DO i = 1 to 250 by 1;
IF DX_cat_i = 5 THEN MHvisit = 1;
OUTPUT;
END;
RUN;
Close, but you need to declare an array for the variables. You can then loop over the array or use the WHICHN function.
With OUTPUT inside the loop you'll also write a line for every DX_cat variable, effectively transposing the data. Is that what you wanted? I'm assuming you only want the mental health visits.
DATA medicaid.&filename._DXcount;
/*Initialize variable to represent if the visit included a mental health diagnosis*/
MHvisit = 0;
array dx_cat(250) dx_cat_1-dx_cat_250;
DO i = 1 to 250 ;
IF DX_cat(i) = 5 THEN MHvisit = 1;
END;
if MHvisit=1 then output;
RUN;
Or using whichn function:
DATA medicaid.&filename._DXcount;
/*Initialize variable to represent if the visit included a mental health diagnosis*/
MHvisit = 0;
array dx_cat(250) dx_cat_1-dx_cat_250;
mhvisit=1;
if whichn(5, of dx_cat(*))>0 then output;
RUN;
No luck; I tried it both ways. The DO loop version outputs a file with 250 variables and 0 observations. The whichn version outputs a file with 249 variables and 0 observations.
Here is the code I used (the DX_cat variables actually go up to 248).
DATA medicaid.&filename._DXcount;
/*Initialize variable to represent if the visit included a mental health diagnosis*/
MHvisit = 0;
array dx_cat(248) dx_cat_1-dx_cat_248;
DO i = 1 to 248;
IF DX_cat(i) = 5 THEN MHvisit = 1;
END;
IF MHvisit=1 THEN output;
RUN;
DATA medicaid.&filename._DXcount;
/*Initialize variable to represent if the visit included a mental health diagnosis*/
MHvisit = 0;
array dx_cat(248) dx_cat_1-dx_cat_248;
mhvisit=1;
if whichn(5, of dx_cat(*))>0 then output;
RUN;
Hi,
Are the variables numeric? If they are character, that could be the cause. Please post some test data where it doesn't work, because the following works fine:
data have;
dx_cat1=1; dx_cat2=5; dx_cat3=4; output;
dx_cat1=3; dx_cat2=2; dx_cat3=7; output;
run;
data want;
set have;
array dx_cat{3};
if whichn(5, of dx_cat{*}) > 0 then output;
run;
Apparently the problem was that I didn't use separate input and output data sets: my DATA step had no SET statement, so it never read any observations. Below is the final version of the code. Thanks!
/*******************************************************************************/
DATA medicaid.&filename._MHvisit; SET medicaid.&filename._DXcount;
/*Initialize variable to represent if the visit included a mental health diagnosis*/
MHvisit = 0;
array dx_cat{248} dx_cat_1-dx_cat_248;
DO i = 1 to 248;
IF DX_cat(i) = 5 THEN MHvisit = 1;
END;
IF MHvisit=1 THEN output;
RUN;
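For anyone who hits the same symptom, here is a minimal sketch of why the earlier attempts returned 0 observations. It uses a hypothetical WORK data set name and only three DX_cat variables to keep it short. Without a SET statement the step executes once, the array variables are created as missing, and the conditional OUTPUT never fires. That is also consistent with the variable counts reported above: 248 DX_cat variables plus MHvisit and i gives 250, and the whichn version has no i, which gives 249.

```sas
/* Hypothetical reproduction: no SET statement, so nothing is read.      */
/* The step runs exactly once with dx_cat_1-dx_cat_3 all missing.        */
data work.no_input;
  MHvisit = 0;
  array dx_cat{3} dx_cat_1-dx_cat_3;
  do i = 1 to dim(dx_cat);
    if dx_cat{i} = 5 then MHvisit = 1;   /* never true: values are missing */
  end;
  if MHvisit = 1 then output;            /* never executes */
run;
/* Result: 0 observations and 5 variables (MHvisit, dx_cat_1-dx_cat_3, i). */
```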
And did you try the whichn() version? It should be faster than a loop and is easier to read.
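A minimal sketch of the whichn() version with a SET statement, in case it helps. The data set names work.have and work.mh_visits, the visit_id variable, and the three-variable test data are made up for illustration; swap in your own data set and dx_cat_1-dx_cat_248.

```sas
/* Hypothetical test data: three visits, three diagnosis categories each */
data work.have;
  input visit_id dx_cat_1 dx_cat_2 dx_cat_3;
  datalines;
1 1 5 4
2 3 2 7
3 . 5 .
;
run;

data work.mh_visits;
  set work.have;                        /* read the input observations */
  array dx_cat{*} dx_cat_1-dx_cat_3;
  /* whichn returns the position of the first 5, or 0 if there is none */
  MHvisit = (whichn(5, of dx_cat{*}) > 0);
  if MHvisit = 1 then output;           /* keep mental health visits only */
run;
```

With this test data the output should keep visits 1 and 3. If you would rather keep every visit and just carry the flag, drop the final IF statement.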
I would be very tempted to create an entirely new set of variables that are indicators.
array dx_cat dx_cat: ;
array dx dx_1-dx_250 ;
do i = 1 to dim(dx_cat);
/* index both arrays with i; assumes category codes are integers from 1 to 250 */
if not missing(dx_cat[i]) then dx[dx_cat[i]] = 1;
end;
dx_5 would then flag your mental health visits: a value of 1 means the visit included that diagnosis category. I would hope you are able to assign appropriate variable labels to all 250 categories.
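A small self-contained sketch of the indicator idea, under some assumptions: the test data, the data set names, and the limit of 7 categories are invented for illustration, and category codes are assumed to be positive integers. The range check skips missing values and guards against a code that falls outside the indicator array.

```sas
/* Hypothetical test data: three visits, three diagnosis categories each */
data work.have;
  input visit_id dx_cat_1 dx_cat_2 dx_cat_3;
  datalines;
1 1 5 4
2 3 2 7
3 . 5 .
;
run;

data work.indicators;
  set work.have;
  array dx_cat{*} dx_cat_1-dx_cat_3;
  array dx{7} dx_1-dx_7;                /* one indicator per category code */
  do i = 1 to dim(dx);
    dx{i} = 0;                          /* start every indicator at 0 */
  end;
  do i = 1 to dim(dx_cat);
    /* missing values fail this check, so they are skipped */
    if 1 <= dx_cat{i} <= dim(dx) then dx{dx_cat{i}} = 1;
  end;
  drop i;
run;

/* Usage: the sum of each indicator is the number of visits with that category */
proc means data=work.indicators sum;
  var dx_1-dx_7;
run;
```

Initializing the indicators to 0 rather than leaving them missing is a design choice here, so that sums and means count every visit.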