Hello, I have a large healthcare administrative dataset and have multiple variables that will use the same condition. Previously, I used the following code to create a variable for diabetes:
PROC SQL;
CREATE TABLE MOM_2 AS
SELECT *, CASE WHEN
MOM_ID IN (SELECT MOM_ID FROM MOM_1
WHERE COMPRESS (DX_CODE) BETWEEN "XX.XX" AND "XX.XX")
THEN 1
ELSE 0
END AS DIABETES
FROM MOM_1;
QUIT;
Now I have multiple DX_CODE variables associated with multiple visits. Example:
DX_CODE1_VISIT1
DX_CODE1_VISIT2
DX_CODE2_VISIT1
DX_CODE2_VISIT2
DX_CODE3_VISIT1
DX_CODE3_VISIT2
My previous PROC SQL statement is not working when I include multiple variables in the WHERE statement. For example:
PROC SQL;
CREATE TABLE MOM_1 AS
SELECT *, CASE WHEN
MOM_ID IN (SELECT MOM_ID FROM VISIT1
WHERE COMPRESS (DX_CODE1_VISIT1, DX_CODE2_VISIT1, DX_CODE1_VISIT2, DX_CODE2_VISIT1.....and so on ) BETWEEN "XX.XX" AND "XX.XX")
THEN 1
ELSE 0
END AS DIABETES
FROM MOM_2;
QUIT;
Is there a way to write multiple variables in the statement?
If so, is it possible to put a range for the DX_CODE variables, rather than writing them each separately (to save time)?. For example:
DX_CODE1_VISIT1-DX_CODE1_VISIT10, DX_CODE2_VISIT1-DX_CODE2_VISIT10...
If including multple variables is not possible, any suggestions?
Thank you.
You can go all macro on it, here is a sketch:
%macro t;
PROC SQL;
CREATE TABLE MOM_2 AS
SELECT *
%do i=1 %to 3;
%do j=1 %to 4;
,CASE WHEN
MOM_ID IN (SELECT MOM_ID FROM MOM_1
WHERE COMPRESS (DX_CODE&i._visit&j.) BETWEEN "XX.XX" AND "XX.XX")
THEN 1
ELSE 0
END AS DIABETES_&i._&j.
%end;
%end;
END AS DIABETES
FROM MOM_1;
QUIT;
%mend t;
%t;
You can go all macro on it, here is a sketch:
%macro t;
PROC SQL;
CREATE TABLE MOM_2 AS
SELECT *
%do i=1 %to 3;
%do j=1 %to 4;
,CASE WHEN
MOM_ID IN (SELECT MOM_ID FROM MOM_1
WHERE COMPRESS (DX_CODE&i._visit&j.) BETWEEN "XX.XX" AND "XX.XX")
THEN 1
ELSE 0
END AS DIABETES_&i._&j.
%end;
%end;
END AS DIABETES
FROM MOM_1;
QUIT;
%mend t;
%t;
Excellent, it worked, thank you. Next question, the table it created MOM_2 has multiple columns:
DIABETES_1_1 DIABETES_1_2... DIABETES_2_1...
How do I collapse the columns so that if the mom has diabetes in any of the diagnosis slots, in any of the visits, then she has diabetes (1=yes) . Basically I want to collapse all the columns I just created.
Thanks!
Here, I think you are clearly better of using a data step.
in the data step you can create a new variable that takes the min of all the ones you've created. For example:
IndDiab = min (of DIABETES_1_1 -- DIABETES_<n>_<m>);
replace n & m with values
IndDiab will take 0 when none of the variable in the list is 1.
good luck.
Great, thanks so much for your help!
Yes, the problem here is your fighting SQL. SQL is a language developed to work with normalised data, i.e. data going down a table rather than across. Its really not suited to using data which is transposed in nature - going across the dataset. Now in Base SAS you have constructs to work with this type of data, however its still not optimal.
So you have two options - and as you have not provided any Test data in the form of a datastep in the body of the post!! - I can't go further into other than this:
1) Still using SQL, then normalise the DX_CODE list so that visit is a separate variable that goes down th etable like:
VISIT DX_CODE
1 XYZ
...
This is how you should model your data if you want to use SQL (and note, if this is Healthcare, then why are you not using CDISC models anyways?)
2) Drop SQL, and goto datastep, something like:
data...; set mom_1; array dx_code{6}; if visit=1 then x=1; if visit=2 then x=3; else x=5; if dx_code=dx_code{x} or dx_code=dx_code{x+1} then output; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.