Multiple variables for same condition PROC SQL

Accepted Solution Solved
Reply
Contributor
Posts: 32
Accepted Solution

Multiple variables for same condition PROC SQL

Hello, I have a large healthcare administrative dataset and have multiple variables that will use the same condition. Previously, I used the following code to create a variable for diabetes:

 

PROC SQL;
 CREATE TABLE MOM_2 AS
 SELECT *, CASE WHEN
 MOM_ID IN (SELECT MOM_ID FROM MOM_1
  WHERE COMPRESS (DX_CODE) BETWEEN "XX.XX" AND "XX.XX")
   THEN 1
 ELSE 0
 END AS DIABETES
 FROM MOM_1;
QUIT;

 

Now I have multiple  DX_CODE variables associated with multiple visits. Example:

DX_CODE1_VISIT1

DX_CODE1_VISIT2

DX_CODE2_VISIT1

DX_CODE2_VISIT2

DX_CODE3_VISIT1

DX_CODE3_VISIT2

 

My previous PROC SQL statement is not working when I include multiple variables in the WHERE statement. For example:

 

PROC SQL;
 CREATE TABLE MOM_1 AS
 SELECT *, CASE WHEN
 MOM_ID IN (SELECT MOM_ID FROM VISIT1
  WHERE COMPRESS (DX_CODE1_VISIT1, DX_CODE2_VISIT1, DX_CODE1_VISIT2, DX_CODE2_VISIT1.....and so on ) BETWEEN "XX.XX" AND "XX.XX")
   THEN 1
 ELSE 0
 END AS DIABETES
 FROM MOM_2;
QUIT;

 

Is there a way to write multiple variables in the statement?

If so, is it possible to put a range for the DX_CODE variables, rather than writing them each separately (to save time)?. For example:

DX_CODE1_VISIT1-DX_CODE1_VISIT10, DX_CODE2_VISIT1-DX_CODE2_VISIT10...

 

If including multple variables is not possible, any suggestions?

Thank you.

 


Accepted Solutions
Solution
‎08-10-2017 02:34 PM
New Contributor
Posts: 4

Re: Multiple variables for same condition PROC SQL

 

You can go all macro on it, here is a sketch: 

 

%macro t; 

PROC SQL;
 CREATE TABLE MOM_2 AS
 SELECT *

%do i=1 %to 3; 

 %do j=1 %to 4; 

   ,CASE WHEN
 MOM_ID IN (SELECT MOM_ID FROM MOM_1
  WHERE COMPRESS (DX_CODE&i._visit&j.) BETWEEN "XX.XX" AND "XX.XX")
   THEN 1
 ELSE 0
 END AS DIABETES_&i._&j.

 %end; 

%end; 

 

 END AS DIABETES
 FROM MOM_1;
QUIT;

 

%mend t; 

%t; 

View solution in original post


All Replies
Solution
‎08-10-2017 02:34 PM
New Contributor
Posts: 4

Re: Multiple variables for same condition PROC SQL

 

You can go all macro on it, here is a sketch: 

 

%macro t; 

PROC SQL;
 CREATE TABLE MOM_2 AS
 SELECT *

%do i=1 %to 3; 

 %do j=1 %to 4; 

   ,CASE WHEN
 MOM_ID IN (SELECT MOM_ID FROM MOM_1
  WHERE COMPRESS (DX_CODE&i._visit&j.) BETWEEN "XX.XX" AND "XX.XX")
   THEN 1
 ELSE 0
 END AS DIABETES_&i._&j.

 %end; 

%end; 

 

 END AS DIABETES
 FROM MOM_1;
QUIT;

 

%mend t; 

%t; 

Contributor
Posts: 32

Re: Multiple variables for same condition PROC SQL

Excellent, it worked, thank you. Next question, the table it created MOM_2 has multiple columns:

 

DIABETES_1_1      DIABETES_1_2...      DIABETES_2_1...

 

How do I collapse the columns so that if the mom has diabetes in any of the diagnosis slots, in any of the visits, then she has diabetes (1=yes) . Basically I want to collapse all the columns I just created.

 

Thanks!

 

New Contributor
Posts: 4

Re: Multiple variables for same condition PROC SQL

[ Edited ]

Here, I think you are clearly better of using a data step. 

 

in the data step you can create a new variable that takes the min of all the ones you've created. For example: 

 

IndDiab = min (of DIABETES_1_1   -- DIABETES_<n>_<m>);

 

replace n & m with values 

 

IndDiab will take 0 when none of the variable in the list is 1. 

 

good luck. 

Contributor
Posts: 32

Re: Multiple variables for same condition PROC SQL

Great, thanks so much for your help!

New Contributor
Posts: 4

Re: Multiple variables for same condition PROC SQL

You are welcome! glad I was able to.
Super User
Super User
Posts: 7,958

Re: Multiple variables for same condition PROC SQL

Yes, the problem here is your fighting SQL.  SQL is a language developed to work with normalised data, i.e. data going down a table rather than across.  Its really not suited to using data which is transposed in nature - going across the dataset.  Now in Base SAS you have constructs to work with this type of data, however its still not optimal.  

So you have two options - and as you have not provided any Test data in the form of a datastep in the body of the post!! - I can't go further into other than this:

1) Still using SQL, then normalise the DX_CODE list so that visit is a separate variable that goes down th etable like:

VISIT DX_CODE

1        XYZ

...

This is how you should model your data if you want to use SQL (and note, if this is Healthcare, then why are you not using CDISC models anyways?)

 

2) Drop SQL, and goto datastep, something like:

data...;
  set mom_1;
  array dx_code{6};
  if visit=1 then x=1;
  if visit=2 then x=3;
  else x=5;
  if dx_code=dx_code{x} or dx_code=dx_code{x+1} then output;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 390 views
  • 1 like
  • 3 in conversation