WHERE Statement Efficiency

Reply
Contributor
Posts: 42

WHERE Statement Efficiency

Is there a more efficient way to write the following code?

proc sql;

create table ABC as

select t1.age,

       t1.gender,

       t1.diagnosis1,

       t1.diagnosis2,

       t1.diagnosis3,

       t1.diagnosis4
from work.testfile t1

where t1.diagnosis1 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",
"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis2 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis3 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis4 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359");

quit;

Grand Advisor
Posts: 17,351

Re: WHERE Statement Efficiency

Perhaps put the codes in a format and then evaluate?

Or a data step with an array?

data want;

set have;

keep age diagnosis1-diagnosis4;

array diag(4) diagnosis1-diagnosis4;

keep=0;

do i=1 to dim(diag);

if diag(i) in

("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359") then keep=1;

end;


if keep=1;

run;

Respected Advisor
Posts: 4,976

Re: WHERE Statement Efficiency

Reeza, a small improvement:

do i=1 to dim(diag) until (keep=1);

That way, if DIAGNOSIS1 is a match, there's no need to check the other 3 variables.

Esteemed Advisor
Posts: 6,661

Re: WHERE Statement Efficiency

<g> One can always find something to optimize. I LOVE this job. Smiley Happy

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Valued Guide
Posts: 854

Re: WHERE Statement Efficiency

Without changing the syntax you should think of using a macro variable anytime you type something over again like this:

%let list = ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359");

proc sql;

create table ABC as

select t1.age,

       t1.gender,

       t1.diagnosis1,

       t1.diagnosis2,

       t1.diagnosis3,

       t1.diagnosis4

from work.testfile t1

    where t1.diagnosis1 in &list

       or t1.diagnosis2 in &list

       or t1.diagnosis3 in &list

or t1.diagnosis_oth4_cd in &list;

I would probably use a proc tranpose but I'm sure there are a lot of ways.

Grand Advisor
Posts: 10,211

Re: WHERE Statement Efficiency

Define your need for efficiency: code brevity, program run time, memory/disk utilization, something else.

Contributor
Posts: 42

Re: WHERE Statement Efficiency

Code brevity or a better way of writing the code.

Trusted Advisor
Posts: 1,213

Re: WHERE Statement Efficiency

Note that one way to simplify your code would be to restructure (normalize) your data.  Having a wide dataset with variables like DIGNOSIS1-DIAGNOSIS4 often makes code more difficult to write and maintain than if data are normalized into a narrow format with just a single DIAGNOSIS variable and repeating rows.  Key point being that if you ever get a patient with a fifth diagnosis, you will not need to update all your code.

I spent a lot of time in my early years manipulating data with lots of arrays,  until I was lucky to have a boss point out that simplified data structures lead to simplified code.

Contributor
Posts: 42

Re: WHERE Statement Efficiency

Unfortunately, the data is as normalized as we can get it.  The nature of our database (all payers claims data) will not allow the structure you are suggesting.  I have a total of 12 diagnosis fields and 7 procedure code fields with a total of 219 columns. 

Trusted Advisor
Posts: 1,213

Re: WHERE Statement Efficiency

Even if the database that stores the data is not normalized (unfortunately), as a SAS programmer it's often helpful to normalize data for yourself before you do any processing.  Often for a project, or for a data source I will use across projects, I have a %reformatdata() macro which reads some source data in a format I don't like and reshapes it into a format I do like.  If the data are big, can do this as a nightly ETL job, or do it for just the subset I'm interested it.

Given  a table with variables: patient age gender claimdate diagnosis1-diagnosis12 procedure1-procedure7, I might reshape it into three tables.  Patient table with patient age gender.  Diagnosis table with Patient, ClaimDate, Diagnosis.  Procedure Table with Patient, ClaimDate, Procedure.

Once I have that done, answering most of the anticipated questions is easier.  Which patients have one of these diagnosis codes?  Which patients have both diagnosis A and diagnosis B?  Which patients had diagnosis X within one month following treatment with procedure Y.  What is the average number of diagnoses per patient?  etc.

Esteemed Advisor
Posts: 5,198

Re: WHERE Statement Efficiency

I agree with Quentin and others. A normalised structure is better in most ways. Easier to maintain and query.

If that's not an option consider storing the data in SPDE in which the where-planner can optimise OR conditions  (given that the columns are indexed).

Data never sleeps
Esteemed Advisor
Esteemed Advisor
Posts: 7,217

Re: WHERE Statement Efficiency

Just to add after seeing your post, I am not suggesting chaning your database, just your working copy which you are processing.  Having 219 columns will make your code more complicated, unless your using arrays - (i.e. SQL will just be long winded) - per Reeza's example.  

I would agree with Quentin here.  Normalize your data to get:

AGE GENDER    DIAG_NO          DIAG

xx     xxxx           1                       4330

xx     xxxx           2                       2345

...

Then have your list of data to find in another dataset - I personally don't like the use of macro variables to hold lots of data:

DIAG

4330

43301

...

Then its a simple lookup:

proc sql;

     create table WANT as

     select     AGE

                   ,GENDER

                   ,DIAG_NO,

                   ,DIAG

     where     DIAG in (select distinct DIAG from LOOKUP);

quit;

Ask a Question
Discussion stats
  • 11 replies
  • 498 views
  • 7 likes
  • 9 in conversation