BookmarkSubscribeRSS Feed
kparker
Quartz | Level 8

Is there a more efficient way to write the following code?

proc sql;

create table ABC as

select t1.age,

       t1.gender,

       t1.diagnosis1,

       t1.diagnosis2,

       t1.diagnosis3,

       t1.diagnosis4
from work.testfile t1

where t1.diagnosis1 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",
"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis2 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis3 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis4 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359");

quit;

11 REPLIES 11
Reeza
Super User

Perhaps put the codes in a format and then evaluate?

Or a data step with an array?

data want;

set have;

keep age diagnosis1-diagnosis4;

array diag(4) diagnosis1-diagnosis4;

keep=0;

do i=1 to dim(diag);

if diag(i) in

("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359") then keep=1;

end;


if keep=1;

run;

Astounding
PROC Star

Reeza, a small improvement:

do i=1 to dim(diag) until (keep=1);

That way, if DIAGNOSIS1 is a match, there's no need to check the other 3 variables.

Steelers_In_DC
Barite | Level 11

Without changing the syntax you should think of using a macro variable anytime you type something over again like this:

%let list = ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359");

proc sql;

create table ABC as

select t1.age,

       t1.gender,

       t1.diagnosis1,

       t1.diagnosis2,

       t1.diagnosis3,

       t1.diagnosis4

from work.testfile t1

    where t1.diagnosis1 in &list

       or t1.diagnosis2 in &list

       or t1.diagnosis3 in &list

or t1.diagnosis_oth4_cd in &list;

I would probably use a proc tranpose but I'm sure there are a lot of ways.

ballardw
Super User

Define your need for efficiency: code brevity, program run time, memory/disk utilization, something else.

kparker
Quartz | Level 8

Code brevity or a better way of writing the code.

Quentin
Super User

Note that one way to simplify your code would be to restructure (normalize) your data.  Having a wide dataset with variables like DIGNOSIS1-DIAGNOSIS4 often makes code more difficult to write and maintain than if data are normalized into a narrow format with just a single DIAGNOSIS variable and repeating rows.  Key point being that if you ever get a patient with a fifth diagnosis, you will not need to update all your code.

I spent a lot of time in my early years manipulating data with lots of arrays,  until I was lucky to have a boss point out that simplified data structures lead to simplified code.

kparker
Quartz | Level 8

Unfortunately, the data is as normalized as we can get it.  The nature of our database (all payers claims data) will not allow the structure you are suggesting.  I have a total of 12 diagnosis fields and 7 procedure code fields with a total of 219 columns. 

Quentin
Super User

Even if the database that stores the data is not normalized (unfortunately), as a SAS programmer it's often helpful to normalize data for yourself before you do any processing.  Often for a project, or for a data source I will use across projects, I have a %reformatdata() macro which reads some source data in a format I don't like and reshapes it into a format I do like.  If the data are big, can do this as a nightly ETL job, or do it for just the subset I'm interested it.

Given  a table with variables: patient age gender claimdate diagnosis1-diagnosis12 procedure1-procedure7, I might reshape it into three tables.  Patient table with patient age gender.  Diagnosis table with Patient, ClaimDate, Diagnosis.  Procedure Table with Patient, ClaimDate, Procedure.

Once I have that done, answering most of the anticipated questions is easier.  Which patients have one of these diagnosis codes?  Which patients have both diagnosis A and diagnosis B?  Which patients had diagnosis X within one month following treatment with procedure Y.  What is the average number of diagnoses per patient?  etc.

LinusH
Tourmaline | Level 20

I agree with Quentin and others. A normalised structure is better in most ways. Easier to maintain and query.

If that's not an option consider storing the data in SPDE in which the where-planner can optimise OR conditions  (given that the columns are indexed).

Data never sleeps
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Just to add after seeing your post, I am not suggesting chaning your database, just your working copy which you are processing.  Having 219 columns will make your code more complicated, unless your using arrays - (i.e. SQL will just be long winded) - per Reeza's example.  

I would agree with Quentin here.  Normalize your data to get:

AGE GENDER    DIAG_NO          DIAG

xx     xxxx           1                       4330

xx     xxxx           2                       2345

...

Then have your list of data to find in another dataset - I personally don't like the use of macro variables to hold lots of data:

DIAG

4330

43301

...

Then its a simple lookup:

proc sql;

     create table WANT as

     select     AGE

                   ,GENDER

                   ,DIAG_NO,

                   ,DIAG

     where     DIAG in (select distinct DIAG from LOOKUP);

quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3613 views
  • 7 likes
  • 9 in conversation