SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
kparker
Quartz | Level 8

Is there a more efficient way to write the following code?

proc sql;

create table ABC as

select t1.age,

       t1.gender,

       t1.diagnosis1,

       t1.diagnosis2,

       t1.diagnosis3,

       t1.diagnosis4
from work.testfile t1

where t1.diagnosis1 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",
"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis2 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis3 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis4 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359");

quit;

11 REPLIES 11
Reeza
Super User

Perhaps put the codes in a format and then evaluate?

Or a data step with an array?

data want;

set have;

keep age diagnosis1-diagnosis4;

array diag(4) diagnosis1-diagnosis4;

keep=0;

do i=1 to dim(diag);

if diag(i) in

("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359") then keep=1;

end;


if keep=1;

run;

Astounding
PROC Star

Reeza, a small improvement:

do i=1 to dim(diag) until (keep=1);

That way, if DIAGNOSIS1 is a match, there's no need to check the other 3 variables.

Steelers_In_DC
Barite | Level 11

Without changing the syntax you should think of using a macro variable anytime you type something over again like this:

%let list = ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359");

proc sql;

create table ABC as

select t1.age,

       t1.gender,

       t1.diagnosis1,

       t1.diagnosis2,

       t1.diagnosis3,

       t1.diagnosis4

from work.testfile t1

    where t1.diagnosis1 in &list

       or t1.diagnosis2 in &list

       or t1.diagnosis3 in &list

or t1.diagnosis_oth4_cd in &list;

I would probably use a proc tranpose but I'm sure there are a lot of ways.

ballardw
Super User

Define your need for efficiency: code brevity, program run time, memory/disk utilization, something else.

kparker
Quartz | Level 8

Code brevity or a better way of writing the code.

Quentin
Super User

Note that one way to simplify your code would be to restructure (normalize) your data.  Having a wide dataset with variables like DIGNOSIS1-DIAGNOSIS4 often makes code more difficult to write and maintain than if data are normalized into a narrow format with just a single DIAGNOSIS variable and repeating rows.  Key point being that if you ever get a patient with a fifth diagnosis, you will not need to update all your code.

I spent a lot of time in my early years manipulating data with lots of arrays,  until I was lucky to have a boss point out that simplified data structures lead to simplified code.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
kparker
Quartz | Level 8

Unfortunately, the data is as normalized as we can get it.  The nature of our database (all payers claims data) will not allow the structure you are suggesting.  I have a total of 12 diagnosis fields and 7 procedure code fields with a total of 219 columns. 

Quentin
Super User

Even if the database that stores the data is not normalized (unfortunately), as a SAS programmer it's often helpful to normalize data for yourself before you do any processing.  Often for a project, or for a data source I will use across projects, I have a %reformatdata() macro which reads some source data in a format I don't like and reshapes it into a format I do like.  If the data are big, can do this as a nightly ETL job, or do it for just the subset I'm interested it.

Given  a table with variables: patient age gender claimdate diagnosis1-diagnosis12 procedure1-procedure7, I might reshape it into three tables.  Patient table with patient age gender.  Diagnosis table with Patient, ClaimDate, Diagnosis.  Procedure Table with Patient, ClaimDate, Procedure.

Once I have that done, answering most of the anticipated questions is easier.  Which patients have one of these diagnosis codes?  Which patients have both diagnosis A and diagnosis B?  Which patients had diagnosis X within one month following treatment with procedure Y.  What is the average number of diagnoses per patient?  etc.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
LinusH
Tourmaline | Level 20

I agree with Quentin and others. A normalised structure is better in most ways. Easier to maintain and query.

If that's not an option consider storing the data in SPDE in which the where-planner can optimise OR conditions  (given that the columns are indexed).

Data never sleeps
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Just to add after seeing your post, I am not suggesting chaning your database, just your working copy which you are processing.  Having 219 columns will make your code more complicated, unless your using arrays - (i.e. SQL will just be long winded) - per Reeza's example.  

I would agree with Quentin here.  Normalize your data to get:

AGE GENDER    DIAG_NO          DIAG

xx     xxxx           1                       4330

xx     xxxx           2                       2345

...

Then have your list of data to find in another dataset - I personally don't like the use of macro variables to hold lots of data:

DIAG

4330

43301

...

Then its a simple lookup:

proc sql;

     create table WANT as

     select     AGE

                   ,GENDER

                   ,DIAG_NO,

                   ,DIAG

     where     DIAG in (select distinct DIAG from LOOKUP);

quit;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2615 views
  • 7 likes
  • 9 in conversation