SAS Programming

kparker · Posted 08-11-2015 10:34 AM

Is there a more efficient way to write the following code?

proc sql;

create table ABC as

select t1.age,

t1.gender,

t1.diagnosis1,

t1.diagnosis2,

t1.diagnosis3,

t1.diagnosis4
from work.testfile t1

where t1.diagnosis1 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",
"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis2 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis3 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359")

or t1.diagnosis4 in ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359");

quit;

Reeza · Posted 08-11-2015 10:44 AM

Perhaps put the codes in a format and then evaluate?

Or a data step with an array?

data want;

set have;

keep age diagnosis1-diagnosis4;

array diag(4) diagnosis1-diagnosis4;

keep=0;

do i=1 to dim(diag);

if diag(i) in

("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359") then keep=1;

end;

if keep=1;

run;

Astounding · Posted 08-11-2015 06:45 PM

Reeza, a small improvement:

do i=1 to dim(diag) until (keep=1);

That way, if DIAGNOSIS1 is a match, there's no need to check the other 3 variables.

Kurt_Bremser · Posted 08-12-2015 03:16 AM

<g> One can always find something to optimize. I LOVE this job.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Steelers_In_DC · Posted 08-11-2015 10:47 AM

Without changing the syntax you should think of using a macro variable anytime you type something over again like this:

%let list = ("4330","43301","43310","43311","43320","43321","43330","43331","43380","43381","43390","43391","4340","43401",

"43410","43411","43490","43491","4350","4351","4352","4353","4358","4359");

proc sql;

create table ABC as

select t1.age,

t1.gender,

t1.diagnosis1,

t1.diagnosis2,

t1.diagnosis3,

t1.diagnosis4

from work.testfile t1

where t1.diagnosis1 in &list

or t1.diagnosis2 in &list

or t1.diagnosis3 in &list

or t1.diagnosis_oth4_cd in &list;

I would probably use a proc tranpose but I'm sure there are a lot of ways.

ballardw · Posted 08-11-2015 11:08 AM

Define your need for efficiency: code brevity, program run time, memory/disk utilization, something else.

kparker · Posted 08-11-2015 11:27 AM

Code brevity or a better way of writing the code.

Quentin · Posted 08-11-2015 12:05 PM

Note that one way to simplify your code would be to restructure (normalize) your data. Having a wide dataset with variables like DIGNOSIS1-DIAGNOSIS4 often makes code more difficult to write and maintain than if data are normalized into a narrow format with just a single DIAGNOSIS variable and repeating rows. Key point being that if you ever get a patient with a fifth diagnosis, you will not need to update all your code.

I spent a lot of time in my early years manipulating data with lots of arrays, until I was lucky to have a boss point out that simplified data structures lead to simplified code.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

kparker · Posted 08-11-2015 12:11 PM

Unfortunately, the data is as normalized as we can get it. The nature of our database (all payers claims data) will not allow the structure you are suggesting. I have a total of 12 diagnosis fields and 7 procedure code fields with a total of 219 columns.

Quentin · Posted 08-11-2015 12:48 PM

Even if the database that stores the data is not normalized (unfortunately), as a SAS programmer it's often helpful to normalize data for yourself before you do any processing. Often for a project, or for a data source I will use across projects, I have a %reformatdata() macro which reads some source data in a format I don't like and reshapes it into a format I do like. If the data are big, can do this as a nightly ETL job, or do it for just the subset I'm interested it.

Given a table with variables: patient age gender claimdate diagnosis1-diagnosis12 procedure1-procedure7, I might reshape it into three tables. Patient table with patient age gender. Diagnosis table with Patient, ClaimDate, Diagnosis. Procedure Table with Patient, ClaimDate, Procedure.

Once I have that done, answering most of the anticipated questions is easier. Which patients have one of these diagnosis codes? Which patients have both diagnosis A and diagnosis B? Which patients had diagnosis X within one month following treatment with procedure Y. What is the average number of diagnoses per patient? etc.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

LinusH · Posted 08-12-2015 04:18 AM

I agree with Quentin and others. A normalised structure is better in most ways. Easier to maintain and query.

If that's not an option consider storing the data in SPDE in which the where-planner can optimise OR conditions (given that the columns are indexed).

Data never sleeps

RW9 · Posted 08-11-2015 12:13 PM

Just to add after seeing your post, I am not suggesting chaning your database, just your working copy which you are processing. Having 219 columns will make your code more complicated, unless your using arrays - (i.e. SQL will just be long winded) - per Reeza's example.

I would agree with Quentin here. Normalize your data to get:

AGE GENDER DIAG_NO DIAG

xx xxxx 1 4330

xx xxxx 2 2345

...

Then have your list of data to find in another dataset - I personally don't like the use of macro variables to hold lots of data:

DIAG

4330

43301

...

Then its a simple lookup:

proc sql;

create table WANT as

select AGE

,GENDER

,DIAG_NO,

,DIAG

where DIAG in (select distinct DIAG from LOOKUP);

quit;

SAS Programming

WHERE Statement Efficiency

Re: WHERE Statement Efficiency

Re: WHERE Statement Efficiency

Re: WHERE Statement Efficiency

Re: WHERE Statement Efficiency

Re: WHERE Statement Efficiency

Re: WHERE Statement Efficiency

Re: WHERE Statement Efficiency

Re: WHERE Statement Efficiency

Re: WHERE Statement Efficiency

Re: WHERE Statement Efficiency

Re: WHERE Statement Efficiency

Energy forecasting with electrification and energy efficiency scenario...

Efficiently Migrating Data to UTF-8 Encoding

Making Rule Sets More Efficient with Lookup Tables

Efficiency Test for Estimators by Simulation

AI for efficient MRI diagnosis

Follow Us

What is...

SAS Programming

Register Today!

SAS Training: Just a Click Away

Follow Us

What is...