Solved: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce rep...

cdubs · Posted 12-13-2017 01:15 AM

I have two questions...

1. Is there a way to simplify the repetitiveness of the PROC SQLs below?

Specifically, I used repeated proc sqls to

STEP A) Take out any inpatient claims with the DX codes 123, 12345 and compile them into "compiled" dataset, AND

STEP B) Take out all eligibility claims from eligibility files if their enrolid shows up in "compiled"

2. I think if a patient has enrolID 12345 in year 2011, and then loses eligibility, another patient may appear as enrolID 12345. How can I change the code below to make sure STEP B takes out eligibility claims that match on enrolID and a corresponding "year" variable (assuming both eligibility claims and inpatient claims has a common 'year' variable?)

Step A and step B in its current state below.

/*0.1.0 SETUP LIBS */

libname raw "N:\" ACCESS = READONLY;
libname temp "M:\";

/*0.1.1 GENERATE LIST OF PTS WITH CORRECT DX CODE*/

PROC SQL NOPRINT;
	CREATE TABLE gg_def10 AS
	SELECT *
	FROM 'N:\ccaes103.sas7bdat'
	WHERE (DX1 IN ('123', '12345')) 
	AND (AGE >= (18));
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE gg_def11 AS
	SELECT *
	FROM 'N:\ccaes113.sas7bdat'
	WHERE (DX1 IN ('123', '12345')) 
	AND (AGE >= (18));
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE gg_def12 AS
	SELECT *
	FROM 'N:\ccaes122.sas7bdat'
	WHERE (DX1 IN ('123', '12345')) 
	AND (AGE >= (18));
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE gg_def13 AS
	SELECT *
	FROM 'N:\ccaes130.sas7bdat'
	WHERE (DX1 IN ('123', '12345')) 
	AND (AGE >= (18));
QUIT;

/*Stack the inpatient claims from 2010 - 2013Q3*/

data temp.compiled;
	set gg_def10 gg_def11 gg_def12 gg_def13;

/*0.1.1 CLEAN ELIGIBILITY */

	/*0.1.1.1 COLLAPSE ELIGIBILITY CLAIMS */ 
	
PROC SQL NOPRINT;
	CREATE TABLE elig_10 AS
	SELECT *
	FROM 'N:\ccaet103.sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE elig_11 AS
	SELECT *
	FROM 'N:\ccaet113.sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE elig_12 AS
	SELECT *
	FROM 'N:\ccaet122.sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE elig_13 AS
	SELECT *
	FROM 'N:\ccaet132.sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;

data temp.compiled_elig;
	set elig_10 elig_11 elig_12 elig_13;

SASKiwi · Posted 12-13-2017 01:24 AM

You don't need macro. Just read all of your datasets in a DATA step:

data temp.compiled;
  set raw.ccaes103
        raw.ccaes113
        raw.ccaes122
        raw.ccaes130;
  where dx1 in ('123', '12345') and age >= 18;
run;

View solution in original post

SASKiwi · Posted 12-13-2017 01:24 AM

You don't need macro. Just read all of your datasets in a DATA step:

data temp.compiled;
  set raw.ccaes103
        raw.ccaes113
        raw.ccaes122
        raw.ccaes130;
  where dx1 in ('123', '12345') and age >= 18;
run;

cdubs · Posted 12-13-2017 02:16 AM

Thanks! And would that also work for the second half of the program above?

So instead of repeating below x4

PROC SQL NOPRINT;
	CREATE TABLE elig_10 AS
	SELECT *
	FROM 'N:\ccaet103.sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;

.....

we would have

data temp.elig;
  set raw.ccaet103
        raw.ccaet113
        raw.ccaet122
        raw.ccaet130;
  where enrolid in (select distinct enrolid from temp.compiled);
run;

RM6 · Posted 12-13-2017 02:25 AM

SELECT DISTINCT ENROLID from temp.compiled is sql statements which does not run in data step

cdubs · Posted 12-13-2017 02:30 AM

I see ... in that case is there a way to make the bottom series of proc SQLs less repetitive? :0

RM6 · Posted 12-13-2017 03:14 AM

try this,

	proc sql;
	create table temp.compiled_elig as
	select * from (

		select * from 'N:\ccaet103.sas7bdat'
		union all
		select * from 'N:\ccaet113.sas7bdat'
		union all
		select * from 'N:\ccaet122.sas7bdat'
		union all
		select * from 'N:\ccaet132.sas7bdat'
	)
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled)
	;
	quit;

RM6 · Posted 12-13-2017 03:25 AM

an other alternative to the whole code.


	proc sql;
	create table temp.compiled as
	select * from 'N:\ccaet103.sas7bdat'
	union all
	select * from 'N:\ccaet113.sas7bdat'
	union all
	select * from 'N:\ccaet122.sas7bdat'
	union all
	select * from 'N:\ccaet132.sas7bdat'
	;


	create table temp.compiled_elig as
	select * from temp.compiled
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled
					 where (DX1 IN ('123', '12345')) AND (AGE >= (18))
					  )
	;
	quit;

cdubs · Posted 12-14-2017 02:33 PM

When I use the first part of your code, it says

ERROR: Column 12 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 13 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 17 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 26 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 13 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 26 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 13 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 26 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
294          quit;

Any idea what this means? 😕

Specifically, this is what I ran:

		proc sql;
			create table temp.compiled as
			select * from (
				select * from raw.ccaes103 union all
				select * from raw.ccaes113 union all
				select * from raw.ccaes122 union all
				select * from raw.ccaes132)
			where dx1 in ('123', '12345') and age >= 18;
		quit;

RM6 · Posted 12-13-2017 01:33 AM

one way to simplify your code


%macro data_processing(dsn=,file=);
PROC SQL NOPRINT;
	CREATE TABLE &dsn. AS
	SELECT *
	FROM 'N:\&file..sas7bdat'
	WHERE (DX1 IN ('123', '12345')) 
	AND (AGE >= (18));
QUIT;
%mend;


%macro data_processing1(dsn=,file=);
PROC SQL NOPRINT;
	CREATE TABLE &dsn. AS
	SELECT *
	FROM 'N:\&file..sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;
%mend;

%data processing(dsn=gg_def10,file=ccaes103);
%data processing(dsn=gg_def11,file=ccaes113);
%data processing(dsn=gg_def12,file=ccaes122);
%data processing(dsn=gg_def13,file=ccaes130);


data temp.compiled;
	set gg_def10 gg_def11 gg_def12 gg_def13;


%data processing1(dsn=elig_10,file=ccaes103);
%data processing1(dsn=elig_11,file=ccaes113);
%data processing1(dsn=elig_12,file=ccaes122);
%data processing1(dsn=elig_13,file=ccaes130);


data temp.compiled_elig;
	set elig_10 elig_11 elig_12 elig_13;

Shmuel · Posted 12-13-2017 03:43 AM

I hope that next code will be the most efficient:

Step 1 is identical to @SASKiwi's code:

data temp.compiled  enrollids(keep=enrollid);
  set raw.ccaes103
        raw.ccaes113
        raw.ccaes122
        raw.ccaes130;
  where dx1 in ('123', '12345') and age >= 18;
run;

Step 2 creates distinct values of ENROLLID:

proc sort data=enrollids out=enrollids NODUPKEY;
     by enrollid;
run;

Last step:

data temp.compiled_elig;
  merge raw.ccaet103 (in=in103)
        raw.ccaet113 (in=in113)
        raw.ccaet122 (in=in122)
        raw.ccaet132 (in=in132)
        enrollids    (in=inids);
  by enrollid;   /* assumes all inputs are sorted by EnrollID */
       if inids and (in103 or in113 or in122 or in132);
run;

cdubs · Posted 12-14-2017 01:53 PM

Question 1: Could I make the code more "user friendly" by allowing a modifiable inputs at the top? e.g.

%let inputs = ccaes103 ccaes 113 ccaes122 ccaes130

and then somehow incorporate that to the code later?

I can see how we may incorporate this on the dx and age side, e.g. instead of

data temp.compiled  enrollids(keep=enrollid);
  set raw.ccaes103
        raw.ccaes113
        raw.ccaes122
        raw.ccaes130;
  where dx1 in ('123', '12345') and age >= 18;
run;

we have something like

%let dx1 = 123, 12345;
%let age_cutoff = 18;

data temp.compiled  enrollids(keep=enrollid);
  set raw.ccaes103
        raw.ccaes113
        raw.ccaes122
        raw.ccaes130;
  where dx1 in &dx1. and age >= &age_cutoff.;
run;

Question 2: isn't data merge adding all the datasets horizontally? Since these are datasets from different years, should we instead be stacking vertically?

RW9 · Posted 12-15-2017 04:12 AM

Question 1) To be honest, I have seen many implementations of macro lists and such like in over 15years, and there is one consistent thing I have found, they never make code more user friendly or modifiable. First there is rarely any documentation to adequately explain the code or parameters, and secondly due to that, those paramter lists can cause fails and bugs of all kinds in the code. For instance a simple step where the name in the list doesn't exist, will you check for this before your code, what about not containing the expected variables, or the right type, correct values etc. The list goes on.

Question 2) Data merging (joining) is the means of putting data across the page, based on joining variables. Setting data is the method of setting data one under the other.

Finally if you want to make your code more robust, put your data into a set library, then you can apply a list to get all of them without knowing up front:

data temp.compiled  enrollids(keep=enrollid);
  set raw.ccaes:;
  where dx1 in ('123', '12345') and age >= 18;
run;

This will take all datasets in raw with ccaes prefix.

RW9 · Posted 12-13-2017 04:27 AM

i do have to ask, where did you learn to code like this. I have never in 15 plus years seen anyone use:

	FROM 'N:\ccaes103.sas7bdat'

You create a libname called raw which points to the datasets and never use them?

  from RAW.CCAES103

Is how it should look. Anyways @Shmuel has the correct answer here, concatenate your data before processing removes the repetitiveness in this example and most macro coding examples I have seen. 1 dataset is almost always preferable to multiple ones.

cdubs · Posted 12-13-2017 10:56 AM

Thank you! Yea, my coding is horrible :(, but thank you for helping me learn a bit more!

SASKiwi · Posted 12-13-2017 02:06 PM

@RW9 - that style of referencing SAS datasets means you avoid assigning a LIBNAME entirely. I've used it occasionally if I'm just wanting to read a single dataset from a folder that isn't used for anything else. Funny I didn't learn about it until fairly recently myself.

How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Re: How to wrap repetitive PROC SQLs into a macro? Or otherwise reduce repetitiveness of proc sqls?

Registration is open

Registration is open

SAS Training: Just a Click Away