BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cdubs
Quartz | Level 8

I have two questions...

 

1. Is there a way to simplify the repetitiveness of the PROC SQLs below?

 

Specifically, I used repeated proc sqls to

 

STEP A) Take out any inpatient claims with the DX codes 123, 12345 and compile them into "compiled" dataset, AND

STEP B) Take out all eligibility claims from eligibility files if their enrolid shows up in "compiled"

 

2. I think if a patient has enrolID 12345 in year 2011, and then loses eligibility, another patient may appear as enrolID 12345. How can I change the code below to make sure  STEP B takes out eligibility claims that match on enrolID and a corresponding "year" variable (assuming both eligibility claims and inpatient claims has a common 'year' variable?)

 

Step A and step B in its current state below.

 

/*0.1.0 SETUP LIBS */

libname raw "N:\" ACCESS = READONLY;
libname temp "M:\";

/*0.1.1 GENERATE LIST OF PTS WITH CORRECT DX CODE*/

PROC SQL NOPRINT;
	CREATE TABLE gg_def10 AS
	SELECT *
	FROM 'N:\ccaes103.sas7bdat'
	WHERE (DX1 IN ('123', '12345')) 
	AND (AGE >= (18));
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE gg_def11 AS
	SELECT *
	FROM 'N:\ccaes113.sas7bdat'
	WHERE (DX1 IN ('123', '12345')) 
	AND (AGE >= (18));
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE gg_def12 AS
	SELECT *
	FROM 'N:\ccaes122.sas7bdat'
	WHERE (DX1 IN ('123', '12345')) 
	AND (AGE >= (18));
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE gg_def13 AS
	SELECT *
	FROM 'N:\ccaes130.sas7bdat'
	WHERE (DX1 IN ('123', '12345')) 
	AND (AGE >= (18));
QUIT;

/*Stack the inpatient claims from 2010 - 2013Q3*/

data temp.compiled;
	set gg_def10 gg_def11 gg_def12 gg_def13;

/*0.1.1 CLEAN ELIGIBILITY */

	/*0.1.1.1 COLLAPSE ELIGIBILITY CLAIMS */ 
	
PROC SQL NOPRINT;
	CREATE TABLE elig_10 AS
	SELECT *
	FROM 'N:\ccaet103.sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE elig_11 AS
	SELECT *
	FROM 'N:\ccaet113.sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE elig_12 AS
	SELECT *
	FROM 'N:\ccaet122.sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;

PROC SQL NOPRINT;
	CREATE TABLE elig_13 AS
	SELECT *
	FROM 'N:\ccaet132.sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;

data temp.compiled_elig;
	set elig_10 elig_11 elig_12 elig_13;
1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

You don't need macro. Just read all of your datasets in a DATA step:

 

data temp.compiled;
  set raw.ccaes103
        raw.ccaes113
        raw.ccaes122
        raw.ccaes130;
  where dx1 in ('123', '12345') and age >= 18;
run;

View solution in original post

14 REPLIES 14
SASKiwi
PROC Star

You don't need macro. Just read all of your datasets in a DATA step:

 

data temp.compiled;
  set raw.ccaes103
        raw.ccaes113
        raw.ccaes122
        raw.ccaes130;
  where dx1 in ('123', '12345') and age >= 18;
run;
cdubs
Quartz | Level 8

Thanks! And would that also work for the second half of the program above?

 

So instead of repeating below x4  

 

PROC SQL NOPRINT;
	CREATE TABLE elig_10 AS
	SELECT *
	FROM 'N:\ccaet103.sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;

.....

 

 we would have 

 

data temp.elig;
  set raw.ccaet103
        raw.ccaet113
        raw.ccaet122
        raw.ccaet130;
  where enrolid in (select distinct enrolid from temp.compiled);
run;

 

RM6
Obsidian | Level 7 RM6
Obsidian | Level 7

SELECT DISTINCT ENROLID from temp.compiled is  sql statements which does not run in data step

cdubs
Quartz | Level 8
I see ... in that case is there a way to make the bottom series of proc SQLs less repetitive? :0
RM6
Obsidian | Level 7 RM6
Obsidian | Level 7

try this,

	proc sql;
	create table temp.compiled_elig as
	select * from (

		select * from 'N:\ccaet103.sas7bdat'
		union all
		select * from 'N:\ccaet113.sas7bdat'
		union all
		select * from 'N:\ccaet122.sas7bdat'
		union all
		select * from 'N:\ccaet132.sas7bdat'
	)
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled)
	;
	quit;
RM6
Obsidian | Level 7 RM6
Obsidian | Level 7

an other alternative to the whole code.


	proc sql;
	create table temp.compiled as
	select * from 'N:\ccaet103.sas7bdat'
	union all
	select * from 'N:\ccaet113.sas7bdat'
	union all
	select * from 'N:\ccaet122.sas7bdat'
	union all
	select * from 'N:\ccaet132.sas7bdat'
	;


	create table temp.compiled_elig as
	select * from temp.compiled
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled
					 where (DX1 IN ('123', '12345')) AND (AGE >= (18))
					  )
	;
	quit;
cdubs
Quartz | Level 8

When I use the first part of your code, it says

 

ERROR: Column 12 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 13 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 17 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 26 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 13 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 26 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 13 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
ERROR: Column 26 from the first contributor of UNION ALL is not the same type as its
       counterpart from the second.
294          quit;

Any idea what this means? 😕

 

 

Specifically, this is what I ran:

 

		proc sql;
			create table temp.compiled as
			select * from (
				select * from raw.ccaes103 union all
				select * from raw.ccaes113 union all
				select * from raw.ccaes122 union all
				select * from raw.ccaes132)
			where dx1 in ('123', '12345') and age >= 18;
		quit;
RM6
Obsidian | Level 7 RM6
Obsidian | Level 7

one way to simplify your code


%macro data_processing(dsn=,file=);
PROC SQL NOPRINT;
	CREATE TABLE &dsn. AS
	SELECT *
	FROM 'N:\&file..sas7bdat'
	WHERE (DX1 IN ('123', '12345')) 
	AND (AGE >= (18));
QUIT;
%mend;


%macro data_processing1(dsn=,file=);
PROC SQL NOPRINT;
	CREATE TABLE &dsn. AS
	SELECT *
	FROM 'N:\&file..sas7bdat'
	WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;
%mend;

%data processing(dsn=gg_def10,file=ccaes103);
%data processing(dsn=gg_def11,file=ccaes113);
%data processing(dsn=gg_def12,file=ccaes122);
%data processing(dsn=gg_def13,file=ccaes130);


data temp.compiled;
	set gg_def10 gg_def11 gg_def12 gg_def13;


%data processing1(dsn=elig_10,file=ccaes103);
%data processing1(dsn=elig_11,file=ccaes113);
%data processing1(dsn=elig_12,file=ccaes122);
%data processing1(dsn=elig_13,file=ccaes130);


data temp.compiled_elig;
	set elig_10 elig_11 elig_12 elig_13;
Shmuel
Garnet | Level 18

I hope that next code will be the most efficient:

 

Step 1 is identical to @SASKiwi's code:

data temp.compiled  enrollids(keep=enrollid);
  set raw.ccaes103
        raw.ccaes113
        raw.ccaes122
        raw.ccaes130;
  where dx1 in ('123', '12345') and age >= 18;
run;

Step 2 creates distinct values of ENROLLID:

proc sort data=enrollids out=enrollids NODUPKEY;
by enrollid; run;

Last step:

data temp.compiled_elig;
  merge raw.ccaet103 (in=in103)
        raw.ccaet113 (in=in113)
        raw.ccaet122 (in=in122)
        raw.ccaet132 (in=in132)
        enrollids    (in=inids);
  by enrollid;   /* assumes all inputs are sorted by EnrollID */
       if inids and (in103 or in113 or in122 or in132);
run;
cdubs
Quartz | Level 8

Question 1: Could I make the code more "user friendly" by allowing a modifiable inputs at the top? e.g. 

 

%let inputs = ccaes103 ccaes 113 ccaes122 ccaes130

and then somehow incorporate that to the code later? 

 

I can see how we may incorporate this on the dx and age side, e.g. instead of 

 

data temp.compiled  enrollids(keep=enrollid);
  set raw.ccaes103
        raw.ccaes113
        raw.ccaes122
        raw.ccaes130;
  where dx1 in ('123', '12345') and age >= 18;
run;

we have something like 

 

%let dx1 = 123, 12345;
%let age_cutoff = 18;

data temp.compiled  enrollids(keep=enrollid);
  set raw.ccaes103
        raw.ccaes113
        raw.ccaes122
        raw.ccaes130;
  where dx1 in &dx1. and age >= &age_cutoff.;
run;

 

Question 2: isn't data merge adding all the datasets horizontally? Since these are datasets from different years, should we instead be stacking vertically?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Question 1) To be honest, I have seen many implementations of macro lists and such like in over 15years, and there is one consistent thing I have found, they never make code more user friendly or modifiable.  First there is rarely any documentation to adequately explain the code or parameters, and secondly due to that, those paramter lists can cause fails and bugs of all kinds in the code.  For instance a simple step where the name in the list doesn't exist, will you check for this before your code, what about not containing the expected variables, or the right type, correct values etc.  The list goes on.

 

Question 2) Data merging (joining) is the means of putting data across the page, based on joining variables.  Setting data is the method of setting data one under the other.

 

Finally if you want to make your code more robust, put your data into a set library, then you can apply a list to get all of them without knowing up front:

data temp.compiled  enrollids(keep=enrollid);
  set raw.ccaes:;
  where dx1 in ('123', '12345') and age >= 18;
run;

This will take all datasets in raw with ccaes prefix. 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

i do have to ask, where did you learn to code like this.  I have never in 15 plus years seen anyone use:

	FROM 'N:\ccaes103.sas7bdat'

You create a libname called raw which points to the datasets and never use them?

  from RAW.CCAES103
Is how it should look.  Anyways @Shmuel has the correct answer here, concatenate your data before processing removes the repetitiveness in this example and most macro coding examples I have seen.  1 dataset is almost always preferable to multiple ones.

 

cdubs
Quartz | Level 8

Thank you! Yea, my coding is horrible :(, but thank you for helping me learn a bit more! 

SASKiwi
PROC Star

@RW9 - that style of referencing SAS datasets means you avoid assigning a LIBNAME entirely. I've used it occasionally if I'm just wanting to read a single dataset from a folder that isn't used for anything else. Funny I didn't learn about it until fairly recently myself. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 4378 views
  • 12 likes
  • 5 in conversation