I have two questions...
1. Is there a way to simplify the repetitiveness of the PROC SQLs below?
Specifically, I used repeated proc sqls to
STEP A) Take out any inpatient claims with the DX codes 123, 12345 and compile them into "compiled" dataset, AND
STEP B) Take out all eligibility claims from eligibility files if their enrolid shows up in "compiled"
2. I think if a patient has enrolID 12345 in year 2011, and then loses eligibility, another patient may appear as enrolID 12345. How can I change the code below to make sure STEP B takes out eligibility claims that match on enrolID and a corresponding "year" variable (assuming both eligibility claims and inpatient claims has a common 'year' variable?)
Step A and step B in its current state below.
/*0.1.0 SETUP LIBS */
libname raw "N:\" ACCESS = READONLY;
libname temp "M:\";
/*0.1.1 GENERATE LIST OF PTS WITH CORRECT DX CODE*/
PROC SQL NOPRINT;
CREATE TABLE gg_def10 AS
SELECT *
FROM 'N:\ccaes103.sas7bdat'
WHERE (DX1 IN ('123', '12345'))
AND (AGE >= (18));
QUIT;
PROC SQL NOPRINT;
CREATE TABLE gg_def11 AS
SELECT *
FROM 'N:\ccaes113.sas7bdat'
WHERE (DX1 IN ('123', '12345'))
AND (AGE >= (18));
QUIT;
PROC SQL NOPRINT;
CREATE TABLE gg_def12 AS
SELECT *
FROM 'N:\ccaes122.sas7bdat'
WHERE (DX1 IN ('123', '12345'))
AND (AGE >= (18));
QUIT;
PROC SQL NOPRINT;
CREATE TABLE gg_def13 AS
SELECT *
FROM 'N:\ccaes130.sas7bdat'
WHERE (DX1 IN ('123', '12345'))
AND (AGE >= (18));
QUIT;
/*Stack the inpatient claims from 2010 - 2013Q3*/
data temp.compiled;
set gg_def10 gg_def11 gg_def12 gg_def13;
/*0.1.1 CLEAN ELIGIBILITY */
/*0.1.1.1 COLLAPSE ELIGIBILITY CLAIMS */
PROC SQL NOPRINT;
CREATE TABLE elig_10 AS
SELECT *
FROM 'N:\ccaet103.sas7bdat'
WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;
PROC SQL NOPRINT;
CREATE TABLE elig_11 AS
SELECT *
FROM 'N:\ccaet113.sas7bdat'
WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;
PROC SQL NOPRINT;
CREATE TABLE elig_12 AS
SELECT *
FROM 'N:\ccaet122.sas7bdat'
WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;
PROC SQL NOPRINT;
CREATE TABLE elig_13 AS
SELECT *
FROM 'N:\ccaet132.sas7bdat'
WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;
data temp.compiled_elig;
set elig_10 elig_11 elig_12 elig_13;
You don't need macro. Just read all of your datasets in a DATA step:
data temp.compiled;
set raw.ccaes103
raw.ccaes113
raw.ccaes122
raw.ccaes130;
where dx1 in ('123', '12345') and age >= 18;
run;
You don't need macro. Just read all of your datasets in a DATA step:
data temp.compiled;
set raw.ccaes103
raw.ccaes113
raw.ccaes122
raw.ccaes130;
where dx1 in ('123', '12345') and age >= 18;
run;
Thanks! And would that also work for the second half of the program above?
So instead of repeating below x4
PROC SQL NOPRINT;
CREATE TABLE elig_10 AS
SELECT *
FROM 'N:\ccaet103.sas7bdat'
WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;
.....
we would have
data temp.elig;
set raw.ccaet103
raw.ccaet113
raw.ccaet122
raw.ccaet130;
where enrolid in (select distinct enrolid from temp.compiled);
run;
SELECT DISTINCT ENROLID from temp.compiled is sql statements which does not run in data step
try this,
proc sql;
create table temp.compiled_elig as
select * from (
select * from 'N:\ccaet103.sas7bdat'
union all
select * from 'N:\ccaet113.sas7bdat'
union all
select * from 'N:\ccaet122.sas7bdat'
union all
select * from 'N:\ccaet132.sas7bdat'
)
WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled)
;
quit;
an other alternative to the whole code.
proc sql;
create table temp.compiled as
select * from 'N:\ccaet103.sas7bdat'
union all
select * from 'N:\ccaet113.sas7bdat'
union all
select * from 'N:\ccaet122.sas7bdat'
union all
select * from 'N:\ccaet132.sas7bdat'
;
create table temp.compiled_elig as
select * from temp.compiled
WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled
where (DX1 IN ('123', '12345')) AND (AGE >= (18))
)
;
quit;
When I use the first part of your code, it says
ERROR: Column 12 from the first contributor of UNION ALL is not the same type as its
counterpart from the second.
ERROR: Column 13 from the first contributor of UNION ALL is not the same type as its
counterpart from the second.
ERROR: Column 17 from the first contributor of UNION ALL is not the same type as its
counterpart from the second.
ERROR: Column 26 from the first contributor of UNION ALL is not the same type as its
counterpart from the second.
ERROR: Column 13 from the first contributor of UNION ALL is not the same type as its
counterpart from the second.
ERROR: Column 26 from the first contributor of UNION ALL is not the same type as its
counterpart from the second.
ERROR: Column 13 from the first contributor of UNION ALL is not the same type as its
counterpart from the second.
ERROR: Column 26 from the first contributor of UNION ALL is not the same type as its
counterpart from the second.
294 quit;
Any idea what this means? 😕
Specifically, this is what I ran:
proc sql;
create table temp.compiled as
select * from (
select * from raw.ccaes103 union all
select * from raw.ccaes113 union all
select * from raw.ccaes122 union all
select * from raw.ccaes132)
where dx1 in ('123', '12345') and age >= 18;
quit;
one way to simplify your code
%macro data_processing(dsn=,file=);
PROC SQL NOPRINT;
CREATE TABLE &dsn. AS
SELECT *
FROM 'N:\&file..sas7bdat'
WHERE (DX1 IN ('123', '12345'))
AND (AGE >= (18));
QUIT;
%mend;
%macro data_processing1(dsn=,file=);
PROC SQL NOPRINT;
CREATE TABLE &dsn. AS
SELECT *
FROM 'N:\&file..sas7bdat'
WHERE ENROLID IN (SELECT DISTINCT ENROLID from temp.compiled);
QUIT;
%mend;
%data processing(dsn=gg_def10,file=ccaes103);
%data processing(dsn=gg_def11,file=ccaes113);
%data processing(dsn=gg_def12,file=ccaes122);
%data processing(dsn=gg_def13,file=ccaes130);
data temp.compiled;
set gg_def10 gg_def11 gg_def12 gg_def13;
%data processing1(dsn=elig_10,file=ccaes103);
%data processing1(dsn=elig_11,file=ccaes113);
%data processing1(dsn=elig_12,file=ccaes122);
%data processing1(dsn=elig_13,file=ccaes130);
data temp.compiled_elig;
set elig_10 elig_11 elig_12 elig_13;
I hope that next code will be the most efficient:
Step 1 is identical to @SASKiwi's code:
data temp.compiled enrollids(keep=enrollid);
set raw.ccaes103
raw.ccaes113
raw.ccaes122
raw.ccaes130;
where dx1 in ('123', '12345') and age >= 18;
run;
Step 2 creates distinct values of ENROLLID:
proc sort data=enrollids out=enrollids NODUPKEY;
by enrollid;
run;
Last step:
data temp.compiled_elig;
merge raw.ccaet103 (in=in103)
raw.ccaet113 (in=in113)
raw.ccaet122 (in=in122)
raw.ccaet132 (in=in132)
enrollids (in=inids);
by enrollid; /* assumes all inputs are sorted by EnrollID */
if inids and (in103 or in113 or in122 or in132);
run;
Question 1: Could I make the code more "user friendly" by allowing a modifiable inputs at the top? e.g.
%let inputs = ccaes103 ccaes 113 ccaes122 ccaes130
and then somehow incorporate that to the code later?
I can see how we may incorporate this on the dx and age side, e.g. instead of
data temp.compiled enrollids(keep=enrollid);
set raw.ccaes103
raw.ccaes113
raw.ccaes122
raw.ccaes130;
where dx1 in ('123', '12345') and age >= 18;
run;
we have something like
%let dx1 = 123, 12345;
%let age_cutoff = 18;
data temp.compiled enrollids(keep=enrollid);
set raw.ccaes103
raw.ccaes113
raw.ccaes122
raw.ccaes130;
where dx1 in &dx1. and age >= &age_cutoff.;
run;
Question 2: isn't data merge adding all the datasets horizontally? Since these are datasets from different years, should we instead be stacking vertically?
Question 1) To be honest, I have seen many implementations of macro lists and such like in over 15years, and there is one consistent thing I have found, they never make code more user friendly or modifiable. First there is rarely any documentation to adequately explain the code or parameters, and secondly due to that, those paramter lists can cause fails and bugs of all kinds in the code. For instance a simple step where the name in the list doesn't exist, will you check for this before your code, what about not containing the expected variables, or the right type, correct values etc. The list goes on.
Question 2) Data merging (joining) is the means of putting data across the page, based on joining variables. Setting data is the method of setting data one under the other.
Finally if you want to make your code more robust, put your data into a set library, then you can apply a list to get all of them without knowing up front:
data temp.compiled enrollids(keep=enrollid);
set raw.ccaes:;
where dx1 in ('123', '12345') and age >= 18;
run;
This will take all datasets in raw with ccaes prefix.
i do have to ask, where did you learn to code like this. I have never in 15 plus years seen anyone use:
FROM 'N:\ccaes103.sas7bdat'
You create a libname called raw which points to the datasets and never use them?
from RAW.CCAES103Is how it should look. Anyways @Shmuel has the correct answer here, concatenate your data before processing removes the repetitiveness in this example and most macro coding examples I have seen. 1 dataset is almost always preferable to multiple ones.
Thank you! Yea, my coding is horrible :(, but thank you for helping me learn a bit more!
@RW9 - that style of referencing SAS datasets means you avoid assigning a LIBNAME entirely. I've used it occasionally if I'm just wanting to read a single dataset from a folder that isn't used for anything else. Funny I didn't learn about it until fairly recently myself.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.