I have two tables, table 1 has one record per patient, table 2 has multiple records per patient with dummy variables. I want to join the tables so that the dummy variables are merged with table 1.
Table 1.
PAT_ID | AGE | SEX |
93704 | 20 | F |
93705 | 26 | F |
93706 | 28 | M |
93707 | 27 | F |
93708 | 47 | M |
93709 | 38 | F |
93710 | 44 | F |
93711 | 38 | M |
93712 | 24 | F |
93713 | 40 | F |
93714 | 40 | F |
93715 | 25 | M |
93716 | 23 | M |
93717 | 48 | F |
93718 | 41 | F |
Table 2.
PAT_ID | DRUG | STATIN | BETA_BLOCKER | ACEI | DIURETIC | STEROID |
93705 | atorvastatin | 1 | 0 | 0 | 0 | 0 |
93705 | metoprolol | 0 | 1 | 0 | 0 | 0 |
93705 | lisinopril | 0 | 0 | 1 | 0 | 0 |
93709 | hydrochlorothiazide | 0 | 0 | 0 | 1 | 0 |
93709 | rosuvastatin | 1 | 0 | 0 | 0 | 0 |
93709 | prednisone | 0 | 0 | 0 | 0 | 1 |
93713 | pravastatin | 1 | 0 | 0 | 0 | 0 |
93715 | carvedilol | 0 | 1 | 0 | 0 | 0 |
93715 | prednisone | 0 | 0 | 0 | 0 | 1 |
93715 | atorvastatin | 1 | 0 | 0 | 0 | 0 |
93716 | spironolactone | 0 | 0 | 0 | 1 | 0 |
93717 | lisinopril | 0 | 0 | 1 | 0 | 0 |
93717 | propranolol | 0 | 1 | 0 | 0 | 0 |
93717 | hydrochlorothiazide | 0 | 0 | 0 | 1 | 0 |
93718 | lisinopril | 0 | 0 | 1 | 0 | 0 |
Want:
PAT_ID | AGE | SEX | STATIN | BETA_BLOCKER | ACEI | DIURETIC | STEROID |
93704 | 20 | F | 0 | 0 | 0 | 0 | 0 |
93705 | 26 | F | 1 | 1 | 1 | 0 | 0 |
93706 | 28 | M | 0 | 0 | 0 | 0 | 0 |
93707 | 27 | F | 0 | 0 | 0 | 0 | 0 |
93708 | 47 | M | 0 | 0 | 0 | 0 | 0 |
93709 | 38 | F | 1 | 0 | 0 | 1 | 1 |
93710 | 44 | F | 0 | 0 | 0 | 0 | 0 |
93711 | 38 | M | 0 | 0 | 0 | 0 | 0 |
93712 | 24 | F | 0 | 0 | 0 | 0 | 0 |
93713 | 40 | F | 1 | 0 | 0 | 0 | 0 |
93714 | 40 | F | 0 | 0 | 0 | 0 | 0 |
93715 | 25 | M | 1 | 1 | 0 | 0 | 1 |
93716 | 23 | M | 0 | 0 | 0 | 1 | 0 |
93717 | 48 | F | 0 | 1 | 1 | 1 | 0 |
93718 | 41 | F | 0 | 0 | 1 | 0 | 0 |
data table_1;
input PAT_ID AGE SEX $;
datalines;
93704 20 F
93705 26 F
93706 28 M
93707 27 F
93708 47 M
93709 38 F
93710 44 F
93711 38 M
93712 24 F
93713 40 F
93714 40 F
93715 25 M
93716 23 M
93717 48 F
93718 41 F
;
data table_2;
infile datalines dlm=',';
input PAT_ID DRUG : $19. STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
datalines;
93705,atorvastatin,1,0,0,0,0
93705,metoprolol,0,1,0,0,0
93705,lisinopril,0,0,1,0,0
93709,hydrochlorothiazide,0,0,0,1,0
93709,rosuvastatin,1,0,0,0,0
93709,prednisone,0,0,0,0,1
93713,pravastatin,1,0,0,0,0
93715,carvedilol,0,1,0,0,0
93715,prednisone,0,0,0,0,1
93715,atorvastatin,1,0,0,0,0
93716,spironolactone,0,0,0,1,0
93717,lisinopril,0,0,1,0,0
93717,propranolol,0,1,0,0,0
93717,hydrochlorothiazide,0,0,0,1,0
93718,lisinopril,0,0,1,0,0
;
Using the SQL below should do the job.
data table_1;
input PAT_ID AGE SEX $;
datalines;
93704 20 F
93705 26 F
93706 28 M
93707 27 F
93708 47 M
93709 38 F
93710 44 F
93711 38 M
93712 24 F
93713 40 F
93714 40 F
93715 25 M
93716 23 M
93717 48 F
93718 41 F
;
data table_2;
infile datalines dlm=',';
input PAT_ID DRUG : $19. STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
datalines;
93705,atorvastatin,1,0,0,0,0
93705,metoprolol,0,1,0,0,0
93705,lisinopril,0,0,1,0,0
93709,hydrochlorothiazide,0,0,0,1,0
93709,rosuvastatin,1,0,0,0,0
93709,prednisone,0,0,0,0,1
93713,pravastatin,1,0,0,0,0
93715,carvedilol,0,1,0,0,0
93715,prednisone,0,0,0,0,1
93715,atorvastatin,1,0,0,0,0
93716,spironolactone,0,0,0,1,0
93717,lisinopril,0,0,1,0,0
93717,propranolol,0,1,0,0,0
93717,hydrochlorothiazide,0,0,0,1,0
93718,lisinopril,0,0,1,0,0
;
proc sql;
create table want as
select
l.PAT_ID,
l.AGE,
l.SEX,
max(coalesce(r.STATIN,0)) as STATIN,
max(coalesce(r.BETA_BLOCKER,0)) as BETA_BLOCKER,
max(coalesce(r.ACEI,0)) as ACEI,
max(coalesce(r.DIURETIC,0)) as DIURETIC,
max(coalesce(r.STEROID,0)) as STEROID
from
table_1 l
left join
table_2 r
on
l.PAT_ID=r.PAT_ID
group by l.PAT_ID, l.AGE, l.SEX
;
quit;
Having one to many relationship between the two tables. you can use sas base, merge statement:
data want;
merge table1
table2(drop=drug);
by pat_id;
run;
That gets more than 1 row per patient though.
Using the SQL below should do the job.
data table_1;
input PAT_ID AGE SEX $;
datalines;
93704 20 F
93705 26 F
93706 28 M
93707 27 F
93708 47 M
93709 38 F
93710 44 F
93711 38 M
93712 24 F
93713 40 F
93714 40 F
93715 25 M
93716 23 M
93717 48 F
93718 41 F
;
data table_2;
infile datalines dlm=',';
input PAT_ID DRUG : $19. STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
datalines;
93705,atorvastatin,1,0,0,0,0
93705,metoprolol,0,1,0,0,0
93705,lisinopril,0,0,1,0,0
93709,hydrochlorothiazide,0,0,0,1,0
93709,rosuvastatin,1,0,0,0,0
93709,prednisone,0,0,0,0,1
93713,pravastatin,1,0,0,0,0
93715,carvedilol,0,1,0,0,0
93715,prednisone,0,0,0,0,1
93715,atorvastatin,1,0,0,0,0
93716,spironolactone,0,0,0,1,0
93717,lisinopril,0,0,1,0,0
93717,propranolol,0,1,0,0,0
93717,hydrochlorothiazide,0,0,0,1,0
93718,lisinopril,0,0,1,0,0
;
proc sql;
create table want as
select
l.PAT_ID,
l.AGE,
l.SEX,
max(coalesce(r.STATIN,0)) as STATIN,
max(coalesce(r.BETA_BLOCKER,0)) as BETA_BLOCKER,
max(coalesce(r.ACEI,0)) as ACEI,
max(coalesce(r.DIURETIC,0)) as DIURETIC,
max(coalesce(r.STEROID,0)) as STEROID
from
table_1 l
left join
table_2 r
on
l.PAT_ID=r.PAT_ID
group by l.PAT_ID, l.AGE, l.SEX
;
quit;
@tarheel13 Thanks for answering, I noticed that your method does not provide the correct output if a patient is taking more than one medication in the same drug class, for example 2 diuretics, and then distinct values for DIURETIC would be .,0,1,2 instead of just .,0,1.
Your method works. Though lets say table_1 has 30 columns instead of 3, in which case I modified your code grouping only by PAT_ID and then used proc sort with the nodupkey option.
proc sql;
create table want as
select
l.*,
max(coalesce(r.STATIN,0)) as STATIN,
max(coalesce(r.BETA_BLOCKER,0)) as BETA_BLOCKER,
max(coalesce(r.ACEI,0)) as ACEI,
max(coalesce(r.DIURETIC,0)) as DIURETIC,
max(coalesce(r.STEROID,0)) as STEROID
from
table_1 l
left join
table_2 r
on
l.PAT_ID=r.PAT_ID
group by l.PAT_ID
;
quit;
PROC SORT DATA=want OUT=want_NODUP NODUPKEY;
BY PAT_ID;
RUN;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.