I have two tables, table 1 has one record per patient, table 2 has multiple records per patient with dummy variables. I want to join the tables so that the dummy variables are merged with table 1.
Table 1.
PAT_ID | AGE | SEX |
93704 | 20 | F |
93705 | 26 | F |
93706 | 28 | M |
93707 | 27 | F |
93708 | 47 | M |
93709 | 38 | F |
93710 | 44 | F |
93711 | 38 | M |
93712 | 24 | F |
93713 | 40 | F |
93714 | 40 | F |
93715 | 25 | M |
93716 | 23 | M |
93717 | 48 | F |
93718 | 41 | F |
Table 2.
PAT_ID | DRUG | STATIN | BETA_BLOCKER | ACEI | DIURETIC | STEROID |
93705 | atorvastatin | 1 | 0 | 0 | 0 | 0 |
93705 | metoprolol | 0 | 1 | 0 | 0 | 0 |
93705 | lisinopril | 0 | 0 | 1 | 0 | 0 |
93709 | hydrochlorothiazide | 0 | 0 | 0 | 1 | 0 |
93709 | rosuvastatin | 1 | 0 | 0 | 0 | 0 |
93709 | prednisone | 0 | 0 | 0 | 0 | 1 |
93713 | pravastatin | 1 | 0 | 0 | 0 | 0 |
93715 | carvedilol | 0 | 1 | 0 | 0 | 0 |
93715 | prednisone | 0 | 0 | 0 | 0 | 1 |
93715 | atorvastatin | 1 | 0 | 0 | 0 | 0 |
93716 | spironolactone | 0 | 0 | 0 | 1 | 0 |
93717 | lisinopril | 0 | 0 | 1 | 0 | 0 |
93717 | propranolol | 0 | 1 | 0 | 0 | 0 |
93717 | hydrochlorothiazide | 0 | 0 | 0 | 1 | 0 |
93718 | lisinopril | 0 | 0 | 1 | 0 | 0 |
Want:
PAT_ID | AGE | SEX | STATIN | BETA_BLOCKER | ACEI | DIURETIC | STEROID |
93704 | 20 | F | 0 | 0 | 0 | 0 | 0 |
93705 | 26 | F | 1 | 1 | 1 | 0 | 0 |
93706 | 28 | M | 0 | 0 | 0 | 0 | 0 |
93707 | 27 | F | 0 | 0 | 0 | 0 | 0 |
93708 | 47 | M | 0 | 0 | 0 | 0 | 0 |
93709 | 38 | F | 1 | 0 | 0 | 1 | 1 |
93710 | 44 | F | 0 | 0 | 0 | 0 | 0 |
93711 | 38 | M | 0 | 0 | 0 | 0 | 0 |
93712 | 24 | F | 0 | 0 | 0 | 0 | 0 |
93713 | 40 | F | 1 | 0 | 0 | 0 | 0 |
93714 | 40 | F | 0 | 0 | 0 | 0 | 0 |
93715 | 25 | M | 1 | 1 | 0 | 0 | 1 |
93716 | 23 | M | 0 | 0 | 0 | 1 | 0 |
93717 | 48 | F | 0 | 1 | 1 | 1 | 0 |
93718 | 41 | F | 0 | 0 | 1 | 0 | 0 |
data table_1;
input PAT_ID AGE SEX $;
datalines;
93704 20 F
93705 26 F
93706 28 M
93707 27 F
93708 47 M
93709 38 F
93710 44 F
93711 38 M
93712 24 F
93713 40 F
93714 40 F
93715 25 M
93716 23 M
93717 48 F
93718 41 F
;
data table_2;
infile datalines dlm=',';
input PAT_ID DRUG : $19. STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
datalines;
93705,atorvastatin,1,0,0,0,0
93705,metoprolol,0,1,0,0,0
93705,lisinopril,0,0,1,0,0
93709,hydrochlorothiazide,0,0,0,1,0
93709,rosuvastatin,1,0,0,0,0
93709,prednisone,0,0,0,0,1
93713,pravastatin,1,0,0,0,0
93715,carvedilol,0,1,0,0,0
93715,prednisone,0,0,0,0,1
93715,atorvastatin,1,0,0,0,0
93716,spironolactone,0,0,0,1,0
93717,lisinopril,0,0,1,0,0
93717,propranolol,0,1,0,0,0
93717,hydrochlorothiazide,0,0,0,1,0
93718,lisinopril,0,0,1,0,0
;
Using the SQL below should do the job.
data table_1;
input PAT_ID AGE SEX $;
datalines;
93704 20 F
93705 26 F
93706 28 M
93707 27 F
93708 47 M
93709 38 F
93710 44 F
93711 38 M
93712 24 F
93713 40 F
93714 40 F
93715 25 M
93716 23 M
93717 48 F
93718 41 F
;
data table_2;
infile datalines dlm=',';
input PAT_ID DRUG : $19. STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
datalines;
93705,atorvastatin,1,0,0,0,0
93705,metoprolol,0,1,0,0,0
93705,lisinopril,0,0,1,0,0
93709,hydrochlorothiazide,0,0,0,1,0
93709,rosuvastatin,1,0,0,0,0
93709,prednisone,0,0,0,0,1
93713,pravastatin,1,0,0,0,0
93715,carvedilol,0,1,0,0,0
93715,prednisone,0,0,0,0,1
93715,atorvastatin,1,0,0,0,0
93716,spironolactone,0,0,0,1,0
93717,lisinopril,0,0,1,0,0
93717,propranolol,0,1,0,0,0
93717,hydrochlorothiazide,0,0,0,1,0
93718,lisinopril,0,0,1,0,0
;
proc sql;
create table want as
select
l.PAT_ID,
l.AGE,
l.SEX,
max(coalesce(r.STATIN,0)) as STATIN,
max(coalesce(r.BETA_BLOCKER,0)) as BETA_BLOCKER,
max(coalesce(r.ACEI,0)) as ACEI,
max(coalesce(r.DIURETIC,0)) as DIURETIC,
max(coalesce(r.STEROID,0)) as STEROID
from
table_1 l
left join
table_2 r
on
l.PAT_ID=r.PAT_ID
group by l.PAT_ID, l.AGE, l.SEX
;
quit;
Having one to many relationship between the two tables. you can use sas base, merge statement:
data want;
merge table1
table2(drop=drug);
by pat_id;
run;
That gets more than 1 row per patient though.
Using the SQL below should do the job.
data table_1;
input PAT_ID AGE SEX $;
datalines;
93704 20 F
93705 26 F
93706 28 M
93707 27 F
93708 47 M
93709 38 F
93710 44 F
93711 38 M
93712 24 F
93713 40 F
93714 40 F
93715 25 M
93716 23 M
93717 48 F
93718 41 F
;
data table_2;
infile datalines dlm=',';
input PAT_ID DRUG : $19. STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
datalines;
93705,atorvastatin,1,0,0,0,0
93705,metoprolol,0,1,0,0,0
93705,lisinopril,0,0,1,0,0
93709,hydrochlorothiazide,0,0,0,1,0
93709,rosuvastatin,1,0,0,0,0
93709,prednisone,0,0,0,0,1
93713,pravastatin,1,0,0,0,0
93715,carvedilol,0,1,0,0,0
93715,prednisone,0,0,0,0,1
93715,atorvastatin,1,0,0,0,0
93716,spironolactone,0,0,0,1,0
93717,lisinopril,0,0,1,0,0
93717,propranolol,0,1,0,0,0
93717,hydrochlorothiazide,0,0,0,1,0
93718,lisinopril,0,0,1,0,0
;
proc sql;
create table want as
select
l.PAT_ID,
l.AGE,
l.SEX,
max(coalesce(r.STATIN,0)) as STATIN,
max(coalesce(r.BETA_BLOCKER,0)) as BETA_BLOCKER,
max(coalesce(r.ACEI,0)) as ACEI,
max(coalesce(r.DIURETIC,0)) as DIURETIC,
max(coalesce(r.STEROID,0)) as STEROID
from
table_1 l
left join
table_2 r
on
l.PAT_ID=r.PAT_ID
group by l.PAT_ID, l.AGE, l.SEX
;
quit;
@tarheel13 Thanks for answering, I noticed that your method does not provide the correct output if a patient is taking more than one medication in the same drug class, for example 2 diuretics, and then distinct values for DIURETIC would be .,0,1,2 instead of just .,0,1.
Your method works. Though lets say table_1 has 30 columns instead of 3, in which case I modified your code grouping only by PAT_ID and then used proc sort with the nodupkey option.
proc sql;
create table want as
select
l.*,
max(coalesce(r.STATIN,0)) as STATIN,
max(coalesce(r.BETA_BLOCKER,0)) as BETA_BLOCKER,
max(coalesce(r.ACEI,0)) as ACEI,
max(coalesce(r.DIURETIC,0)) as DIURETIC,
max(coalesce(r.STEROID,0)) as STEROID
from
table_1 l
left join
table_2 r
on
l.PAT_ID=r.PAT_ID
group by l.PAT_ID
;
quit;
PROC SORT DATA=want OUT=want_NODUP NODUPKEY;
BY PAT_ID;
RUN;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.