Hi,
I have two tables which has many-to-many relationships with this TEAM_1 and TEAM_2:
TEAM_1 | TEAM_2 | TEAM_C | J_DT | J_ID |
CINDY | BOB | JOB3 | 14-01-22 | J3 |
CINDY | BOB | JOB4 | 14-01-22 | J4 |
TEAM_1 | TEAM_2 | A_DT | A_ID | B_DT | B_ID | D_DT | D_ID |
CINDY | BOB | 02-01-22 | A2 | 03-01-22 | B1 | . | |
CINDY | BOB | 05-01-22 | A3 | . | . |
I want to show the after join they only return unique J_DT, J_ID as follow:
TEAM_1 | TEAM_2 | A_DT | A_ID | B_DT | B_ID | D_DT | D_ID | TEAM_C | J_DT | J_ID |
CINDY | BOB | 02-01-22 | A2 | 03-01-22 | B1 | . | JOB3 | 14-01-22 | J3 | |
CINDY | BOB | 05-01-22 | A3 | . | . | |||||
CINDY | BOB | 02-01-22 | A2 | 03-01-22 | B1 | . | JOB4 | 14-01-22 | J4 | |
CINDY | BOB | 05-01-22 | A3 | . | . |
I did the left join and come out wrong output like this:
Here is my working code
data have;
input TEAM_1 $ TEAM_2 $ EVENT_NM $ EVENT_DT :mmddyy. EVENT_ID $ ;
format event_dt yymmdd10.;
cards;
ALICE JIMMY A 1-1-2022 A1
CINDY BOB A 1-2-2022 A2
CINDY BOB B 1-3-2022 B1
CINDY BOB A 1-5-2022 A3
ELLA JACK B 1-8-2022 B4
ELLA JACK D 1-9-2022 D1
AMBER DANIEL E 1-10-2022 E1
KITTY ARRON F 1-11-2022 F1
;
data GOT;
input TEAM_1 $ TEAM_2 $ TEAM_C $ J_DT :mmddyy. J_ID $ ;
format J_DT yymmdd10.;
cards;
ALICE JIMMY JOB1 1-12-2022 J1
ALICE JIMMY JOB2 1-13-2022 J2
CINDY BOB JOB3 1-14-2022 J3
CINDY BOB JOB4 1-14-2022 J4
AMBER DANIEL JOB5 1-16-2022 J5
AMBER DANIEL JOB6 1-17-2022 J6
KITTY ARRON JOB7 1-18-2022 J7
;
data groups1;
set have;
by team_1 team_2 notsorted ;
group + (EVENT_NM <= lag(EVENT_NM) );
if first.team_2 then group=1;
run;
proc transpose data=groups1 out=wide1(DROP=_NAME_);
by team_1 team_2 group group notsorted;
id EVENT_NM ;
var EVENT_ID;
run;
proc print data=wide1;
run;
data groups2;
set have;
by team_1 team_2 notsorted ;
group + (EVENT_NM <= lag(EVENT_NM) );
if first.team_2 then group=1;
run;
proc transpose data=groups2 out=wide2(DROP=_NAME_);
by team_1 team_2 group group notsorted;
id EVENT_NM;
var EVENT_dt;
run;
proc print data=wide2;
run;
PROC SQL;
CREATE TABLE EVENT_DT AS
SELECT
TEAM_1,
TEAM_2,
GROUP,
A AS A_DT,
B AS B_DT,
/*C AS C_DT,*/
D AS D_DT
/*E AS E_DT,*/
/*F AS F_DT*/
FROM WIDE2;
QUIT;
PROC SQL;
CREATE TABLE EVENT_ID AS
SELECT
TEAM_1,
TEAM_2,
GROUP,
A AS A_ID,
B AS B_ID,
/*C AS C_ID,*/
D AS D_ID
/*E AS E_ID,*/
/*F AS F_ID*/
FROM WIDE1;
QUIT;
PROC SQL;
CREATE TABLE ID_DT AS
SELECT
A.TEAM_1,
A.TEAM_2,
B.A_DT,
A.A_ID,
B.B_DT,
A.B_ID,
/*B.C_DT,*/
/*A.C_ID,*/
B.D_DT,
A.D_ID
/*B.E_DT,*/
/*A.E_ID,*/
/**/
/*B.F_DT,*/
/*A.F_ID*/
FROM EVENT_ID A FULL OUTER JOIN EVENT_DT B
ON A.TEAM_1=B.TEAM_1 AND A.TEAM_2 = B.TEAM_2 and A.GROUP=B.GROUP;
QUIT;
PROC SQL;
CREATE TABLE ID_DT_JOB AS
SELECT DISTINCT * FROM ID_DT A
LEFT JOIN GOT B
ON A.TEAM_1=B.TEAM_1 AND A.TEAM_2 = B.TEAM_2
ORDER BY B.J_DT;
QUIT;
PROC SQL;
CREATE TABLE ID_DT_JOBt_DST AS
SELECT DISTINCT
A.TEAM_1,
A.TEAM_2,
B.TEAM_C
FROM ID_DT A
LEFT JOIN GOT B
ON A.TEAM_1=B.TEAM_1 AND A.TEAM_2 = B.TEAM_2;
QUIT;
How can I fix this?
Appriciate your help!
Hi @sarahzhou,
just to better understand what you are aming for:
If you
1. sorted your current output table by J_ID and A_ID
2. grouped by J_ID
3. set Team_C, J_DT and J_ID to missing, if it is not the first row of the group
Would you then have what you want?
Best
Markus
Hi @sarahzhou,
just to better understand what you are aming for:
If you
1. sorted your current output table by J_ID and A_ID
2. grouped by J_ID
3. set Team_C, J_DT and J_ID to missing, if it is not the first row of the group
Would you then have what you want?
Best
Markus
3. set Team_C, J_DT and J_ID to missing, if it is not the first row of the group
Would you then have what you want?
Yes, that can work. How can I set missing "if it is not the first row of the group"?
SQL;
CREATE TABLE ID_DT_JOB AS
SELECT DISTINCT * FROM ID_DT A
LEFT JOIN GOT B
ON A.TEAM_1=B.TEAM_1 AND A.TEAM_2 = B.TEAM_2
ORDER BY B.TEAM_C;
Hi @sarahzhou
after sorting the data, you might apply a datastep with the FIRST. variable (FIRST. and LAST. DATA Step Variables), something like:
data wanted_list;
set ID_DT_JOB;
by J_ID;
if not first.dept then
do
Team_C =””;
J_DT =”.”;
J_ID = “”;
end;
run;
(I haven't tested it, but it shows the principle).
Best
Markus
It worked! Thx @MarkusWeick
data wanted_list;
set ID_DT_JOB;
by J_ID;
if not first.dept then
do
Team_C =””;
J_DT =”.”;
J_ID = “”;
end;
run;
Caution to anyone copying code from this thread.
The following lines that appear with curly or "smart" quotes are likely not going to work in your SAS session editor for programming:
Team_C =””;
J_DT =”.”;
J_ID = “”;
end;
In the editor you should see:
Team_C = ""; J_DT = "."; J_ID = "";
or with simple ' type single quotes.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.