Solved: Re: Return a uinque value after join

sarahzhou · Posted 06-12-2022 12:18 PM

Hi,

I have two tables which has many-to-many relationships with this TEAM_1 and TEAM_2:

TEAM_1	TEAM_2	TEAM_C	J_DT	J_ID
CINDY	BOB	JOB3	14-01-22	J3
CINDY	BOB	JOB4	14-01-22	J4

TEAM_1	TEAM_2	A_DT	A_ID	B_DT	B_ID	D_DT	D_ID
CINDY	BOB	02-01-22	A2	03-01-22	B1	.
CINDY	BOB	05-01-22	A3	.		.

I want to show the after join they only return unique J_DT, J_ID as follow:

TEAM_1	TEAM_2	A_DT	A_ID	B_DT	B_ID	D_DT	D_ID	TEAM_C	J_DT	J_ID
CINDY	BOB	02-01-22	A2	03-01-22	B1	.		JOB3	14-01-22	J3
CINDY	BOB	05-01-22	A3	.		.
CINDY	BOB	02-01-22	A2	03-01-22	B1	.		JOB4	14-01-22	J4
CINDY	BOB	05-01-22	A3	.		.

I did the left join and come out wrong output like this:

Here is my working code

data have;
  input TEAM_1 $ TEAM_2 $ EVENT_NM $ EVENT_DT :mmddyy. EVENT_ID $ ;
  format event_dt yymmdd10.;
cards;
ALICE  JIMMY  A  1-1-2022 A1
CINDY  BOB  A  1-2-2022 A2
CINDY  BOB  B  1-3-2022 B1
CINDY  BOB  A  1-5-2022 A3
ELLA  JACK  B  1-8-2022 B4
ELLA  JACK  D  1-9-2022 D1 
AMBER  DANIEL  E  1-10-2022 E1
KITTY  ARRON  F  1-11-2022 F1
;


data GOT;
  input TEAM_1 $ TEAM_2 $ TEAM_C $ J_DT :mmddyy. J_ID $ ;
  format J_DT yymmdd10.;
cards;
ALICE  JIMMY  JOB1  1-12-2022 J1
ALICE  JIMMY  JOB2  1-13-2022 J2
CINDY  BOB  JOB3  1-14-2022 J3
CINDY  BOB  JOB4  1-14-2022 J4
AMBER  DANIEL  JOB5  1-16-2022 J5
AMBER  DANIEL  JOB6  1-17-2022 J6
KITTY  ARRON  JOB7  1-18-2022 J7
;

data groups1;
  set have;
  by team_1 team_2 notsorted ;
  group + (EVENT_NM <= lag(EVENT_NM) );
  if first.team_2 then group=1;
run;
proc transpose data=groups1 out=wide1(DROP=_NAME_);
  by team_1 team_2 group group notsorted;
  id  EVENT_NM ;
  var EVENT_ID;
run;
proc print data=wide1;
run;

data groups2;
  set have;
  by team_1 team_2 notsorted ;
  group + (EVENT_NM <= lag(EVENT_NM) );
  if first.team_2 then group=1;
run;
proc transpose data=groups2 out=wide2(DROP=_NAME_);
  by team_1 team_2 group group notsorted;
  id  EVENT_NM;
  var EVENT_dt;
run;
proc print data=wide2;
run;

PROC SQL;
CREATE TABLE EVENT_DT AS
SELECT 
TEAM_1,
TEAM_2,
GROUP,
A AS A_DT,
B AS B_DT,
/*C AS C_DT,*/
D AS D_DT
/*E AS E_DT,*/
/*F AS F_DT*/
FROM WIDE2;
QUIT;

PROC SQL;
CREATE TABLE EVENT_ID AS
SELECT 
TEAM_1,
TEAM_2,
GROUP,
A AS A_ID,
B AS B_ID,
/*C AS C_ID,*/
D AS D_ID
/*E AS E_ID,*/
/*F AS F_ID*/
FROM WIDE1;
QUIT;
 
PROC SQL;
CREATE TABLE ID_DT AS
SELECT 
A.TEAM_1,
A.TEAM_2,

B.A_DT,
A.A_ID,

B.B_DT,
A.B_ID,

/*B.C_DT,*/
/*A.C_ID,*/

B.D_DT,
A.D_ID

/*B.E_DT,*/
/*A.E_ID,*/
/**/
/*B.F_DT,*/
/*A.F_ID*/
 

FROM EVENT_ID A FULL OUTER JOIN EVENT_DT B
ON A.TEAM_1=B.TEAM_1 AND A.TEAM_2 = B.TEAM_2 and A.GROUP=B.GROUP;
QUIT;

PROC SQL;
CREATE TABLE ID_DT_JOB AS
SELECT DISTINCT * FROM ID_DT A
LEFT JOIN GOT B
ON A.TEAM_1=B.TEAM_1 AND A.TEAM_2 = B.TEAM_2
ORDER BY B.J_DT;
QUIT;


PROC SQL;
CREATE TABLE ID_DT_JOBt_DST AS
SELECT DISTINCT
A.TEAM_1,
A.TEAM_2,
B.TEAM_C
FROM ID_DT A
LEFT JOIN GOT B
ON A.TEAM_1=B.TEAM_1 AND A.TEAM_2 = B.TEAM_2;
QUIT;

How can I fix this?

Appriciate your help!

MarkusWeick · Posted 06-13-2022 04:37 AM

Hi @sarahzhou,

just to better understand what you are aming for:

If you

1. sorted your current output table by J_ID and A_ID

2. grouped by J_ID

3. set Team_C, J_DT and J_ID to missing, if it is not the first row of the group

Would you then have what you want?

Best

Markus

Please keep the community friendly.
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles

View solution in original post

MarkusWeick · Posted 06-13-2022 04:37 AM

Hi @sarahzhou,

just to better understand what you are aming for:

If you

1. sorted your current output table by J_ID and A_ID

2. grouped by J_ID

3. set Team_C, J_DT and J_ID to missing, if it is not the first row of the group

Would you then have what you want?

Best

Markus

Please keep the community friendly.
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles

sarahzhou · Posted 06-13-2022 10:51 AM

3. set Team_C, J_DT and J_ID to missing, if it is not the first row of the group

Would you then have what you want?

Yes, that can work. How can I set missing "if it is not the first row of the group"?

SQL;
CREATE TABLE ID_DT_JOB AS
SELECT DISTINCT * FROM ID_DT A
LEFT JOIN GOT B
ON A.TEAM_1=B.TEAM_1 AND A.TEAM_2 = B.TEAM_2
ORDER BY B.TEAM_C;

MarkusWeick · Posted 06-13-2022 11:17 AM

Hi @sarahzhou

after sorting the data, you might apply a datastep with the FIRST. variable (FIRST. and LAST. DATA Step Variables), something like:

data wanted_list;

set ID_DT_JOB;

by J_ID;

if not first.dept then

do

Team_C =””;

J_DT =”.”;

J_ID = “”;

end;

run;

(I haven't tested it, but it shows the principle).

Best

Markus

Please keep the community friendly.
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles

sarahzhou · Posted 06-13-2022 11:27 AM

It worked! Thx @MarkusWeick

data wanted_list;

set ID_DT_JOB;

by J_ID;

if not first.dept then

   do

       Team_C =””;

      J_DT =”.”;

      J_ID = “”;

   end;

run;

ballardw · Posted 06-13-2022 05:43 PM

Caution to anyone copying code from this thread.

The following lines that appear with curly or "smart" quotes are likely not going to work in your SAS session editor for programming:

       Team_C =””;

      J_DT =”.”;

      J_ID = “”;

   end;

In the editor you should see:

Team_C = "";
J_DT   = ".";
J_ID   = "";

or with simple ' type single quotes.

SAS Innovate 2025: Call for Content

Classroom Training Available!