BookmarkSubscribeRSS Feed
jsphnwllms
Fluorite | Level 6

I need to compare Table1 and Table2 by ID. If Data1 to Data5 in Table2 is equal to Data in Table1, set the field to blank.

 

Example:

Table1 
IDData
ABCR5
ABCJ3
ABCE1
ABCC7
DEFD0
DEFZa
DEFZ1
DEFX4
GHIR2
GHIA7
GHIE5
GHIF3
GHIH4

 

Table2        
IDData1Data2Data3Data4Data5Var1Va2Var3
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   

 

Expected Output       
IDData1Data2Data3Data4Data5Var1Va2Var3
ABC Mx C8    
ABC Mx C8    
ABC Mx C8    
ABC Mx C8    
ABC Mx C8    
ABC Mx C8    
ABC Mx C8    
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
GHI  Q4I9Jk   
GHI  Q4I9Jk   
GHI  Q4I9Jk   
GHI  Q4I9Jk   
GHI  Q4I9Jk   
GHI  Q4I9Jk   
GHI  Q4I9Jk   

 

Thanks so much for your help!

4 REPLIES 4
FredrikE
Rhodochrosite | Level 12

From the first data set, create a new one with one row per ID and a new variable (ex: conc) containing the data value separated by ,

ex:

ID     conc

ABC R5,J3,E1,C7

DEF D0, Za, Z1, X4

...

 

Then merge that data set with the second one on ID.

For each data1-data5, check if the concatenated contains the value, and if so blank it.

if conc contains data1 then data1 = '';

if conc contains data2 then data2 = '';

...

 

Just as an example of how to solve it, it can of course be made nicer with macros and stuff, but try the simple way first 🙂

 

//Fredrik

jsphnwllms
Fluorite | Level 6

Thanks for your response!

Ksharp
Super User

data Table1	 ;
input (ID	Data) ($);
cards;
ABC	R5
ABC	J3
ABC	E1
ABC	C7
DEF	D0
DEF	Za
DEF	Z1
DEF	X4
GHI	R2
GHI	A7
GHI	E5
GHI	F3
GHI	H4
;
run;
 
data Table2; 
input (ID	Data1	Data2	Data3	Data4	Data5) ($);
cards;
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk
;
run;
data want;
 if _n_=1 then do;
  if 0 then set table1;
  declare hash h(dataset:'table1');
  h.definekey('id','data');
  h.definedone();
 end;
set table2;
array x{*} $ data1-data5;
do i=1 to dim(x);
 data=x{i};if h.check()=0 then call missing(x{i});
end;
drop i data;
run;

jsphnwllms
Fluorite | Level 6

Thanks for your response. The code from Ksharp did not quite produce the expected output.

 

Here's what I've done:

 

libname test 'C:\Data';

PROC SORT data=test.Table1;
by ID;
RUN;

 

PROC TRANSPOSE data=test.Table1 out=test.Table3;
var Data;
by ID;
RUN;

 

PROC SQL;
CREATE TABLE test.combine AS
SELECT
t1.*
,t2.col1
,t2.col2
,t2.col3
,t2.col4
,t2.col5
FROM
test.table2 t1
LEFT JOIN
test.table3 t2 on
t1.id = t2.id;
QUIT;

 

PROC SQL;
CREATE TABLE test.final AS
SELECT
ID
,Case When ((Data1 = col1) or
(Data1 = col2) or
(Data1 = col3) or
(Data1 = col4) or
(Data1 = col5)) then '' else Data1 end as Data1
,Case When ((Data2 = col1) or
(Data2 = col2) or
(Data2 = col3) or
(Data2 = col4) or
(Data2 = col5)) then '' else Data2 end as Data2
,Case When ((Data3 = col1) or
(Data3 = col2) or
(Data3 = col3) or
(Data3 = col4) or
(Data3 = col5)) then '' else Data3 end as Data3
,Case When ((Data4 = col1) or
(Data4 = col2) or
(Data4 = col3) or
(Data4 = col4) or
(Data4 = col5)) then '' else Data4 end as Data4
,Case When ((Data5 = col1) or
(Data5 = col2) or
(Data5 = col3) or
(Data5 = col4) or
(Data5 = col5)) then '' else Data5 end as Data5
FROM test.combine;
QUIT;

 

Here's the problem:

I'm dealing with thousands, even millions, of records. Also, the data variable could go from Data1 to Data25, which means this would be a long program. Is there a way to shorten the code in the last PROC SQL step? Thanks!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 924 views
  • 0 likes
  • 3 in conversation