BookmarkSubscribeRSS Feed
jsphnwllms
Fluorite | Level 6

I need to compare Table1 and Table2 by ID. If Data1 to Data5 in Table2 is equal to Data in Table1, set the field to blank.

 

Example:

Table1 
IDData
ABCR5
ABCJ3
ABCE1
ABCC7
DEFD0
DEFZa
DEFZ1
DEFX4
GHIR2
GHIA7
GHIE5
GHIF3
GHIH4

 

Table2        
IDData1Data2Data3Data4Data5Var1Va2Var3
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
ABCR5MxC7C8J3   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
DEFZ1E3IwZaD1   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   
GHIR2E5Q4I9Jk   

 

Expected Output       
IDData1Data2Data3Data4Data5Var1Va2Var3
ABC Mx C8    
ABC Mx C8    
ABC Mx C8    
ABC Mx C8    
ABC Mx C8    
ABC Mx C8    
ABC Mx C8    
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
DEF E3Iw D1   
GHI  Q4I9Jk   
GHI  Q4I9Jk   
GHI  Q4I9Jk   
GHI  Q4I9Jk   
GHI  Q4I9Jk   
GHI  Q4I9Jk   
GHI  Q4I9Jk   

 

Thanks so much for your help!

4 REPLIES 4
FredrikE
Rhodochrosite | Level 12

From the first data set, create a new one with one row per ID and a new variable (ex: conc) containing the data value separated by ,

ex:

ID     conc

ABC R5,J3,E1,C7

DEF D0, Za, Z1, X4

...

 

Then merge that data set with the second one on ID.

For each data1-data5, check if the concatenated contains the value, and if so blank it.

if conc contains data1 then data1 = '';

if conc contains data2 then data2 = '';

...

 

Just as an example of how to solve it, it can of course be made nicer with macros and stuff, but try the simple way first 🙂

 

//Fredrik

jsphnwllms
Fluorite | Level 6

Thanks for your response!

Ksharp
Super User

data Table1	 ;
input (ID	Data) ($);
cards;
ABC	R5
ABC	J3
ABC	E1
ABC	C7
DEF	D0
DEF	Za
DEF	Z1
DEF	X4
GHI	R2
GHI	A7
GHI	E5
GHI	F3
GHI	H4
;
run;
 
data Table2; 
input (ID	Data1	Data2	Data3	Data4	Data5) ($);
cards;
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
ABC	R5	Mx	C7	C8	J3	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
DEF	Z1	E3	Iw	Za	D1	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk	 	 	 
GHI	R2	E5	Q4	I9	Jk
;
run;
data want;
 if _n_=1 then do;
  if 0 then set table1;
  declare hash h(dataset:'table1');
  h.definekey('id','data');
  h.definedone();
 end;
set table2;
array x{*} $ data1-data5;
do i=1 to dim(x);
 data=x{i};if h.check()=0 then call missing(x{i});
end;
drop i data;
run;

jsphnwllms
Fluorite | Level 6

Thanks for your response. The code from Ksharp did not quite produce the expected output.

 

Here's what I've done:

 

libname test 'C:\Data';

PROC SORT data=test.Table1;
by ID;
RUN;

 

PROC TRANSPOSE data=test.Table1 out=test.Table3;
var Data;
by ID;
RUN;

 

PROC SQL;
CREATE TABLE test.combine AS
SELECT
t1.*
,t2.col1
,t2.col2
,t2.col3
,t2.col4
,t2.col5
FROM
test.table2 t1
LEFT JOIN
test.table3 t2 on
t1.id = t2.id;
QUIT;

 

PROC SQL;
CREATE TABLE test.final AS
SELECT
ID
,Case When ((Data1 = col1) or
(Data1 = col2) or
(Data1 = col3) or
(Data1 = col4) or
(Data1 = col5)) then '' else Data1 end as Data1
,Case When ((Data2 = col1) or
(Data2 = col2) or
(Data2 = col3) or
(Data2 = col4) or
(Data2 = col5)) then '' else Data2 end as Data2
,Case When ((Data3 = col1) or
(Data3 = col2) or
(Data3 = col3) or
(Data3 = col4) or
(Data3 = col5)) then '' else Data3 end as Data3
,Case When ((Data4 = col1) or
(Data4 = col2) or
(Data4 = col3) or
(Data4 = col4) or
(Data4 = col5)) then '' else Data4 end as Data4
,Case When ((Data5 = col1) or
(Data5 = col2) or
(Data5 = col3) or
(Data5 = col4) or
(Data5 = col5)) then '' else Data5 end as Data5
FROM test.combine;
QUIT;

 

Here's the problem:

I'm dealing with thousands, even millions, of records. Also, the data variable could go from Data1 to Data25, which means this would be a long program. Is there a way to shorten the code in the last PROC SQL step? Thanks!

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1486 views
  • 0 likes
  • 3 in conversation