I need to compare Table1 and Table2 by ID. If Data1 to Data5 in Table2 is equal to Data in Table1, set the field to blank.
Example:
Table1 | |
ID | Data |
ABC | R5 |
ABC | J3 |
ABC | E1 |
ABC | C7 |
DEF | D0 |
DEF | Za |
DEF | Z1 |
DEF | X4 |
GHI | R2 |
GHI | A7 |
GHI | E5 |
GHI | F3 |
GHI | H4 |
Table2 | ||||||||
ID | Data1 | Data2 | Data3 | Data4 | Data5 | Var1 | Va2 | Var3 |
ABC | R5 | Mx | C7 | C8 | J3 | |||
ABC | R5 | Mx | C7 | C8 | J3 | |||
ABC | R5 | Mx | C7 | C8 | J3 | |||
ABC | R5 | Mx | C7 | C8 | J3 | |||
ABC | R5 | Mx | C7 | C8 | J3 | |||
ABC | R5 | Mx | C7 | C8 | J3 | |||
ABC | R5 | Mx | C7 | C8 | J3 | |||
DEF | Z1 | E3 | Iw | Za | D1 | |||
DEF | Z1 | E3 | Iw | Za | D1 | |||
DEF | Z1 | E3 | Iw | Za | D1 | |||
DEF | Z1 | E3 | Iw | Za | D1 | |||
DEF | Z1 | E3 | Iw | Za | D1 | |||
DEF | Z1 | E3 | Iw | Za | D1 | |||
DEF | Z1 | E3 | Iw | Za | D1 | |||
DEF | Z1 | E3 | Iw | Za | D1 | |||
DEF | Z1 | E3 | Iw | Za | D1 | |||
GHI | R2 | E5 | Q4 | I9 | Jk | |||
GHI | R2 | E5 | Q4 | I9 | Jk | |||
GHI | R2 | E5 | Q4 | I9 | Jk | |||
GHI | R2 | E5 | Q4 | I9 | Jk | |||
GHI | R2 | E5 | Q4 | I9 | Jk | |||
GHI | R2 | E5 | Q4 | I9 | Jk | |||
GHI | R2 | E5 | Q4 | I9 | Jk |
Expected Output | ||||||||
ID | Data1 | Data2 | Data3 | Data4 | Data5 | Var1 | Va2 | Var3 |
ABC | Mx | C8 | ||||||
ABC | Mx | C8 | ||||||
ABC | Mx | C8 | ||||||
ABC | Mx | C8 | ||||||
ABC | Mx | C8 | ||||||
ABC | Mx | C8 | ||||||
ABC | Mx | C8 | ||||||
DEF | E3 | Iw | D1 | |||||
DEF | E3 | Iw | D1 | |||||
DEF | E3 | Iw | D1 | |||||
DEF | E3 | Iw | D1 | |||||
DEF | E3 | Iw | D1 | |||||
DEF | E3 | Iw | D1 | |||||
DEF | E3 | Iw | D1 | |||||
DEF | E3 | Iw | D1 | |||||
DEF | E3 | Iw | D1 | |||||
GHI | Q4 | I9 | Jk | |||||
GHI | Q4 | I9 | Jk | |||||
GHI | Q4 | I9 | Jk | |||||
GHI | Q4 | I9 | Jk | |||||
GHI | Q4 | I9 | Jk | |||||
GHI | Q4 | I9 | Jk | |||||
GHI | Q4 | I9 | Jk |
Thanks so much for your help!
From the first data set, create a new one with one row per ID and a new variable (ex: conc) containing the data value separated by ,
ex:
ID conc
ABC R5,J3,E1,C7
DEF D0, Za, Z1, X4
...
Then merge that data set with the second one on ID.
For each data1-data5, check if the concatenated contains the value, and if so blank it.
if conc contains data1 then data1 = '';
if conc contains data2 then data2 = '';
...
Just as an example of how to solve it, it can of course be made nicer with macros and stuff, but try the simple way first 🙂
//Fredrik
Thanks for your response!
data Table1 ; input (ID Data) ($); cards; ABC R5 ABC J3 ABC E1 ABC C7 DEF D0 DEF Za DEF Z1 DEF X4 GHI R2 GHI A7 GHI E5 GHI F3 GHI H4 ; run; data Table2; input (ID Data1 Data2 Data3 Data4 Data5) ($); cards; ABC R5 Mx C7 C8 J3 ABC R5 Mx C7 C8 J3 ABC R5 Mx C7 C8 J3 ABC R5 Mx C7 C8 J3 ABC R5 Mx C7 C8 J3 ABC R5 Mx C7 C8 J3 ABC R5 Mx C7 C8 J3 DEF Z1 E3 Iw Za D1 DEF Z1 E3 Iw Za D1 DEF Z1 E3 Iw Za D1 DEF Z1 E3 Iw Za D1 DEF Z1 E3 Iw Za D1 DEF Z1 E3 Iw Za D1 DEF Z1 E3 Iw Za D1 DEF Z1 E3 Iw Za D1 DEF Z1 E3 Iw Za D1 GHI R2 E5 Q4 I9 Jk GHI R2 E5 Q4 I9 Jk GHI R2 E5 Q4 I9 Jk GHI R2 E5 Q4 I9 Jk GHI R2 E5 Q4 I9 Jk GHI R2 E5 Q4 I9 Jk GHI R2 E5 Q4 I9 Jk ; run; data want; if _n_=1 then do; if 0 then set table1; declare hash h(dataset:'table1'); h.definekey('id','data'); h.definedone(); end; set table2; array x{*} $ data1-data5; do i=1 to dim(x); data=x{i};if h.check()=0 then call missing(x{i}); end; drop i data; run;
Thanks for your response. The code from Ksharp did not quite produce the expected output.
Here's what I've done:
libname test 'C:\Data';
PROC SORT data=test.Table1;
by ID;
RUN;
PROC TRANSPOSE data=test.Table1 out=test.Table3;
var Data;
by ID;
RUN;
PROC SQL;
CREATE TABLE test.combine AS
SELECT
t1.*
,t2.col1
,t2.col2
,t2.col3
,t2.col4
,t2.col5
FROM
test.table2 t1
LEFT JOIN
test.table3 t2 on
t1.id = t2.id;
QUIT;
PROC SQL;
CREATE TABLE test.final AS
SELECT
ID
,Case When ((Data1 = col1) or
(Data1 = col2) or
(Data1 = col3) or
(Data1 = col4) or
(Data1 = col5)) then '' else Data1 end as Data1
,Case When ((Data2 = col1) or
(Data2 = col2) or
(Data2 = col3) or
(Data2 = col4) or
(Data2 = col5)) then '' else Data2 end as Data2
,Case When ((Data3 = col1) or
(Data3 = col2) or
(Data3 = col3) or
(Data3 = col4) or
(Data3 = col5)) then '' else Data3 end as Data3
,Case When ((Data4 = col1) or
(Data4 = col2) or
(Data4 = col3) or
(Data4 = col4) or
(Data4 = col5)) then '' else Data4 end as Data4
,Case When ((Data5 = col1) or
(Data5 = col2) or
(Data5 = col3) or
(Data5 = col4) or
(Data5 = col5)) then '' else Data5 end as Data5
FROM test.combine;
QUIT;
Here's the problem:
I'm dealing with thousands, even millions, of records. Also, the data variable could go from Data1 to Data25, which means this would be a long program. Is there a way to shorten the code in the last PROC SQL step? Thanks!
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.