Hi-
Asking this super SAS support group for some help. I am trying to compare the contents (data) from two different datasets. Both have the same variable names but the number of observations differ. I tried proc compare to compare to obtain differences but it was difficult to interpret the proc compare results. Can anyone help me on how I can go about producing an outpuf file upon comparing patient information from both datasets. It would be helpful if the output would contain only patient data from from table A not matching patient data in table B. Any help would be appreciated. Thank you all in advance!
Proc compare has it's limitations. A simple SQL query will generate what you want, the code below will select observations from TableA where the ID is not in TableB. You can replace the ID variable by your ID variable and table name with your table names.
proc sql;
create table missing as
select * from tableA
where ID not in (select distinct ID from tableB);
quit;
Can you have multiple records per patient? How should that be handled?
Hi- Thank you so much for responding in my quest for help!!
Yes. There could be multiple records per patient and the patient record data could be reported from different location sites. I want to compare the patient data at location ABC from Table A against the same patient data at location XYZ from Table B. Thank you!
Maybe not very elegant, but this should do the trick.
/* Get some dummy data. */
%let varnames=age height weight;
data site1;
set sashelp.class;
age = age +1;
run;
data site2;
set sashelp.class;
run;
/* Rename your variables. */
%macro xxx (site);
proc datasets library=work memtype=data nolist;
modify &site.;
rename
%do i=1 %to %sysfunc(countw(&varnames));
%scan(&varnames,&i)= &site._&i
%end;
;
quit;
%mend xxx;
%xxx (site1);
%xxx (site2);
/* Comparison. */
%macro yyy;
data comp;
merge site1 (in = a) site2 (in = b);
by name ; /* plus any other unique keys you may have */
if a and b then flag = "Site 1 & 2 ";
else if a and not b then flag = "Site 1 only";
else if not a and b then flag = "Site 2 only";
%do i=1 %to %sysfunc(countw(&varnames));
diff&i. = sum (site1_&i, -site2_&i);
%end;
run;
%mend yyy;
%yyy;
Thank you for your response. I'm not too versed with macros but I'll pull my reference books out to review again. Do you know if the SQL stmts below would work? To get the differences and then another pass to get the intersects/commons?
Proc SQL;
create table missing as
select * from Tab1
except
select * from Tab2;
Quit;
Proc SQL;
create table missing as
select * from Tab1
intersect
select * from Tab2;
Quit;
Hi-
Thank you for your help. I am trying to compare 15 variables from Tab1 and Tab2. Can you help explain what this part of your code does.
/* Rename your variables. */
%macro xxx (site);
proc datasets library=work memtype=data nolist;
modify &site.;
rename
%do i=1 %to %sysfunc(countw(&varnames));
%scan(&varnames,&i)= &site._&i
%end;
;
quit;
%mend xxx;
%xxx (site1);
%xxx (site2);
@SannaSanna Did you try the initial code I suggested?
Hi Reeza,
Yes I did. Unfortunately it didnt work. I think its because I have duplicate IDs (which is valid for my dataset) But I dont think SAS liked it.
@SannaSanna wrote:
Hi Reeza,
Yes I did. Unfortunately it didnt work. I think its because I have duplicate IDs (which is valid for my dataset) But I dont think SAS liked it.
@SannaSanna Duplicate ID's shouldn't have mattered, did you get an error or results that were unsatisfactory?
Hi Reeza,
Thank you so much for your help.
** trying to compare Table A with Table B to determine contents in Table A changed/missing in Table B and produce output for additional review**
This is the code I applied which you suggested.
proc sql;
create table missing as
select * from TABLEA
where PATIENTID LOCATION DATE PROC OUTCOME
not in (select distinct PATIENTID LOCATION DATE PROC OUTCOME from TABLEB);
quit;
Syntax error after PatientID
ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, (, *, **, +, -, '.', /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, EXCEPT, GE, GET, GROUP, GT, GTT, HAVING, IN,
INTERSECT, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN, OR, ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=.
ERROR 76-322: Syntax error, statement will be ignored.
Hi Suzanne,
Tha code simply renames your variables into site1_1 (the first variable from your first dataset, in the order you listed in varnames: %let varnames=age height weight;), site1_2 ( the second variable from your first dataset),... site1_n, site2_1, etc.
I assumed that you want to do a difference between the two tables (diff&i. = sum (site1_&i, -site2_&i);) which was the reason I renamed the variables in the first place.
If that's the case, to use that code do the following:
1) in %let varnames=age height weight; replace age, height, weight with your variables from the two datasets.
2) name your datasets site1 and site2 (or change their name when you call the macro xxx).
3) replace by name with by (your list of unique identifiers).
I hope this helps.
Hi-
I think I almost got it to run correctly. The code failed untimately. The comparison code- why is my data being converted from Character to Numeric? The data contents within TableA and TableB are all string/characters/YN and no numeric. And does the data need to be sorted first in the merge step? << got error message too
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.