SAS Programming

abraham1 · Posted 11-08-2022 01:31 AM

Hi everyone

I have two datasets from phase I and phase 2 trials. I need to combine the data and compare each variable value from phase2 to each variable from phase 1 and high-light all mismatch value with red color.

Along with this, a new column needs to be inserted ' comment' where if data are matching, it should populate 'matching'. For every mismatch, it will display which variable mismatch and the respective value in parenthesis. Multiple mismatches should be separated by comma.

I need to repeat the same process for 500 variables and lakh of records in new dataset

data phase1;
input PID AGE Visit	phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
;
run;

data phase2;
input PID AGE Visit	phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
;
run;
proc sort data=phase1;by pid visit phase;run;
proc sort data=phase2;by pid visit phase;run;
proc sql;
create table final as
select * from phase1 union all
select * from phase2;
run;

Output

Ksharp · Posted 11-08-2022 06:40 AM

data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
;
run;








libname old v9 'c:\temp\old';
data old.phase;
 set phase1;
 drop phase;
run;
libname new v9 'c:\temp\new';
data new.phase;
 set phase2;
 drop phase;
run;

%macro type_two(dsn=,subjid=,key=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn. out=old;by  &subjid. &key. ;run;
proc sort data=new.&dsn. out=new;by  &subjid. &key. ;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid. &key. ;
run;
data call_define(where=(upcase(vname) not in ("%upcase(&subjid.)"  "%upcase(&key.)" )));
 set diff_&dsn.(drop=_TYPE_ _OBS_  );
 length vname $ 40;
 array n{*} _numeric_;
 array c{*} _character_;
 do i=1 to dim(n);
   if n{i} not in (. 0) then do;vname=vname(n{i});output;end;
 end;
 do i=1 to dim(c);
   if findc(c{i},'X') then do;vname=vname(c{i});output;end;
 end;
keep &subjid. &key. vname;
run;
filename t temp;
data _null_;
 set call_define;
 file t;
 put "if &subjid.='" &subjid. "' and &key.='" &key. "' then call define('" vname "','style','style={background=red}');";
run;
data &dsn.;
if _n_=1 then do;
 if 0 then set new.&dsn.;
 if 0 then set old.&dsn.;
 declare hash h(dataset:"old.&dsn.");
 h.definekey("&subjid.","&key.");
 h.definedone();

 declare hash h1(dataset:'call_define');
 h1.definekey("&subjid.","&key.");
 h1.definedone();
end;
 set new.&dsn.;
 length new_change $ 40;
 if h.check() ne 0 then new_change='new' ;
 if h1.check() = 0 then new_change='change' ;
label new_change='new/change';
run;
ods excel file="c:\temp\want.xlsx" options(sheet_name="&dsn." autofilter='all' ) ;
proc report data=&dsn. split='*'  nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
if new_change='new' then call define(_row_,'style','style={background=red}');
endcomp;
run;
ods excel close;
%mend;



%type_two(dsn=phase,subjid=pid ,key=visit )

View solution in original post

andreas_lds · Posted 11-08-2022 03:05 AM

Why is "comment" empty in the first six obs in the result?

You can't change the background colour of cells in a dataset, so you need to create a report. Which file format is expected?

In the data shown all variables that need to be compared are alphanumeric. Is this true for all variables in your original data?

EDIT

Please check the data you have posted an align it with the expected result. The value "upset" is not in the data shown.

abraham1 · Posted 11-08-2022 03:12 AM

Thank you sir for checking

As I need to compare new dataset i.e. phase 2 to phase 1 (master sheet), the comment column should display discrepancy only on new dataset. Therefore phase 1 comment column kept blank

I need to generate the report in excel format for analysis.

No, some variables contain values of numeric, some character and some alphanumeric.

andreas_lds · Posted 11-08-2022 03:29 AM

@abraham1 wrote:

Thank you sir for checking

As I need to compare new dataset i.e. phase 2 to phase 1 (master sheet), the comment column should display discrepancy only on new dataset. Therefore phase 1 comment column kept blank

I need to generate the report in excel format for analysis.

No, some variables contain values of numeric, some character and some alphanumeric.

Then please change the data you have posted to match what you really have.

abraham1 · Posted 11-08-2022 03:56 AM

I have added one numeric variable ('dose'). please have a look.

The discrepancy in last variable if found during comparison will be added into comment column. I have not updated the final output. Hope it is fine.

data phase1;
input PID AGE Visit	phase event $ drug $ study $ dose;
cards;
101 23 1 1 asthma parace Yes 5
102 32 1 1 asthma sinarest Yes 10
101 23 2 1 headache drg1 No 5
103 43 1 1 vomit drg4 Yes 5 
103 43 2 1 pain drg5 No 100
103 43 3 1 pain drg5 Yes 500
;
run;

data phase2;
input PID AGE Visit	phase event $ drug $ study $ dose;
cards;
101 23 1 2 asthma parace Yes 5
102 32 1 2 asthma . Yes 5
101 23 2 2 acidity drg1 No 5
103 43 1 2 vomit drg4 Yes 100
103 43 2 2 pain drg5 No 100
103 43 4 2 pan drg5 Yes 500
107 43 3 2 pain drg5 Yes 200
;
run;
proc sort data=phase1;by pid visit phase;run;
proc sort data=phase2;by pid visit phase;run;
proc sql;
create table final as
select * from phase1 union all
select * from phase2;
run;

Shmuel · Posted 11-08-2022 03:15 AM

Have checked the PROC COMPARE to achieve the differences, though it is not reported in the required format.

Ksharp · Posted 11-08-2022 06:40 AM

data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
;
run;








libname old v9 'c:\temp\old';
data old.phase;
 set phase1;
 drop phase;
run;
libname new v9 'c:\temp\new';
data new.phase;
 set phase2;
 drop phase;
run;

%macro type_two(dsn=,subjid=,key=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn. out=old;by  &subjid. &key. ;run;
proc sort data=new.&dsn. out=new;by  &subjid. &key. ;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid. &key. ;
run;
data call_define(where=(upcase(vname) not in ("%upcase(&subjid.)"  "%upcase(&key.)" )));
 set diff_&dsn.(drop=_TYPE_ _OBS_  );
 length vname $ 40;
 array n{*} _numeric_;
 array c{*} _character_;
 do i=1 to dim(n);
   if n{i} not in (. 0) then do;vname=vname(n{i});output;end;
 end;
 do i=1 to dim(c);
   if findc(c{i},'X') then do;vname=vname(c{i});output;end;
 end;
keep &subjid. &key. vname;
run;
filename t temp;
data _null_;
 set call_define;
 file t;
 put "if &subjid.='" &subjid. "' and &key.='" &key. "' then call define('" vname "','style','style={background=red}');";
run;
data &dsn.;
if _n_=1 then do;
 if 0 then set new.&dsn.;
 if 0 then set old.&dsn.;
 declare hash h(dataset:"old.&dsn.");
 h.definekey("&subjid.","&key.");
 h.definedone();

 declare hash h1(dataset:'call_define');
 h1.definekey("&subjid.","&key.");
 h1.definedone();
end;
 set new.&dsn.;
 length new_change $ 40;
 if h.check() ne 0 then new_change='new' ;
 if h1.check() = 0 then new_change='change' ;
label new_change='new/change';
run;
ods excel file="c:\temp\want.xlsx" options(sheet_name="&dsn." autofilter='all' ) ;
proc report data=&dsn. split='*'  nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
if new_change='new' then call define(_row_,'style','style={background=red}');
endcomp;
run;
ods excel close;
%mend;



%type_two(dsn=phase,subjid=pid ,key=visit )

abraham1 · Posted 11-11-2022 02:55 PM

Thank you so much sir

possible · Posted 06-06-2023 06:30 PM

Hi Sir,

Thank you for detailed information on the question. I have one question, can we please help me what to do if the data from target is deleted while comparing to source? For the current context only. I just add a new record in the phase1 dataset with subject record 104. What to do if the 104 record is deleted in the phase2 dataset.

data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
104 20 3 1 fever drg6 No
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
;
run;

Thank you

Ksharp · Posted 06-07-2023 07:32 AM

/*Try this one*/
data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
104 20 3 1 fever drg6 No
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
;
run;


libname old v9 'c:\temp\old';
data old.phase;
 set phase1;
 drop phase;
run;
libname new v9 'c:\temp\new';
data new.phase;
 set phase2;
 drop phase;
run;


%macro type_two(dsn=,subjid=,key=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn. out=old;by  &subjid. &key. ;run;
proc sort data=new.&dsn. out=new;by  &subjid. &key. ;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid. &key. ;
run;
data call_define(where=(upcase(vname) not in ("%upcase(&subjid.)"  "%upcase(&key.)" )));
 set diff_&dsn.(drop=_TYPE_ _OBS_  );
 length vname $ 40;
 array n{*} _numeric_;
 array c{*} _character_;
 do i=1 to dim(n);
   if n{i} not in (. 0) then do;vname=vname(n{i});output;end;
 end;
 do i=1 to dim(c);
   if findc(c{i},'X') then do;vname=vname(c{i});output;end;
 end;
keep &subjid. &key. vname;
run;
filename t temp;
data _null_;
 set call_define;
 file t;
 put "if &subjid.='" &subjid. "' and &key.='" &key. "' then call define('" vname "','style','style={background=red}');";
run;
data &dsn.;
if _n_=1 then do;
 if 0 then set new.&dsn.;
 if 0 then set old.&dsn.;
 declare hash h(dataset:"old.&dsn.");
 h.definekey("&subjid.","&key.");
 h.definedone();

 declare hash h1(dataset:'call_define');
 h1.definekey("&subjid.","&key.");
 h1.definedone();
end;
 set new.&dsn.;
 length new_change $ 40;
 if h.check() ne 0 then new_change='new' ;
 if h1.check() = 0 then new_change='change' ;
label new_change='new/change/delete';
run;
proc sql;
create table &dsn.2 as
select * from &dsn.
union
(select *,'delete' from old.&dsn. where &subjid. not in (select &subjid. from new.&dsn.));
quit;

ods excel file="c:\temp\want.xlsx" options(sheet_name="&dsn." autofilter='all' ) ;
proc report data=&dsn.2 split='*'  nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
if new_change='new' then call define(_row_,'style','style={background=verylightblue}');
if new_change='delete' then call define(_row_,'style','style={background=gold}');
endcomp;
run;
ods excel close;
%mend;

%type_two(dsn=phase,subjid=pid ,key=visit )

possible · Posted 06-07-2023 10:56 AM

Thank you so much sir 🙂

possible · Posted 06-07-2023 05:25 PM

Thank you so much sir.

Sorry for the trouble. I have one more question, With the above program we would be able to get the subject only if the subject has one occurrence right?

I have a scenario that the subject has multiple occurrence in the source file like 5 time and in the target couple of record are deleted. So, I tried adding a variable to the where cause for teh select statement but i think its not working. i have multiple datapoint with same variable value. How can filter the deleted record if the multiple variable are same. Based on my scenario, updated few data point below. Can you please review.

data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
104 20 3 1 fever drg6 No
104 20 3 1 headache drg1 No
104 20 3 1 pain drg3 yes
104 23 3 1 headache drg3 yes
104 20 2 1 pain drg5 No
104 20 4 1 pain drg1 No
104 43 1 1 vomit drg4 Yes
104 23 5 1 headache drg1 No
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
104 20 3 1 fever drg6 No
104 20 3 1 headache drg1 No
104 20 2 1 pain drg5 No
;
run;

Ksharp · Posted 06-08-2023 07:43 AM

/*
" we would be able to get the subject only if the subject has one occurrence right? "
That is not true. My code would pick up all the subject's obs if it is not in OLD dataset.

If you  want include VISIT variable too ,try the following code.

NOTE: SUBJID= and KEY= identity a unique obs.
      You cann't have two obs have the same PID and same VISIT

Therefore,I modify your data to conform this rule.
*/

data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
104 20 6 1 fever drg6 No
104 20 7 1 headache drg1 No
104 20 8 1 pain drg3 yes
104 23 3 1 headache drg3 yes
104 20 2 1 pain drg5 No
104 20 4 1 pain drg1 No
104 43 1 1 vomit drg4 Yes
104 23 5 1 headache drg1 No
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
104 20 1 1 fever drg6 No
104 20 3 1 headache drg1 No
104 20 2 1 pain drg5 No
;
run;


libname old v9 'c:\temp\old';
data old.phase;
 set phase1;
 drop phase;
run;
libname new v9 'c:\temp\new';
data new.phase;
 set phase2;
 drop phase;
run;


%macro type_two(dsn=,subjid=,key=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn. out=old;by  &subjid. &key. ;run;
proc sort data=new.&dsn. out=new;by  &subjid. &key. ;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid. &key. ;
run;
data call_define(where=(upcase(vname) not in ("%upcase(&subjid.)"  "%upcase(&key.)" )));
 set diff_&dsn.(drop=_TYPE_ _OBS_  );
 length vname $ 40;
 array n{*} _numeric_;
 array c{*} _character_;
 do i=1 to dim(n);
   if n{i} not in (. 0) then do;vname=vname(n{i});output;end;
 end;
 do i=1 to dim(c);
   if findc(c{i},'X') then do;vname=vname(c{i});output;end;
 end;
keep &subjid. &key. vname;
run;
filename t temp;
data _null_;
 set call_define;
 file t;
 put "if &subjid.='" &subjid. "' and &key.='" &key. "' then call define('" vname "','style','style={background=red}');";
run;
data &dsn.;
if _n_=1 then do;
 if 0 then set new.&dsn.;
 if 0 then set old.&dsn.;
 declare hash h(dataset:"old.&dsn.");
 h.definekey("&subjid.","&key.");
 h.definedone();

 declare hash h1(dataset:'call_define');
 h1.definekey("&subjid.","&key.");
 h1.definedone();
end;
 set new.&dsn.;
 length new_change $ 40;
 if h.check() ne 0 then new_change='new' ;
 if h1.check() = 0 then new_change='change' ;
label new_change='new/change/delete';
run;
proc sql;
create table &dsn.2 as
select * from &dsn.
union
(select *,'delete' from old.&dsn. where catx('|',&subjid.,&key.) not in (select catx('|',&subjid.,&key.) from new.&dsn.));
quit;

ods excel file="c:\temp\want.xlsx" options(sheet_name="&dsn." autofilter='all' ) ;
proc report data=&dsn.2 split='*'  nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
if new_change='new' then call define(_row_,'style','style={background=verylightblue}');
if new_change='delete' then call define(_row_,'style','style={background=gold}');
endcomp;
run;
ods excel close;
%mend;

%type_two(dsn=phase,subjid=pid ,key=visit )

possible · Posted 06-08-2023 05:23 PM

Thank you, sir for responding. In my scenario, one of the dataset is recurrence kind of log line/repeating section. So, the visit is repeating for few datapoint. So, I add the repeating section datapoint to the catx statement and was able to get generate the report. For other datasets, reach out to get a more stable data. thank you so much sir for helping me.

SAS Programming

comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Re: comparing all variable values between two datasets and high-light mismatch

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...