BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
abraham1
Obsidian | Level 7

Hi everyone

I have two datasets from phase I and phase 2 trials. I need to combine the data and compare each variable value from phase2 to each variable from phase 1 and high-light all mismatch value with red color.

Along with this, a new column needs to be inserted ' comment' where if data are matching, it should populate 'matching'. For every mismatch, it will display which variable mismatch and the respective value in parenthesis. Multiple mismatches should be separated by comma.

I need to repeat the same process for 500 variables and lakh of records in new dataset

 

data phase1;
input PID AGE Visit	phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
;
run;

data phase2;
input PID AGE Visit	phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
;
run;
proc sort data=phase1;by pid visit phase;run;
proc sort data=phase2;by pid visit phase;run;
proc sql;
create table final as
select * from phase1 union all
select * from phase2;
run;

Output

 

2022-11-08_11-59-07.png

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
;
run;








libname old v9 'c:\temp\old';
data old.phase;
 set phase1;
 drop phase;
run;
libname new v9 'c:\temp\new';
data new.phase;
 set phase2;
 drop phase;
run;

%macro type_two(dsn=,subjid=,key=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn. out=old;by  &subjid. &key. ;run;
proc sort data=new.&dsn. out=new;by  &subjid. &key. ;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid. &key. ;
run;
data call_define(where=(upcase(vname) not in ("%upcase(&subjid.)"  "%upcase(&key.)" )));
 set diff_&dsn.(drop=_TYPE_ _OBS_  );
 length vname $ 40;
 array n{*} _numeric_;
 array c{*} _character_;
 do i=1 to dim(n);
   if n{i} not in (. 0) then do;vname=vname(n{i});output;end;
 end;
 do i=1 to dim(c);
   if findc(c{i},'X') then do;vname=vname(c{i});output;end;
 end;
keep &subjid. &key. vname;
run;
filename t temp;
data _null_;
 set call_define;
 file t;
 put "if &subjid.='" &subjid. "' and &key.='" &key. "' then call define('" vname "','style','style={background=red}');";
run;
data &dsn.;
if _n_=1 then do;
 if 0 then set new.&dsn.;
 if 0 then set old.&dsn.;
 declare hash h(dataset:"old.&dsn.");
 h.definekey("&subjid.","&key.");
 h.definedone();

 declare hash h1(dataset:'call_define');
 h1.definekey("&subjid.","&key.");
 h1.definedone();
end;
 set new.&dsn.;
 length new_change $ 40;
 if h.check() ne 0 then new_change='new' ;
 if h1.check() = 0 then new_change='change' ;
label new_change='new/change';
run;
ods excel file="c:\temp\want.xlsx" options(sheet_name="&dsn." autofilter='all' ) ;
proc report data=&dsn. split='*'  nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
if new_change='new' then call define(_row_,'style','style={background=red}');
endcomp;
run;
ods excel close;
%mend;



%type_two(dsn=phase,subjid=pid ,key=visit )

Ksharp_0-1667907678009.png

 

View solution in original post

13 REPLIES 13
andreas_lds
Jade | Level 19

Why is "comment" empty in the first six obs in the result?

You can't change the background colour of cells in a dataset, so you need to create a report. Which file format is expected?

In the data shown all variables that need to be compared are alphanumeric. Is this true for all variables in your original data?

 

EDIT

Please check the data you have posted an align it with the expected result. The value "upset" is not in the data shown.

abraham1
Obsidian | Level 7

Thank you sir for checking

As I need to compare new dataset i.e. phase 2 to phase 1 (master sheet), the comment column should display discrepancy only on new dataset. Therefore phase 1 comment column kept blank

 

I need to generate the report in excel format for analysis. 

No, some variables contain values of numeric, some character and some alphanumeric.

andreas_lds
Jade | Level 19

@abraham1 wrote:

Thank you sir for checking

As I need to compare new dataset i.e. phase 2 to phase 1 (master sheet), the comment column should display discrepancy only on new dataset. Therefore phase 1 comment column kept blank

 

I need to generate the report in excel format for analysis. 

No, some variables contain values of numeric, some character and some alphanumeric.


Then please change the data you have posted to match what you really have.

abraham1
Obsidian | Level 7

I have added one numeric variable ('dose'). please have a look.

The discrepancy in last variable if found during comparison will be added into comment column. I have not updated the final output. Hope it is fine.

data phase1;
input PID AGE Visit	phase event $ drug $ study $ dose;
cards;
101 23 1 1 asthma parace Yes 5
102 32 1 1 asthma sinarest Yes 10
101 23 2 1 headache drg1 No 5
103 43 1 1 vomit drg4 Yes 5 
103 43 2 1 pain drg5 No 100
103 43 3 1 pain drg5 Yes 500
;
run;

data phase2;
input PID AGE Visit	phase event $ drug $ study $ dose;
cards;
101 23 1 2 asthma parace Yes 5
102 32 1 2 asthma . Yes 5
101 23 2 2 acidity drg1 No 5
103 43 1 2 vomit drg4 Yes 100
103 43 2 2 pain drg5 No 100
103 43 4 2 pan drg5 Yes 500
107 43 3 2 pain drg5 Yes 200
;
run;
proc sort data=phase1;by pid visit phase;run;
proc sort data=phase2;by pid visit phase;run;
proc sql;
create table final as
select * from phase1 union all
select * from phase2;
run;
Shmuel
Garnet | Level 18

Have checked the PROC COMPARE to achieve the differences, though it is not reported in the required format.

Ksharp
Super User
data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
;
run;








libname old v9 'c:\temp\old';
data old.phase;
 set phase1;
 drop phase;
run;
libname new v9 'c:\temp\new';
data new.phase;
 set phase2;
 drop phase;
run;

%macro type_two(dsn=,subjid=,key=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn. out=old;by  &subjid. &key. ;run;
proc sort data=new.&dsn. out=new;by  &subjid. &key. ;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid. &key. ;
run;
data call_define(where=(upcase(vname) not in ("%upcase(&subjid.)"  "%upcase(&key.)" )));
 set diff_&dsn.(drop=_TYPE_ _OBS_  );
 length vname $ 40;
 array n{*} _numeric_;
 array c{*} _character_;
 do i=1 to dim(n);
   if n{i} not in (. 0) then do;vname=vname(n{i});output;end;
 end;
 do i=1 to dim(c);
   if findc(c{i},'X') then do;vname=vname(c{i});output;end;
 end;
keep &subjid. &key. vname;
run;
filename t temp;
data _null_;
 set call_define;
 file t;
 put "if &subjid.='" &subjid. "' and &key.='" &key. "' then call define('" vname "','style','style={background=red}');";
run;
data &dsn.;
if _n_=1 then do;
 if 0 then set new.&dsn.;
 if 0 then set old.&dsn.;
 declare hash h(dataset:"old.&dsn.");
 h.definekey("&subjid.","&key.");
 h.definedone();

 declare hash h1(dataset:'call_define');
 h1.definekey("&subjid.","&key.");
 h1.definedone();
end;
 set new.&dsn.;
 length new_change $ 40;
 if h.check() ne 0 then new_change='new' ;
 if h1.check() = 0 then new_change='change' ;
label new_change='new/change';
run;
ods excel file="c:\temp\want.xlsx" options(sheet_name="&dsn." autofilter='all' ) ;
proc report data=&dsn. split='*'  nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
if new_change='new' then call define(_row_,'style','style={background=red}');
endcomp;
run;
ods excel close;
%mend;



%type_two(dsn=phase,subjid=pid ,key=visit )

Ksharp_0-1667907678009.png

 

possible
Fluorite | Level 6

Hi Sir, 

 

Thank you for detailed information on the question. I have one question, can we please help me what to do if the data from target is deleted while comparing to source?   For the current context only. I just add a new record in the phase1 dataset with subject record 104. What to do if the 104 record is deleted in the phase2 dataset.

 

data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
104 20 3 1 fever drg6 No
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
;
run;

 

Thank you 

Ksharp
Super User
/*Try this one*/
data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
104 20 3 1 fever drg6 No
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
;
run;


libname old v9 'c:\temp\old';
data old.phase;
 set phase1;
 drop phase;
run;
libname new v9 'c:\temp\new';
data new.phase;
 set phase2;
 drop phase;
run;


%macro type_two(dsn=,subjid=,key=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn. out=old;by  &subjid. &key. ;run;
proc sort data=new.&dsn. out=new;by  &subjid. &key. ;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid. &key. ;
run;
data call_define(where=(upcase(vname) not in ("%upcase(&subjid.)"  "%upcase(&key.)" )));
 set diff_&dsn.(drop=_TYPE_ _OBS_  );
 length vname $ 40;
 array n{*} _numeric_;
 array c{*} _character_;
 do i=1 to dim(n);
   if n{i} not in (. 0) then do;vname=vname(n{i});output;end;
 end;
 do i=1 to dim(c);
   if findc(c{i},'X') then do;vname=vname(c{i});output;end;
 end;
keep &subjid. &key. vname;
run;
filename t temp;
data _null_;
 set call_define;
 file t;
 put "if &subjid.='" &subjid. "' and &key.='" &key. "' then call define('" vname "','style','style={background=red}');";
run;
data &dsn.;
if _n_=1 then do;
 if 0 then set new.&dsn.;
 if 0 then set old.&dsn.;
 declare hash h(dataset:"old.&dsn.");
 h.definekey("&subjid.","&key.");
 h.definedone();

 declare hash h1(dataset:'call_define');
 h1.definekey("&subjid.","&key.");
 h1.definedone();
end;
 set new.&dsn.;
 length new_change $ 40;
 if h.check() ne 0 then new_change='new' ;
 if h1.check() = 0 then new_change='change' ;
label new_change='new/change/delete';
run;
proc sql;
create table &dsn.2 as
select * from &dsn.
union
(select *,'delete' from old.&dsn. where &subjid. not in (select &subjid. from new.&dsn.));
quit;

ods excel file="c:\temp\want.xlsx" options(sheet_name="&dsn." autofilter='all' ) ;
proc report data=&dsn.2 split='*'  nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
if new_change='new' then call define(_row_,'style','style={background=verylightblue}');
if new_change='delete' then call define(_row_,'style','style={background=gold}');
endcomp;
run;
ods excel close;
%mend;

%type_two(dsn=phase,subjid=pid ,key=visit )

Ksharp_0-1686137558535.png

 

possible
Fluorite | Level 6
Thank you so much sir 🙂
possible
Fluorite | Level 6

Thank you so much sir. 

Sorry for the trouble. I have one more question, With the above program we would be able to get the subject only if the subject has one occurrence right? 

I have a scenario that the subject has multiple occurrence in the source file like 5 time and in the target couple of record are deleted. So, I tried adding a variable to the where cause for teh select statement but i think its not working. i have multiple datapoint with same variable value.  How can filter the deleted record if the multiple variable are same. Based on my scenario, updated few data point below. Can you please review.  

 

data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
104 20 3 1 fever drg6 No
104 20 3 1 headache drg1 No
104 20 3 1 pain drg3 yes
104 23 3 1 headache drg3 yes
104 20 2 1 pain drg5 No
104 20 4 1 pain drg1 No
104 43 1 1 vomit drg4 Yes
104 23 5 1 headache drg1 No
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
104 20 3 1 fever drg6 No
104 20 3 1 headache drg1 No
104 20 2 1 pain drg5 No
;
run;

 

Ksharp
Super User
/*
" we would be able to get the subject only if the subject has one occurrence right? "
That is not true. My code would pick up all the subject's obs if it is not in OLD dataset.

If you  want include VISIT variable too ,try the following code.

NOTE: SUBJID= and KEY= identity a unique obs.
      You cann't have two obs have the same PID and same VISIT

Therefore,I modify your data to conform this rule.
*/

data phase1;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 1 asthma parace Yes
102 32 1 1 asthma sinarest Yes
101 23 2 1 headache drg1 No
103 43 1 1 vomit drg4 Yes
103 43 2 1 pain drg5 No
103 43 3 1 pain drg5 Yes
104 20 6 1 fever drg6 No
104 20 7 1 headache drg1 No
104 20 8 1 pain drg3 yes
104 23 3 1 headache drg3 yes
104 20 2 1 pain drg5 No
104 20 4 1 pain drg1 No
104 43 1 1 vomit drg4 Yes
104 23 5 1 headache drg1 No
;
run;

data phase2;
input PID AGE Visit phase event $ drug $ study $;
cards;
101 23 1 2 asthma parace Yes
102 32 1 2 asthma . Yes
101 23 2 2 acidity drg1 No
103 43 1 2 vomit drg4 Yes
103 43 2 2 pain drg5 No
103 43 4 2 pan drg5 Yes
107 43 3 2 pain drg5 Yes
104 20 1 1 fever drg6 No
104 20 3 1 headache drg1 No
104 20 2 1 pain drg5 No
;
run;


libname old v9 'c:\temp\old';
data old.phase;
 set phase1;
 drop phase;
run;
libname new v9 'c:\temp\new';
data new.phase;
 set phase2;
 drop phase;
run;


%macro type_two(dsn=,subjid=,key=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn. out=old;by  &subjid. &key. ;run;
proc sort data=new.&dsn. out=new;by  &subjid. &key. ;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid. &key. ;
run;
data call_define(where=(upcase(vname) not in ("%upcase(&subjid.)"  "%upcase(&key.)" )));
 set diff_&dsn.(drop=_TYPE_ _OBS_  );
 length vname $ 40;
 array n{*} _numeric_;
 array c{*} _character_;
 do i=1 to dim(n);
   if n{i} not in (. 0) then do;vname=vname(n{i});output;end;
 end;
 do i=1 to dim(c);
   if findc(c{i},'X') then do;vname=vname(c{i});output;end;
 end;
keep &subjid. &key. vname;
run;
filename t temp;
data _null_;
 set call_define;
 file t;
 put "if &subjid.='" &subjid. "' and &key.='" &key. "' then call define('" vname "','style','style={background=red}');";
run;
data &dsn.;
if _n_=1 then do;
 if 0 then set new.&dsn.;
 if 0 then set old.&dsn.;
 declare hash h(dataset:"old.&dsn.");
 h.definekey("&subjid.","&key.");
 h.definedone();

 declare hash h1(dataset:'call_define');
 h1.definekey("&subjid.","&key.");
 h1.definedone();
end;
 set new.&dsn.;
 length new_change $ 40;
 if h.check() ne 0 then new_change='new' ;
 if h1.check() = 0 then new_change='change' ;
label new_change='new/change/delete';
run;
proc sql;
create table &dsn.2 as
select * from &dsn.
union
(select *,'delete' from old.&dsn. where catx('|',&subjid.,&key.) not in (select catx('|',&subjid.,&key.) from new.&dsn.));
quit;

ods excel file="c:\temp\want.xlsx" options(sheet_name="&dsn." autofilter='all' ) ;
proc report data=&dsn.2 split='*'  nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
if new_change='new' then call define(_row_,'style','style={background=verylightblue}');
if new_change='delete' then call define(_row_,'style','style={background=gold}');
endcomp;
run;
ods excel close;
%mend;

%type_two(dsn=phase,subjid=pid ,key=visit )

Ksharp_0-1686224634803.png

 

possible
Fluorite | Level 6
Thank you, sir for responding. In my scenario, one of the dataset is recurrence kind of log line/repeating section. So, the visit is repeating for few datapoint. So, I add the repeating section datapoint to the catx statement and was able to get generate the report. For other datasets, reach out to get a more stable data. thank you so much sir for helping me.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1929 views
  • 4 likes
  • 5 in conversation