Hi
Need help to Highlight mismatched rows(compare 2-3 , 4-5,6-7 rows so on dynamically) data Using SAS while generating a multi sheet excel file .
SAS is on Unix server. I am using SAS EG 7.1
Below is the data I have to compare every two rows and highlight the mismatched data while generating the Excel and send it via email.
Note - there are multiple sheets and columns vary (sometimes there maybe 10 columns, sometimes it may be 18)
I have SAS code(used ODS) to generate excel with multiple sheets and send it via email and separate a VBA macro to highlight the mismatched data which I run manually. But I am unable to integrate both in to single code so that no manual work is needed, I really appreciate if anyone can suggest any way to do it in one single code.
Below is sample dataset , rows and column are dynamically created based on the project. Need to compare 2-3,4-5, 6-7..so-on rows and highlight(I have given blue to the font but i need whole box as yellow) only when they don’t match.
ssn | firstname | lastname | empid | address | city | state | zip |
000-11-1234 | abc | d | 1001 | plotno 123 | abcd | rd | 12345 |
000-11-1234 | abc | e | 1001 | plotno 123 | abcd | rd | 12345 |
111-12-3456 | efgh | f | 2001 | house 897 | tyui | fg | 23456-789 |
111-12-3456 | efgh | f | 2001 | hs 897 | tyui | fg | 23456 |
444-56-7890 | wrt | y | 3001 |
|
| xc | 12345 |
444-56-7890 | wert | y | 3002 |
|
| xc | 12345 |
my VBA code:
Sub compare_Auto()
Dim LastColumn As Long
LastColumn = ActiveSheet.Range("A1").CurrentRegion.Columns.Count
Dim rng As Range, rngDiff As Range
Set rng = Range(Cells(2, 7), Cells(3, LastColumn))
Do While Application.CountA(rng) > 0 'loop while have content
Set rngDiff = Nothing 'reset range
On Error Resume Next 'ignore "No cells were found" error if no differences
Set rngDiff = rng.ColumnDifferences(Comparison:=rng.Cells(1))
On Error GoTo 0 'stop ignoring errors
If Not rngDiff Is Nothing Then
rngDiff.Interior.ColorIndex = 6 'color all differences
rngDiff.Offset(-1).Interior.ColorIndex = 6
End If
Set rng = rng.Offset(2) 'two rows down
Loop
End Sub
You could have an Excel file with no data that contains the VBA, and use SAS to add data to the file (or rather to a copy of the file as you want to keep the original file untouched, ready for the next run).
You should be able to use PROC REPORT to highlight the cells, so no need for any Excel code.
Can you explain the rule for highlighting a value? (and don't use cell numbers).
Are just just comparing to the first observation in the BY group?
Or just to the previous observation?
Instead of stacking the datasets into one, use PROC COMPARE to find the mismatches, and create a report from that. No need for VBA coding at all.
So if every SSN has two observations then it should be easy to make the FLAG variables using BY group processing an LAG() function.
First let's convert your listing back into a dataset.
data have;
infile cards dsd dlm='|' truncover;
length ssn $11 firstname lastname $30 empid $10 address $80 city $30 state $2 zip $9;
input ssn--zip;
cards;
000-11-1234|abc|d|1001|plotno 123|abcd|rd|12345
000-11-1234|abc|e|1001|plotno 123|abcd|rd|12345
111-12-3456|efgh|f|2001|house 897|tyui|fg|23456-789
111-12-3456|efgh|f|2001|hs 897|tyui|fg|23456
444-56-7890|wrt|y|3001|||xc|12345
444-56-7890|wert|y|3002|||xc|12345
;
Now we can make some flag variables and remerge it with the data.
data flags;
set have;
by ssn;
keep ssn F_: ;
F_firstname=firstname ne lag(firstname);
F_lastname=lastname ne lag(lastname);
F_empid=empid ne lag(empid);
F_address=address ne lag(address);
F_city=city ne lag(city);
F_state=state ne lag(state);
F_zip=zip ne lag(zip);
if last.ssn;
run;
data want;
merge have flags;
by ssn;
run;
Now we can use compute blocks to set the text color based on the flag variable values.
proc report data=want;
column ssn f_: firstname--zip;
define f_: / noprint display;
compute firstname;
if f_firstname then call define(_col_, "style", "style=[color=blue]");
endcomp;
compute lastname;
if f_lastname then call define(_col_, "style", "style=[color=blue]");
endcomp;
compute empid;
if f_empid then call define(_col_, "style", "style=[color=blue]");
endcomp;
compute address;
if f_address then call define(_col_, "style", "style=[color=blue]");
endcomp;
compute city;
if f_city then call define(_col_, "style", "style=[color=blue]");
endcomp;
compute state;
if f_state then call define(_col_, "style", "style=[color=blue]");
endcomp;
compute zip;
if f_zip then call define(_col_, "style", "style=[color=blue]");
endcomp;
run;
Result
100 columns is a pain to work with. I would probably transpose the data to have one record per SSN-VariableName-Value. That will make it easy to check for each SSN-VariableName whether the two values match or not. Probably better to keep numeric and character values separate, so SSN-VariableName-NumericValue-CharacterValue.
Then to make the wide report, you can either transpose back to wide format, or use some fancy PROC REPORT across variable or whatever.
The other option would be to do it with arrays or macros, but it gets loopy.
Just use code generation.
So get the list of variables:
proc contents data=have(drop=ssn) noprint out=contents; run;
proc sort data=contents; by varnum; run;
Then use it to write some code:
filename code temp;
data _null_;
set contents end=eof;
file code;
if _n_=1 then put
'data flags;'
/' set ' libname +(-1) '.' memname ';'
/' by ssn;'
;
put 'F__' _n_ '=' name 'ne lag(' name ');';
if eof then put
' if last.ssn;'
/' keep ssn F__: ;'
/'run;'
/'data want;'
/' merge ' libname +(-1) '.' memname 'flags;'
/' by ssn;'
/'run;'
;
run;
data _null_;
set contents end=eof;
file code mod ;
if _n_=1 then put
'proc report data=want;'
/' column ssn f__: ' @
;
put name @ ;
if eof then put
';'
/' define f__: / noprint display;'
;
run;
data _null_;
set contents end=eof;
file code mod;
put
' compute ' name ';'
/' if f__' _n_ 'then call define(_col_, "style", "style=[color=blue]");'
/' endcomp;'
;
if eof then put
'run;'
;
run;
Then run the code:
%include code / source2;
Result:
67 +data flags; 68 + set WORK.HAVE ; 69 + by ssn; 70 +F__1 =firstname ne lag(firstname ); 71 +F__2 =lastname ne lag(lastname ); 72 +F__3 =empid ne lag(empid ); 73 +F__4 =address ne lag(address ); 74 +F__5 =city ne lag(city ); 75 +F__6 =state ne lag(state ); 76 +F__7 =zip ne lag(zip ); 77 + if last.ssn; 78 + keep ssn F__: ; 79 +run; NOTE: There were 6 observations read from the data set WORK.HAVE. NOTE: The data set WORK.FLAGS has 3 observations and 8 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 80 +data want; 81 + merge WORK.HAVE flags; 82 + by ssn; 83 +run; NOTE: There were 6 observations read from the data set WORK.HAVE. NOTE: There were 3 observations read from the data set WORK.FLAGS. NOTE: The data set WORK.WANT has 6 observations and 15 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 84 +proc report data=want; 85 + column ssn f__: firstname lastname empid address city state zip ; 86 + define f__: / noprint display; 87 + compute firstname ; 88 + if f__1 then call define(_col_, "style", "style=[color=blue]"); 89 + endcomp; 90 + compute lastname ; 91 + if f__2 then call define(_col_, "style", "style=[color=blue]"); 92 + endcomp; 93 + compute empid ; 94 + if f__3 then call define(_col_, "style", "style=[color=blue]"); 95 + endcomp; 96 + compute address ; 97 + if f__4 then call define(_col_, "style", "style=[color=blue]"); 98 + endcomp; 99 + compute city ; 100 + if f__5 then call define(_col_, "style", "style=[color=blue]"); 101 + endcomp; 102 + compute state ; 103 + if f__6 then call define(_col_, "style", "style=[color=blue]"); 104 + endcomp; 105 + compute zip ; 106 + if f__7 then call define(_col_, "style", "style=[color=blue]"); 107 + endcomp; 108 +run; NOTE: There were 6 observations read from the data set WORK.WANT. NOTE: PROCEDURE REPORT used (Total process time): real time 0.01 seconds cpu time 0.00 seconds
Thanks so much. It worked, i used macro to create multiple sheets with different columns with the above dynamic code .Only this is its taking so much time when there is more data.
ex- it took 25minutes to process 60k rows.
sorry for the delayed response, i was sick so unable to test the code earlier.
Agree with others, I would do all the logic / crunching in SAS, and just right the report to Excel with traffic lighting. I wouldn't use VBA, it only adds complexity.
What is the logic for the comparison, is it just to highlight any values that are not constant within a group of records with a single SSN? Or in other words, for every variable there should be only one distinct value within an SSN group? Do you always have two records per SSN, or could there be more than two?
The following code is I wrote before to compare two SAS database to check the difference of row between two tables (they have the same table name,but in different SAS database).
You need to transform my code to your case,since mine is coding for CDISC standard.
%let old_data=D:\XiaKeShan\listing编程需求\LKM-2023-BCG01_study_data_sas_20240117134719\dataset ; *the path for old database;
%let new_data=D:\XiaKeShan\listing编程需求\LKM-2023-BCG01_study_data_sas_20240129111311\dataset ; *the path for new database;
%let out= D:\XiaKeShan\MM listing编程需求 ; /*the path of output excel file (want.xlsx)*/
%let subjid=subject_code; *the subject id variable which is unique in all table. 注意:所有的数据集中是一样的,且值是唯一的;
%let drop= SITE_NAME VERSION_NAME PI_NAME USERNAME SUBMIT_TIME UPDATE_USER UPDATE_TIME ; *the variables you need to drop from all tables 需要去掉的变量名;
options validvarname=any validmemname=extend mrecall nofmterr nonumber nodate dkricond=nowarn dkrocond=nowarn;
/*******比较 前一次数据 和 新数据********/
libname old v9 "&old_data." access=readonly;
libname new v9 "&new_data." access=readonly;
%macro type_one(idx=,dsn=,title=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn.(drop=&drop.) out=old;by &subjid. ;run;
proc sort data=new.&dsn.(drop=&drop.) out=new;by &subjid. ;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid.;
run;
data call_define;
set diff_&dsn.(drop=_TYPE_ _OBS_);
retain _dummy_ .;
array n{*} _numeric_;
array c{*} _character_;
length vname $ 40;
do i=1 to dim(n);
if n{i} not in (. 0) then do;vname=vname(n{i}); if upcase(vname) not in ("%upcase(&subjid.)") then output;end;
end;
do i=1 to dim(c);
if findc(c{i},'X') then do;vname=vname(c{i}); if upcase(vname) not in ("%upcase(&subjid.)") then output;end;
end;
keep &subjid. vname;
run;
filename t temp;
data _null_;
set call_define;
file t;
/* put 'if &subjid.="' &subjid. '" then call define("' vname '","style","style={background=yellow}");';*/
put "if &subjid.='" &subjid. "' then call define('" vname "','style','style={background=yellow}');";
run;
data &dsn.;
if _n_=1 then do;
length new_change $ 40;
if 0 then set old.&dsn.;
declare hash h(dataset:"old.&dsn.");
h.definekey("&subjid.");
h.definedone();
declare hash h1(dataset:'call_define');
h1.definekey("&subjid.");
h1.definedone();
end;
set new.&dsn.;
new_change='old' ;
if h.check() ne 0 then new_change='new' ;
if h1.check() = 0 then new_change='change' ;
label new_change='数据状态';
run;
/*ods excel file="&out.\&idx..&dsn..xlsx" options(sheet_name="&dsn." autofilter='all' ) ;*/
ods excel options(sheet_name="&title." Embedded_Titles='yes' autofilter='all' ) ;
title j=l link="#目录!A1" "(*ESC*)S={textdecoration=underline color=blue }返回目录";
proc report data=&dsn. split='*' nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
endcomp;
run;
/*ods excel close;*/
%mend;
%macro type_two(idx=,dsn=,key=,title=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn.(drop=&drop.) out=old;by &subjid. &key. ;run;
proc sort data=new.&dsn.(drop=&drop.) out=new;by &subjid. &key. ;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid. &key. ;
run;
data call_define;
set diff_&dsn.(drop=_TYPE_ _OBS_);
retain _dummy_ .;
array n{*} _numeric_;
array c{*} _character_;
length vname $ 40;
do i=1 to dim(n);
if n{i} not in (. 0) then do;vname=vname(n{i});if upcase(vname) not in ("%upcase(&subjid.)" "%upcase(&key.)") then output;end;
end;
do i=1 to dim(c);
if findc(c{i},'X') then do;vname=vname(c{i});if upcase(vname) not in ("%upcase(&subjid.)" "%upcase(&key.)") then output;end;
end;
keep &subjid. &key. vname;
run;
filename t temp;
data _null_;
set call_define;
file t;
put "if &subjid.='" &subjid. "' and &key.='" &key. "' then call define('" vname "','style','style={background=yellow}');";
run;
data &dsn.;
if _n_=1 then do;
length new_change $ 40;
if 0 then set new.&dsn.;
if 0 then set old.&dsn.;
declare hash h(dataset:"old.&dsn.");
h.definekey("&subjid.","&key.");
h.definedone();
declare hash h1(dataset:'call_define');
h1.definekey("&subjid.","&key.");
h1.definedone();
end;
set new.&dsn.;
new_change='old' ;
if h.check() ne 0 then new_change='new' ;
if h1.check() = 0 then new_change='change' ;
label new_change='数据状态';
run;
/*ods excel file="&out.\&idx..&dsn..xlsx" options(sheet_name="&dsn." autofilter='all' ) ;*/
ods excel options(sheet_name="&title." Embedded_Titles='yes' autofilter='all' ) ;
title j=l link="#目录!A1" "(*ESC*)S={textdecoration=underline color=blue }返回目录";
proc report data=&dsn. split='*' nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
endcomp;
run;
/*ods excel close;*/
%mend;
%macro type_three(idx=,dsn=,key=,key2=,title=);
%local value lower upper ;
%if %upcase(%substr(&dsn.,1,2))=LB %then %do;
proc sql noprint;
select name into :value trimmed from dictionary.columns where libname='OLD' and memname="%upcase(&dsn.)" and label='检查结果';
select name into :lower trimmed from dictionary.columns where libname='OLD' and memname="%upcase(&dsn.)" and label='正常值下限';
select name into :upper trimmed from dictionary.columns where libname='OLD' and memname="%upcase(&dsn.)" and label='正常值上限';
quit;
%put &=value. &=lower. &=upper.;
%end;
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn.(drop=&drop.) out=old;by &subjid. &key. &key2.;run;
proc sort data=new.&dsn.(drop=&drop.) out=new;by &subjid. &key. &key2.;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid. &key. &key2.;
run;
data call_define;
set diff_&dsn.(drop=_TYPE_ _OBS_);
retain _dummy_ .;
array n{*} _numeric_;
array c{*} _character_;
length vname $ 40;
do i=1 to dim(n);
if n{i} not in (. 0) then do;vname=vname(n{i});if upcase(vname) not in ("%upcase(&subjid.)" "%upcase(&key.)" "%upcase(&key2.)") then output;end;
end;
do i=1 to dim(c);
if findc(c{i},'X') then do;vname=vname(c{i});if upcase(vname) not in ("%upcase(&subjid.)" "%upcase(&key.)" "%upcase(&key2.)") then output;end;
end;
keep &subjid. &key. &key2. vname;
run;
filename t temp;
data _null_;
set call_define;
file t;
put "if &subjid.='" &subjid. "' and &key.='" &key. "' and &key2.='" &key2. "' then call define('" vname "','style','style={background=yellow}');";
run;
data &dsn.;
if _n_=1 then do;
length new_change $ 40;
if 0 then set old.&dsn.;
declare hash h(dataset:"old.&dsn.");
h.definekey("&subjid.","&key.","&key2.");
h.definedone();
declare hash h1(dataset:'call_define');
h1.definekey("&subjid.","&key.","&key2.");
h1.definedone();
end;
set new.&dsn.;
new_change='old' ;
if h.check() ne 0 then new_change='new' ;
if h1.check() = 0 then new_change='change' ;
label new_change='数据状态';
%if %upcase(%substr(&dsn.,1,2))=LB %then %do;
_value=input(&value.,?? best.);
_lower=input(&lower.,?? best.);
_upper=input(&upper.,?? best.);
if not missing(_value) and not missing(_lower) then do; if _value<_lower then &value.=catx(' ',&value.,'-'); end;
if not missing(_value) and not missing(_upper) then do; if _value>_upper then &value.=catx(' ',&value.,'+'); end;
drop _value _lower _upper;
%end;
run;
/*ods excel file="&out.\&idx..&dsn..xlsx" options(sheet_name="&dsn." autofilter='all' ) ;*/
ods excel options(sheet_name="&title." Embedded_Titles='yes' autofilter='all' ) ;
title j=l link="#目录!A1" "(*ESC*)S={textdecoration=underline color=blue }返回目录";
proc report data=&dsn. split='*' nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
endcomp;
run;
/*ods excel close;*/
%mend;
/********
注意:
1)目录的值和宏参数(title=)的值保持一致 。
2)目录的值不能包括 /\*~ 等非法字符,因为Sheet名无法显示这些字符。
********/
data 目录;
infile cards truncover expandtabs;
input 序号 目录 $80.;
cards;
1 标题
2 签署知情同意书
3 访视日期
4 人口学资料
5 非膀胱癌既往病史-手术史问询
6 非膀胱癌既往病史-手术史
7 过敏史
8 一般问询
9 膀胱癌病史
10 膀胱癌既往治疗史问询
11 膀胱癌既往治疗史
12 体格检查
13 生命体征问询
14 生命体征
15 ECOG评分
16 胸部X线片
17 超声心动图
18 十二导联心电图
19 C-TST试验
20 血妊娠试验
21 血常规
22 尿常规
23 血生化
24 肌酐
25 凝血功能
26 传染病筛查
27 泌尿系统彩超
28 影像学检查
29 入选排除标准问询
30 入选排除标准
31 筛选结果
32 给药记录表(1周1次)
33 PK采血
34 细胞因子血样采集
35 细胞因子尿样采集
36 药物脱落情况尿样采集
37 不良事件问询
38 不良事件
39 既往或合并用药问询
40 既往或合并用药
41 既往或合并非药物治疗问询
;
ods noresults;
ods _all_ close;
ods excel file="&out.\want.xlsx" ;
ods excel options(sheet_name="目录" Embedded_Titles='yes' ) ;
title "医学监查数据审核表";
proc report data=目录 split='*' nowd;
compute 目录;
call define(_col_,'url',cats('#''',目录,'''!A1') );
call define(_col_,'style','style={textdecoration=underline color=blue}');
endcomp;
run;
%type_one(idx=1,dsn=PRIMARY,title=标题)
%type_one(idx=2,dsn=DS,title=签署知情同意书)
%type_two(idx=3,dsn=SV1,key=folder_name,title=访视日期)
%type_one(idx=4,dsn=DM,title=人口学资料)
%type_one(idx=5,dsn=MH1CO ,title=非膀胱癌既往病史-手术史问询)
%type_two(idx=6,dsn=MH1,key=spid,title=非膀胱癌既往病史-手术史)
%type_one(idx=7,dsn=MH2,title=过敏史)
%type_one(idx=8,dsn=SU,title=一般问询)
%type_one(idx=9,dsn=MH3,title=膀胱癌病史)
%type_one(idx=10,dsn=MH4CO,title=膀胱癌既往治疗史问询)
%type_two(idx=11,dsn=MH4,key=spid,title=膀胱癌既往治疗史)
%type_three(idx=12,dsn=PE,key=folder_name,key2=OID,title=体格检查)
%type_two(idx=13,dsn=VSCO,key=folder_name ,title=生命体征问询)
%type_three(idx=14,dsn=VS,key=spid,key2=OID,title=生命体征)
%type_two(idx=15,dsn=QS,key=folder_name,title=ECOG评分)
%type_two(idx=16,dsn=FA1,key=folder_name,title= 胸部X线片)
%type_two(idx=17,dsn=FA2,key=folder_name,title=超声心动图)
%type_two(idx=18,dsn=EG,key=folder_name,title=十二导联心电图)
%type_two(idx=19,dsn=TST,key=folder_name,title=C-TST试验)
%type_three(idx=20,dsn=LBHCG,key=folder_name,key2=OID ,title=血妊娠试验)
%type_three(idx=21,dsn=LBBR,key=folder_name,key2=OID,title=血常规)
%type_three(idx=22,dsn=LBUR,key=folder_name,key2=OID,title=尿常规)
%type_three(idx=23,dsn=LBBB,key=folder_name,key2=OID,title=血生化)
%type_three(idx=24,dsn=LBSCR,key=folder_name,key2=OID,title=肌酐)
%type_three(idx=25,dsn=LBCG,key=folder_name,key2=OID,title=凝血功能)
%type_two(idx=26,dsn=IFD,key=folder_name,title=传染病筛查)
%type_three(idx=27,dsn=FA3,key=folder_name,key2=OID,title=泌尿系统彩超)
%type_two(idx=28,dsn=FA4,key=folder_name,title=影像学检查)
%type_two(idx=29,dsn=IECO,key=folder_name,title=入选排除标准问询)
%type_two(idx=30,dsn=IE,key=spid,title=入选排除标准)
%type_one(idx=31,dsn=DS1 ,title=筛选结果)
%type_two(idx=32,dsn=EX1,key=OID,title=给药记录表(1周1次))
%type_two(idx=33,dsn=PC,key=OID ,title=PK采血)
%type_two(idx=34,dsn=CB,key=OID,title=细胞因子血样采集)
%type_two(idx=35,dsn=CU,key=OID,title=细胞因子尿样采集)
%type_two(idx=36,dsn=MU,key=OID,title=药物脱落情况尿样采集)
%type_one(idx=37,dsn=AECO,title=不良事件问询)
%type_two(idx=38,dsn=AE,key=spid,title=不良事件)
%type_one(idx=39,dsn=CMCO,title=既往或合并用药问询)
%type_two(idx=40,dsn=CM,key=spid,title=既往或合并用药)
%type_one(idx=41,dsn=CM1CO,title=既往或合并非药物治疗问询)
ods excel close;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.