I have some large files I'm running through PROC COMPARE under SAS 9.3 running on an AIX machine with code submitted via SAS Enterprise Guide 7.1. The datasets are 8 million rows plus each with 230 columns, and there might be thousands of differences. I'm creating a spreadsheet using ODS ExcelXP that contains the BASE, COMPARE, and DIF records. I want to highlight the cells that have discrepancies of just the DIF records. I've got this working for numeric values just fine using the following format:
PROC FORMAT;
VALUE Highlight_Num . - .E = 'WhiteSmoke'
-9999999999 - -1 = 'Pink'
0 - 9999999998 = 'Red'
;
What I can't seem to figure out is how to highlight discrepances in character variables. PROC COMPARE returns periods when there is equality and an "X" in any position that has a discrepancy. For example, if the third from the left character in a seven position character variable differed between the BASE and COMPARE datasets, SAS would return the following in the DIF record:
..X....
Basically, if there's an "X" anywhere in a DIF record, I want to highlight the cell that the "X" occurs in. Yes, I could just programmatically change cells that have a mixture of periods and "X's" to just a single "X" and format accordingly, but then I would lose the intelligence of knowing which position the discrepancy occurred in. Some of my character fields are 84 positions long and are a string of codes. It's difficult to spot by eye exactly where the discrepancy lies without the '.......X......' showing you where the discrepancy is.
I can't imagine coding all possible combinations of periods and X's in an 84 position character field. I was hoping to use a regular expression something like the following:
VALUE $Highlight_Char '/X/' (REGEXP) = 'Yellow'
OTHER = 'WhiteSmoke'
;
But this doesn't appear to be valid SAS code. I'm getting the following message:
38 '/X/' (REGEXP) = 'Yellow'
_
22
76
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,
a missing value, LOW.
ERROR 76-322: Syntax error, statement will be ignored.
Can anyone point me in the right direction to highlight only cells with discepancies in only the DIF rows from a PROC COMPARE for character variables?
Thanks,
Jim
For reference, here's the code I'm using to create the spreadsheet:
ODS TAGSETS.ExcelXP options(frozen_headers='Yes' FontName="Calibri" Size="8") FILE="&DifOnly" PATH="&XMLlib." STYLE=styles.XLsansPrinter;
PROC PRINT DATA=WORK.&Dif.2 (OBS=&PrintObs2)
;
** Apply colors to numeric and character variables . Use $Highlight_Cntl, Highlight_Num, and $Highlight_Char with DIF only. **;
VAR _OBS_
/ STYLE(DATA) = [BACKGROUND=Highlight_Reg. font=("Calibri",8pt) ]
;
VAR _TYPE_
/ STYLE(DATA) = [BACKGROUND=$Highlight_Cntl. font=("Calibri",8pt) ]
;
VAR TIP_PIN_N
/* TIP_PIN */
/ STYLE(DATA) = [BACKGROUND=Highlight_Reg. font=("Calibri",8pt) ]
;
VAR TIP_TIN
/ STYLE(DATA) = [BACKGROUND=$Highlight_Cntl. font=("Calibri",8pt) ]
;
VAR TIP_ACCOUNT_NUMBER
TIP_TRADE_BASE_DT_MDCY
TIP_STATE_CODE
TIP_ID
/* TIP_SPECIAL_COMMENT */
TIP_ENHANCED_SPECIAL_CMT
TIP_DATE_OPENED
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_DATE_OPENED_MONTH
TIP_DATE_OPENED_DAY
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_DATE_REPORTED
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_DATE_REPORTED_MONTH
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_AMT1_QUAL
TIP_AMT2_QUAL
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_AMOUNT_1
TIP_AMOUNT_2
TIP_INQUIRY_AMOUNT
TIP_PUB_REC_AMOUNT
TIP_BKRPT_LIABILITY_AMT
TIP_SUBSCRIBER_ID
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_ACCT_TYPE_CD
TIP_ENHANCED_ACCT_TYPE_CD
TIP_TERMS
TIP_TERMS_V08
TIP_TERMS_FREQ_CODE
TIP_STATUS_CODE
TIP_ENHANCED_STATUS_CODE
TIP_ACCOUNT_COND_CODE
TIP_LEGAL_DESIGNATOR_CODE
TIP_ECOA_CODE
TIP_TR_BALANCE_DATE
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_TR_BAL_DATE_MONTH
TIP_TR_BAL_DATE_DAY
TIP_BALANCE_AMOUNT
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_PR_STATUS_DATE
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_R_STATUS_DATE_MONTH
TIP_PR_STATUS_DATE_DAY
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_PR_FILE_DATE
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_PR_FILE_DATE_MONTH
TIP_PR_FILE_DATE_DAY
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_INQUIRY_DATE
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_INQUIRY_DATE_MONTH
TIP_INQUIRY_DATE_DAY
TIP_AMOUNT_PAST_DUE
TIP_SCH_PMT_AMOUNT
TIP_ACT_PMT_AMOUNT
TIP_BALLOON_PMT_AMOUNT
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_DEFER_FUTURE_PAY_DATE
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_PAST_PMT_DATE_MONTH
TIP_FUT_PMT_DATE_MONTH
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_LAST_PMT_DATE
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_LAST_PMT_DATE_MONTH
TIP_COMPANY_ID
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_KOB_CODE
TIP_ORIG_CREDITOR_CODE
TIP_SEC_AGENCY_CODE
TIP_CII_IND
TIP_CCC_CODE
TIP_PORTFOLIO_CODE
TIP_MORTGAGE_ID
TIP_GRID_V08
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_CREDIT_LIMIT
TIP_HIGH_BALANCE
TIP_CHARGEOFF_AMOUNT
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_DELINQ_DATE_1
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_DELINQ_DATE_1_MONTH
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_DELINQ_DATE_2
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_DELINQ_DATE_2_MONTH
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_MAX_DELINQ_DATE
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_MAX_DELINQ_DATE_MONTH
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_MAX_DELINQ_CODE
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
VAR TIP_30_DAY_DELINQ_CT
TIP_60_DAY_DELINQ_CT
TIP_90_DAY_DELINQ_CT
TIP_DEROG_CT
TIP_ORIGINAL_LOAN_AMT
/ STYLE(DATA) = [BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
;
VAR TIP_SPECIAL_PAYMENT_CODE
TIP_TRADE_INCLUSION_FLAG
/ STYLE(DATA) = [BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
;
RUN;
ODS TAGSETS.ExcelXP CLOSE;
Hi, Cynthia,
I was hoping someone would suggest something easy that I had overlooked. I was hoping to not have to re-write the code. Alas, it seems not to be.
Too bad a regular expression can't be used on a VALUE but only on an INVALUE. Rick Langston's paper had me going for a minute there.
I guess I can pick my poison:
1. Add a separate line to the spreadsheet with unique values for highlighting.
2. Reduce all ".....X......X...." type values to a single "X" and create a format to highlight the single "X."
3. Use VBA or highlighting rules inside of Excel
4. Re-write the code in PROC REPORT
I'll probably just go the Excel route since I've already spent too much time on the SAS code for a simple report.
Thanks, Cynthia,
HJ
I tried the following:
VALUE $Highlight_Char '^' = 'Black'
'X' =: 'Yellow'
'.X' =: 'Yellow'
'..X' =: 'Yellow'
'...X' =: 'Yellow'
'....X' =: 'Yellow'
'.....X' =: 'Yellow'
'......X' =: 'Yellow'
'.......X' =: 'Yellow'
'........X' =: 'Yellow'
'.........X' =: 'Yellow'
'..........X' =: 'Yellow'
'...........X' =: 'Yellow'
'............X' =: 'Yellow'
'.............X' =: 'Yellow'
'..............X' =: 'Yellow'
'...............X' =: 'Yellow'
'................X' =: 'Yellow'
'.................X' =: 'Yellow'
'..................X' =: 'Yellow'
'...................X' =: 'Yellow'
'....................X' =: 'Yellow'
'.....................X' =: 'Yellow'
'......................X' =: 'Yellow'
'.......................X' =: 'Yellow'
'........................X' =: 'Yellow'
OTHER = 'WhiteSmoke'
;
But it causes the generated xml to be unreadable by Excel. The following gets inserted into the xml:
<Style ss:ID="data__l_3" ss:Parent="data__l"> <Font ss:FontName="Calibri" ss:Size="8" /> <Interior ss:Color=":Yellow" ss:Pattern="Solid" /> <Protection ss:Protected="1" /> <NumberFormat ss:Format="General" /> </Style>
The ":Yellow" causes an error in Excel. This appears to be a bug in ODS.
Jim
Incomplete code is pretty hard to diagnose. It appears that you may have used Proc Format to create a character format. Is that correct?
Any reason that you used =: instead of =?
If you do a simple print or put using that format you will see that the : is put by the format and would not be a valid colorname.
Hi:
I would stop trying the FORMAT approach, and instead look to ESCAPECHAR and "in-line" formatting, as described in this paper: http://www2.sas.com/proceedings/forum2007/099-2007.pdf -- the paper was originally written in 2007 and shows the "original" style ESCAPECHAR syntax, but also has examples of the newer syntax that was introduced in SAS 9.2.
My program produces this output -- where all the X in the string are made red. (you can take out the PUTLOG after you review the log and see how the MODSTR variable is being created.
I didn't run a PROC COMPARE -- this is just a proof of concept that you can change one character in a string but not with a format. ODS ESCAPECHAR with the STYLE function allows you to specify a style attribute change (such as color=red) for 1 character in a string. Using the TRANWRD function allows me to change all the occurences of X in the string to have the appropriate ESCAPECHAR function syntax.
As you can see from the screen shot, the periods are black and the X is red in both HTML and in Excel when the TAGSETS.EXCELXP output is opened.
cynthia
here's the code:
options ls=256;
** to code this, you have to know what your ESCAPECHAR is going to be;
** in this example, it is set to ^;
data fakedata;
length charstr $100 modstr $1000 ;
infile datalines;
input charstr $;
countX = count(charstr,'X');
modstr=tranwrd(charstr,'X',"^{style[color=red]X}" );
putlog _n_= modstr=;
return;
datalines;
.....X.....X..........X.........X.....
X...X........X......X........X........
.......X...X...X...X...X...X...X.....X
;
run;
ods _all_ close;
** declare the ESCAPECHAR that is in the data;
ods escapechar='^';
ods html(id=ht) file='c:\temp\colorX.html';
ods tagsets.excelxp(id=xp) file='c:\temp\colorX.xml' style=htmlblue
options(autofit_height='yes');
proc report data=fakedata
style(column)={font_face='Courier New'};
column charstr modstr countX;
define charstr / 'Original character string'
style(column)={width=4in};
define modstr / 'Only X is red'
style(column)={width=4in};
define countx / 'Count of X';
run;
ods html(id=ht) close;
ods tagsets.excelxp(id=xp) close;
OK, you were absolutely right, @Cynthia_sas. PROC REPORT is absolutely the way to go; the key functionality being the ability to highlight a cell based on values in other cells as well as the particular cell itself.
Basically, I wanted to highlight cells with diferences in the the DIF records from a PROC COMPARE. If there's a numeric difference, that's easy; just use a FORMAT with a range of values and a corresponding color. It was the character variables I was having trouble with.
The solution is fairly staightforward; sample code is shown below. If there's an "X" anywhere in the cell, highlight the cell in yellow to draw attention to it (you need that when your data is 230 columns wide) -- AND only do this for DIF records. If I have an "X" in one of the BASE or COMPARE records, for example "Xerox" or "242XB", I do NOT want to highlight that.
COMPUTE TIP_ACCOUNT_NUMBER;
IF _TYPE_ = 'DIF' THEN
DO;
IF INDEX(TIP_ACCOUNT_NUMBER ,'X') > 0 THEN
CALL DEFINE ( _col_,'style','style=[BACKGROUND=Yellow TAGATTR="Text" font=("Calibri",8pt) ]');
END;
ENDCOMP;
Thanks,
Jim
Thank you very much, Cynthia.
I think the take away here at least for me is that if I'm doing something "quick and dirty", then proc print may well be fine, but if I need to be very specific about highlighting and such, then I need to go with proc report.
Jim
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.