BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jimbarbour
Meteorite | Level 14

I have some large files I'm running through PROC COMPARE under SAS 9.3 running on an AIX machine with code submitted via SAS Enterprise Guide 7.1.  The datasets are 8 million rows plus each with 230 columns, and there might be thousands of differences.  I'm creating a spreadsheet using ODS ExcelXP that contains the BASE, COMPARE, and DIF records.  I want to highlight the cells that have discrepancies of just the DIF records.  I've got this working for numeric values just fine using the following format:

PROC	FORMAT;
	VALUE	Highlight_Num	.	-	.E		=	'WhiteSmoke'
					-9999999999	-	-1		=	'Pink'	
					0	-	9999999998		=	'Red'	
					;

 

What I can't seem to figure out is how to highlight discrepances in character variables.  PROC COMPARE returns periods when there is equality and an "X" in any position that has a discrepancy.  For example, if the third from the left character in a seven position character variable differed between the BASE and COMPARE datasets, SAS would return the following in the DIF record:

..X....

 

Basically, if there's an "X" anywhere in a DIF record, I want to highlight the cell that the "X" occurs in.  Yes, I could just programmatically change cells that have a mixture of periods and "X's" to just a single "X" and format accordingly, but then I would lose the intelligence of knowing which position the discrepancy occurred in.  Some of my character fields are 84 positions long and are a string of codes.  It's difficult to spot by eye exactly where the discrepancy lies without the '.......X......' showing you where the discrepancy is.

 

I can't imagine coding all possible combinations of periods and X's in an 84 position character field.  I was hoping to use a regular expression something like the following:

	VALUE	$Highlight_Char	'/X/'	(REGEXP)	=	'Yellow'
				OTHER			=	'WhiteSmoke'
				;

But this doesn't appear to be valid SAS code. I'm getting the following message:

38         		'/X/'	(REGEXP)		=	'Yellow'
                        _
                        22
                        76
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant, 
              a missing value, LOW.  
ERROR 76-322: Syntax error, statement will be ignored.

Can anyone point me in the right direction to highlight only cells with discepancies in only the DIF rows from a PROC COMPARE for character variables?

 

Thanks,

 

Jim

1 ACCEPTED SOLUTION

Accepted Solutions
Cynthia_sas
SAS Super FREQ
Hi:
My thinking would be to use PROC REPORT instead of PROC PRINT because PROC REPORT can highlight one row or one cell based on the value in another cell. Also the compute block in PROC REPORT can include a DO loop.

I don't think the format will do for character variables what you are trying. You might want to check examples of formats in the documentation.

cynthia

View solution in original post

8 REPLIES 8
jimbarbour
Meteorite | Level 14

For reference, here's the code I'm using to create the spreadsheet:

ODS		TAGSETS.ExcelXP	options(frozen_headers='Yes'	FontName="Calibri" Size="8")	FILE="&DifOnly"	PATH="&XMLlib."	STYLE=styles.XLsansPrinter;

PROC	PRINT		DATA=WORK.&Dif.2		(OBS=&PrintObs2)
	;

**  Apply colors to numeric and character variables .  Use $Highlight_Cntl, Highlight_Num, and $Highlight_Char with DIF only. **;
	VAR	_OBS_
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Reg. font=("Calibri",8pt) ]
		;
	VAR	_TYPE_
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Cntl. font=("Calibri",8pt) ]
		;
	VAR	TIP_PIN_N
/*		TIP_PIN	*/
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Reg. font=("Calibri",8pt) ]
		;
	VAR	TIP_TIN
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Cntl. font=("Calibri",8pt) ]
		;
	VAR	TIP_ACCOUNT_NUMBER
		TIP_TRADE_BASE_DT_MDCY
		TIP_STATE_CODE
		TIP_ID
/*		TIP_SPECIAL_COMMENT			*/
		TIP_ENHANCED_SPECIAL_CMT
		TIP_DATE_OPENED
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;	
	VAR	TIP_DATE_OPENED_MONTH		
		TIP_DATE_OPENED_DAY			
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_DATE_REPORTED
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;
	VAR	TIP_DATE_REPORTED_MONTH		
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_AMT1_QUAL				
		TIP_AMT2_QUAL
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;
	VAR	TIP_AMOUNT_1				
		TIP_AMOUNT_2				
		TIP_INQUIRY_AMOUNT			
		TIP_PUB_REC_AMOUNT			
		TIP_BKRPT_LIABILITY_AMT		
		TIP_SUBSCRIBER_ID			
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_ACCT_TYPE_CD			
		TIP_ENHANCED_ACCT_TYPE_CD	
		TIP_TERMS					
		TIP_TERMS_V08				
		TIP_TERMS_FREQ_CODE			
		TIP_STATUS_CODE				
		TIP_ENHANCED_STATUS_CODE	
		TIP_ACCOUNT_COND_CODE		
		TIP_LEGAL_DESIGNATOR_CODE	
		TIP_ECOA_CODE				
		TIP_TR_BALANCE_DATE
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;
	VAR	TIP_TR_BAL_DATE_MONTH		
		TIP_TR_BAL_DATE_DAY			
		TIP_BALANCE_AMOUNT
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_PR_STATUS_DATE
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;	
	VAR	TIP_R_STATUS_DATE_MONTH		
		TIP_PR_STATUS_DATE_DAY		
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_PR_FILE_DATE
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;	
	VAR	TIP_PR_FILE_DATE_MONTH		
		TIP_PR_FILE_DATE_DAY		
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_INQUIRY_DATE
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;	
	VAR	TIP_INQUIRY_DATE_MONTH		
		TIP_INQUIRY_DATE_DAY		
		TIP_AMOUNT_PAST_DUE			
		TIP_SCH_PMT_AMOUNT			
		TIP_ACT_PMT_AMOUNT			
		TIP_BALLOON_PMT_AMOUNT		
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_DEFER_FUTURE_PAY_DATE
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;	
	VAR	TIP_PAST_PMT_DATE_MONTH		
		TIP_FUT_PMT_DATE_MONTH		
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_LAST_PMT_DATE
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;	
	VAR	TIP_LAST_PMT_DATE_MONTH		
		TIP_COMPANY_ID				
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_KOB_CODE				
		TIP_ORIG_CREDITOR_CODE		
		TIP_SEC_AGENCY_CODE			
		TIP_CII_IND					
		TIP_CCC_CODE				
		TIP_PORTFOLIO_CODE			
		TIP_MORTGAGE_ID				
		TIP_GRID_V08				
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;
	VAR	TIP_CREDIT_LIMIT			
		TIP_HIGH_BALANCE			
		TIP_CHARGEOFF_AMOUNT
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;	
	VAR	TIP_DELINQ_DATE_1
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;	
	VAR	TIP_DELINQ_DATE_1_MONTH		
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_DELINQ_DATE_2
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;
	VAR	TIP_DELINQ_DATE_2_MONTH		
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_MAX_DELINQ_DATE
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;
	VAR	TIP_MAX_DELINQ_DATE_MONTH	
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_MAX_DELINQ_CODE
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;
	VAR	TIP_30_DAY_DELINQ_CT		
		TIP_60_DAY_DELINQ_CT		
		TIP_90_DAY_DELINQ_CT		
		TIP_DEROG_CT				
		TIP_ORIGINAL_LOAN_AMT		
		/	STYLE(DATA)	=	[BACKGROUND=Highlight_Num. font=("Calibri",8pt) ]
		;
	VAR	TIP_SPECIAL_PAYMENT_CODE	
		TIP_TRADE_INCLUSION_FLAG
		/	STYLE(DATA)	=	[BACKGROUND=$Highlight_Char. font=("Calibri",8pt) ]
		;
RUN;

ODS  	TAGSETS.ExcelXP	CLOSE;

 

 

Cynthia_sas
SAS Super FREQ
Hi:
My thinking would be to use PROC REPORT instead of PROC PRINT because PROC REPORT can highlight one row or one cell based on the value in another cell. Also the compute block in PROC REPORT can include a DO loop.

I don't think the format will do for character variables what you are trying. You might want to check examples of formats in the documentation.

cynthia
jimbarbour
Meteorite | Level 14

Hi, Cynthia,

 

I was hoping someone would suggest something easy that I had overlooked.  I was hoping to not have to re-write the code.  Alas, it seems not to be.

 

Too bad a regular expression can't be used on a VALUE but only on an INVALUE. Rick Langston's paper had me going for a minute there.

 

I guess I can pick my poison:

1.  Add a separate line to the spreadsheet with unique values for highlighting.

2.  Reduce all ".....X......X...." type values to a single "X" and create a format to highlight the single "X."

3.  Use VBA or highlighting rules inside of Excel

4.  Re-write the code in PROC REPORT

 

I'll probably just go the Excel route since I've already spent too much time on the SAS code for a simple report.  

 

Thanks, Cynthia,

 

HJ

jimbarbour
Meteorite | Level 14

I tried the following:

	VALUE	$Highlight_Char	'^'							=	'Black'
							'X'							=:	'Yellow'
							'.X'						=:	'Yellow'
							'..X'						=:	'Yellow'
							'...X'						=:	'Yellow'
							'....X'						=:	'Yellow'
							'.....X'					=:	'Yellow'
							'......X'					=:	'Yellow'
							'.......X'					=:	'Yellow'
							'........X'					=:	'Yellow'
							'.........X'				=:	'Yellow'
							'..........X'				=:	'Yellow'
							'...........X'				=:	'Yellow'
							'............X'				=:	'Yellow'
							'.............X'			=:	'Yellow'
							'..............X'			=:	'Yellow'
							'...............X'			=:	'Yellow'
							'................X'			=:	'Yellow'
							'.................X'		=:	'Yellow'
							'..................X'		=:	'Yellow'
							'...................X'		=:	'Yellow'
							'....................X'		=:	'Yellow'
							'.....................X'	=:	'Yellow'
							'......................X'	=:	'Yellow'
							'.......................X'	=:	'Yellow'
							'........................X'	=:	'Yellow'
							OTHER						=	'WhiteSmoke'
							;

But it causes the generated xml to be unreadable by Excel.  The following gets inserted into the xml:

<Style ss:ID="data__l_3" ss:Parent="data__l">
<Font ss:FontName="Calibri" ss:Size="8" />
<Interior ss:Color=":Yellow" ss:Pattern="Solid" />
<Protection ss:Protected="1" />
<NumberFormat ss:Format="General" />
</Style>

The ":Yellow" causes an error in Excel.  This appears to be a bug in ODS.

 

Jim

ballardw
Super User

Incomplete code is pretty hard to diagnose. It appears that you may have used Proc Format to create a character format. Is that correct?

 

Any reason that you used =: instead of =? 

If you do a simple print or put using that format you will see that the : is put by the format and would not be a valid colorname.

 

Cynthia_sas
SAS Super FREQ

Hi:

  I would stop trying the FORMAT approach, and instead look to ESCAPECHAR and "in-line" formatting, as described in this paper: http://www2.sas.com/proceedings/forum2007/099-2007.pdf -- the paper was originally written in 2007 and shows the "original" style ESCAPECHAR syntax, but also has examples of the newer syntax that was introduced in SAS 9.2.

 

  My program produces this output -- where all the X in the string are made red. (you can take out the PUTLOG after you review the log and see how the MODSTR variable is being created.

 

only_X_red.png

 

 

  I didn't run a PROC COMPARE -- this is just a proof of concept that you can change one character in a string but not with a format. ODS ESCAPECHAR with the STYLE function allows you to specify a style attribute change (such as color=red) for 1 character in a string. Using the TRANWRD function allows me to change all the occurences of X in the string to have the appropriate ESCAPECHAR function syntax.

 

  As you can see from the screen shot, the periods are black and the X is red in both HTML and in Excel when the TAGSETS.EXCELXP output is opened.

 

cynthia

 

here's the code:

options ls=256;

** to code this, you have to know what your ESCAPECHAR is going to be;
** in this example, it is set to ^;
data fakedata;
length charstr $100 modstr $1000 ;
  infile datalines;
  input charstr $;
  countX = count(charstr,'X');
  modstr=tranwrd(charstr,'X',"^{style[color=red]X}" );
  putlog _n_= modstr=;
return;
datalines;
.....X.....X..........X.........X.....
X...X........X......X........X........
.......X...X...X...X...X...X...X.....X
;
run;
  
ods _all_ close;
** declare the ESCAPECHAR that is in the data;
ods escapechar='^';
ods html(id=ht) file='c:\temp\colorX.html';
ods tagsets.excelxp(id=xp) file='c:\temp\colorX.xml' style=htmlblue
    options(autofit_height='yes');
proc report data=fakedata 
     style(column)={font_face='Courier New'};
  column charstr modstr countX;
  define charstr / 'Original character string'
         style(column)={width=4in};
  define modstr / 'Only X is red'
         style(column)={width=4in}; 
  define countx / 'Count of X';
run;
ods html(id=ht) close;
ods tagsets.excelxp(id=xp) close;
jimbarbour
Meteorite | Level 14

OK, you were absolutely right, @Cynthia_sas.  PROC REPORT is absolutely the way to go; the key functionality being the ability to highlight a cell based on values in other cells as well as the particular cell itself.

 

Basically, I wanted to highlight cells with diferences in the the DIF records from a PROC COMPARE.  If there's a numeric difference, that's easy; just use a FORMAT with a range of values and a corresponding color.  It was the character variables I was having trouble with.  

 

The solution is fairly staightforward; sample code is shown below.  If there's an "X" anywhere in the cell, highlight the cell in yellow to draw attention to it (you need that when your data is 230 columns wide) -- AND only do this for DIF records.  If I have an "X" in one of the BASE or COMPARE records, for example "Xerox" or "242XB", I do NOT want to highlight that.  

COMPUTE	TIP_ACCOUNT_NUMBER;
	IF _TYPE_ = 'DIF' THEN
		DO;
			IF	INDEX(TIP_ACCOUNT_NUMBER ,'X')	>	0 THEN
				CALL DEFINE	( _col_,'style','style=[BACKGROUND=Yellow TAGATTR="Text" font=("Calibri",8pt) ]');
		END;
ENDCOMP;

Thanks,

 

Jim

jimbarbour
Meteorite | Level 14

Thank you very much, Cynthia.

I think the take away here at least for me is that if I'm doing something "quick and dirty", then proc print may well be fine, but if I need to be very specific about highlighting and such, then I need to go with proc report.

Jim

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1637 views
  • 1 like
  • 3 in conversation