BookmarkSubscribeRSS Feed
deleted_user
Not applicable
If there are duplicate values, I would like to highlight the whole row. How would I flag that, then color that row in to show these duplicate values? This is a report that runs monthly, so I wouldn't know what specific values will be duplicated.
8 REPLIES 8
Cynthia_sas
SAS Super FREQ
Hi:
Not exactly sure what you mean by "duplicate values". Do you mean that exactly EVERY value in a particular observation could be duplicated or that only SOME values in a particular observation could be duplicated. How could the values be duplicated -- could you have duplicated obs -- where an entire observation was repeated in the data??? Are the number of duplicated observations limited to just 2 observations or could you have 3 or 4 duplicated observations???

Are you concerned with duplicate values on ALL identifying variables (such as NAME, ID, etc) or are you concerned with duplicate values for only some of the identifying variables -- if ID is the same, you might not care if NAME was different. This is an example where seeing a sample of some fake data would be useful. And seeing code that you've already tried or the PROC REPORT code that you've started with would also be useful -- it would tell us how you're treating your variables (what USAGE you show on the DEFINE statement and whether you want a summary report or a detail report).

However assuming you had a FLAG variable that could be tested, then PROC REPORT can highlight a whole row using a CALL DEFINE statement. In the program below, I made duplicate observations from SASHELP.CLASS which means that EVERY value on the observation was a duplicate. This allowed me to use PROC SORT to create an output dataset that had the NAME (identifying variable) of the duplicated observations. I used this PROC SORT technique, so I did NOT have to compare every variable value in one obs to every other variable value in another obs. Then once the FLAG variable was created, the CALL DEFINE was fairly simple to code, as long as I use the _ROW_ keyword as the value for the first argument to CALL DEFINE.

The code that is relevant to your question is the section marked 3b) -- whatever method you use to create the FLAG variable is up to you. I made up fake data that lent itself to the PROC SORT and MERGE method because you did not describe your particular data scenario in much detail and I didn't want to spend a lot of time setting up a complicated data scenario.

cynthia
[pre]
** 1a) make some duplicates;
** since you did not describe your duplicate scenario, I made one up.;
** I have duplicated observations for Janet and William -- where each of them has 2 rows where;
** every value for every variable is duplicated.;
** Then I have 3 duplicated observations for Robert where ;
** every value for every variable is duplicated;
data class;
set sashelp.class;
** output every student 1 time;
output;
** output an extra row each for Janet and William (2 total rows);
** and output two extra rows (3 total rows) for Robert;
if name in ('Janet', 'William') then output;
if name = 'Robert' then do; output; output; end;
run;

** 1b) show duplicates;
ods listing;
proc print data=class;
title 'Two Rows for Janet and William and Three Rows for Robert';
run;

** 2a) get a file (DUPOUT=dupobs) that contains the duplicates;
proc sort nodup equals data=class out=OK dupout=dupobs(keep=name);
by name sex age height weight;
run;

** 2b) get rid of extra row for Robert to avoid many-many merge warning;
proc sort data=dupobs nodup equals;
by name;
run;

** 2c) show which obs are duplicates;
ods listing;
proc print data=dupobs;
title 'duplicate obs';
run;

** 3a) create the FLAG variable;
** I use NAME as the identifying variable, you might need to use another;
** technique to identify the duplicated observations or duplicate values;
data makeflag;
merge class(in=inclass)
dupobs(in=indups);
by name;
if inclass and indups then flag = 1;
else if inclass and not indups then flag = 0;
run;

** 3b) use PROC REPORT and CALL DEFINE based on the flag variable;
ods listing close;
ods html file='c:\temp\tlite.html' style=sasweb;
proc report data=makeflag nowd;
title 'Report showing Row Highlighting and FLAG';
title2 'Use NOPRINT the DEFINE statement for the FLAG variable';
title3 'To hide the FLAG variable if you do not want to see it on the report';
column sex name flag age height weight;
define sex / order;
define name / display;
define flag / display /* noprint */;
define age /display;
define height / display;
define weight / display;
compute flag;
if flag = 1 then do;
call define(_ROW_,'style','style={background=yellow}');
end;
endcomp;
run;
ods html close;
[/pre]
deleted_user
Not applicable
To be more specific, I have the following:

obs name rate date
1 steve 5.5 5/1/10
2 karen 7.7 10/09/10
3 steve 6.0 8/3/09
4 alana 10.12 5/4/10
5 steve 3.2 6/6/10

From these I want to take any record that has a name that repeats and highlight it in a color. This would be a monthly file, but the duplicate name wouldn't always be 'steve.' I hope this makes more sense.
Robert_Bardos
Fluorite | Level 6
Question to the audience: (having read data _null_'s post regarding temporary variables in PROC REPORT) is it possible to use hash functions within compute blocks to solve this problem?

Miles and miles away from any SAS machine.
Robert
Cynthia_sas
SAS Super FREQ
Ah, nice to have the data. That is an easier problem and no hash tables needed (which do not work in Proc Report I think).

As soon as I get to a computer with SAS I will post the code...will still need CALL DEFINE.

Cynthia
Cynthia_sas
SAS Super FREQ
Hi:
One thing you do not describe -- what kind of report will this be??? Do you want a DETAIL report -- (where every report row represents an observation in the data); or do you want a SUMMARY report, which might show the total COUNT for each name and the total rate or the average rate -- for this second type of report, if the COUNT variable was GE 2, you could then do the highlighting. Here's an example of a summary report (where the 14.7 is the sum of all the rates for steve -- this statistic could be the mean or median of steve's rates, too):
[pre]
name rate count
alana 10.12 1
karen 7.7 1
steve 14.7 3
[/pre]

If you want a DETAIL report, such as you show above, would you want to see the rows in name order -- where it would be immediately apparent that there were duplicates for STEVE or would you want the rows to be in the "original order" that they appear -- or DATE order???
ORIGINAL order:
[pre]
name rate date
steve 5.5 5/1/10
karen 7.7 10/09/10
steve 6.0 8/3/09
alana 10.12 5/4/10
steve 3.2 6/6/10
[/pre]

NAME order (without regard to date):
[pre]
name rate date
alana 10.12 5/4/10
karen 7.7 10/09/10
steve 5.5 5/1/10
steve 3.2 6/6/10
steve 6.0 8/3/09
[/pre]

DATE order (without regard to name):
[pre]
name rate date
steve 6.0 8/3/09
steve 5.5 5/1/10
alana 10.12 5/4/10
steve 3.2 6/6/10
karen 7.7 10/09/10
[/pre]

Do remember that the OBS number that PROC PRINT shows will change, depending on whether you sort the data by name or date or not. That's why I did not show OBS number on the above sample outputs...if you want to "save" the original observation order, you would need to do that when you read in the data.

Also, PROC PRINT is the only procedure that shows OBS number when you do a LIST report like this. If you want highlighting for an entire ROW, you will have to switch to PROC REPORT, which does not have any OBS column capability.

And, what is your destination of interest -- RTF, PDF, HTML????

cynthia
Peter_C
Rhodochrosite | Level 12
just a small supplementary (that might be as relevant among language forum)
on the point :
> Do remember that the OBS number that PROC PRINT shows
> will change, depending on whether you sort the data
> by name or date or not. That's why I did not show OBS
> number on the above sample outputs...if you want to
> "save" the original observation order, you would need
> to do that when you read in the data.
>
If you proc print with a where statement, you get the original obs numbers.
If you use a where statement that (engines behind) proc print decide can be supported by an index then the data will be returned in index order but with obs number appearing disordered because they hold the original data order numbers. It's an unexpected eccentricity that appeals to me but not to all.
Using a data set option it can be forced, like
data class( index= (name age height )) ;
set sashelp.class ;
run ;
title 'demo dis-ordered class' ;
proc print data= class( idxwhere= yes ) ;
title2 ' ordered by age? ' ;
where age > 1 ;
run ;
proc print data= class( idxwhere= yes ) ;
title2 ' ordered by height? ' ;
where height > 1 ;
run ;
deleted_user
Not applicable
Hi cynthia

I would want it detailed without regard to the date, so name order. My preferred output would be excel (which I know might be difficult) or RTF. I currently have it as a PROC REPORT that saves as a RTF.
Cynthia_sas
SAS Super FREQ
Hi:
Do you have PROC REPORT code to share, something that works with the data you posted??? So, what you would want to do would be to see the duplicate rows all highlighted?? Have you written PROC REPORT code that uses NAME as an ORDER variable???

There are only two ways to get your report into Excel with highlighting -- using HTML-based files or using XML-based files (using Spreadsheet Markup Language XML). Do you need single sheet workbooks or do you need multi-sheet workbooks??? Or, do you even care about the color formatting in Excel (in which case, you could create CSV files)??

cynthia

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2637 views
  • 0 likes
  • 4 in conversation