BookmarkSubscribeRSS Feed
OS2Rules
Obsidian | Level 7
Hi All:

I'm trying to get my brain bent around this and I know I could write a mile of code, but I thought someone could find an easier solution.

I have a ton of records (nearly 2 million) that are sequenced by date. For some dates there is a single record, for some there are 2 records. There are never more than 2 for any date.

When the 2 values for a selected date are the same I want to set the value for both to missing ("."), but I want to keep the records in the table. I know I could sort and use "first." and "last.", or use a "lag", but I'm thinkin' there is a more elegant solution. Most solutions I come up with only get rid of the duplicate but I need to get rid of them both.

Thanks in advance.
4 REPLIES 4
Bill
Quartz | Level 8
One way would be to run proc summary against the data set (http://support.sas.com/kb/37/581.html) to find the duplicates. Merge back to the original file to update the duplicate records to missing.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Elegant? After your PROC SORT step, you will have less than 10 SAS statements for the DATA step executino.

So, one DATA step pass and test for FIRST. AND LAST. -- if it's not the condition, then set your to missing values until you get to LAST..

Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
Maybe this example might help, but take into account that you have a ton of records.
Program finds duplicates, keep them in the table and sets values to missing.

DATA Temp;
INPUT date data;
CARDS;
1 5000
1 6000
1 2100
2 1110
2 5000
2 5000
3 5000
;
RUN;

proc sql;
create table temp1
as
select a.date,
a.data,
(CASE
when b.date is not null then .
else a.data
END) as new_data_value
from temp a
left outer join (
select date
from temp
group by date
having count(*)>1) b on a.date=b.date
;
quit;
Patrick
Opal | Level 21
I agree with Scott: What's so bad about a first-last approach? It's not a lot of code and easy to understand and maintain.

Of course: If your data is stored in a database and you want to alter the values in this database table then a SQL approach would be better.

DATA Temp;
INPUT date data;
CARDS;
1 5000
1 6000
1 2100
2 1110
3 5000
3 5000
;
RUN;

proc sql noprint;
create view SortedTemp as
select *
from temp
order by date
;
quit;

data want;
set temp;
by date;
if not (first.date and last.date) then call missing(data);
run;

proc print data=want;
run;

sas-innovate-2026-white.png



April 27 – 30 | Gaylord Texan | Grapevine, Texas

Registration is open

Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!

Register now

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1491 views
  • 0 likes
  • 5 in conversation