BookmarkSubscribeRSS Feed
OS2Rules
Obsidian | Level 7
Hi All:

I'm trying to get my brain bent around this and I know I could write a mile of code, but I thought someone could find an easier solution.

I have a ton of records (nearly 2 million) that are sequenced by date. For some dates there is a single record, for some there are 2 records. There are never more than 2 for any date.

When the 2 values for a selected date are the same I want to set the value for both to missing ("."), but I want to keep the records in the table. I know I could sort and use "first." and "last.", or use a "lag", but I'm thinkin' there is a more elegant solution. Most solutions I come up with only get rid of the duplicate but I need to get rid of them both.

Thanks in advance.
4 REPLIES 4
Bill
Quartz | Level 8
One way would be to run proc summary against the data set (http://support.sas.com/kb/37/581.html) to find the duplicates. Merge back to the original file to update the duplicate records to missing.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Elegant? After your PROC SORT step, you will have less than 10 SAS statements for the DATA step executino.

So, one DATA step pass and test for FIRST. AND LAST. -- if it's not the condition, then set your to missing values until you get to LAST..

Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
Maybe this example might help, but take into account that you have a ton of records.
Program finds duplicates, keep them in the table and sets values to missing.

DATA Temp;
INPUT date data;
CARDS;
1 5000
1 6000
1 2100
2 1110
2 5000
2 5000
3 5000
;
RUN;

proc sql;
create table temp1
as
select a.date,
a.data,
(CASE
when b.date is not null then .
else a.data
END) as new_data_value
from temp a
left outer join (
select date
from temp
group by date
having count(*)>1) b on a.date=b.date
;
quit;
Patrick
Opal | Level 21
I agree with Scott: What's so bad about a first-last approach? It's not a lot of code and easy to understand and maintain.

Of course: If your data is stored in a database and you want to alter the values in this database table then a SQL approach would be better.

DATA Temp;
INPUT date data;
CARDS;
1 5000
1 6000
1 2100
2 1110
3 5000
3 5000
;
RUN;

proc sql noprint;
create view SortedTemp as
select *
from temp
order by date
;
quit;

data want;
set temp;
by date;
if not (first.date and last.date) then call missing(data);
run;

proc print data=want;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 704 views
  • 0 likes
  • 5 in conversation