BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasuser_sk
Quartz | Level 8

Hi: I am a new SAS EG user and struggling here.

I want to compare ID from row 2 to ID in row 1 for identifying duplicates purposes and c=some other comparison needs. For example:

case when(id = prior(id)) then "Ignore" else "Keep it" end

OR 

case when(id = next(id)) then "Ignore" else "Keep it" end

 

Here prior and next are comparing the id field in row 1 and in row 2.

 

Can someone please make suggestions?


Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
data want;
set have;
by id;
if first.id then status='Keep';
else status='Ignore';
run;

@sasuser_sk wrote:
Yes my data is sorted. here is an example. This is referring to the PRIOR concept I am wanting to use in SAS EG.
Client id 4578 occurs twice. For first instance it is keeping it and then on the second occurrence i want to add ignore. it is identifying duplicate for me as well. Two purposes.

client id NEW FIELD I WANT TO CREATE
2376 KEEP
2489 KEEP
4578 KEEP
4578 IGNORE
5478 KEEP
5784 KEEP
5784 IGNORE
5784 IGNORE
7069 KEEP
7069 IGNORE
567834 KEEP

 

View solution in original post

12 REPLIES 12
PeterClemmensen
Tourmaline | Level 20

You say that this is for identifying duplicates. In that case, there are better ways than to manually compare each id with the id from the previous row. What does your data look like? And what does your desired result look like? 

sasuser_sk
Quartz | Level 8
Thanks draycut. My desired result would be to create another column that would let me put filter on it for further analysis. I am aware of the method to identify duplicates by:
Tasks....>Data...>Sort data and then generating a different dataset that has duplicates in it.

I want to keep my existing data set and make a new field that compares the id field.

Hope this helps.
ballardw
Super User

@sasuser_sk wrote:
Thanks draycut. My desired result would be to create another column that would let me put filter on it for further analysis. I am aware of the method to identify duplicates by:
Tasks....>Data...>Sort data and then generating a different dataset that has duplicates in it.

I want to keep my existing data set and make a new field that compares the id field.

Hope this helps.

Adding a new field to your data will create a new data set. You would replace it. It might have the same name but it would not be your "existing data set".

 

What @PeterClemmensen meant was show use an example of your current data. It doesn't have to be your data but look like it. Then show what the resulting data set with the added column would look like. Best would be to include the example data as data step code. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

The logic for the created field would also help.

 

Is your data currently sorted in the order that it needs to be processed?

 

There may be two different approaches, depending on your data. The RETAIN function to create a variable like "PriorID" to keep the value from the last record, or the LAG function which can reference the values of a variable in previous observations. Caution: this frequently does not work like you may expect if you use IF and LAG in the same statement.

A simple example of Retain:

Data want;
   set have;
	retain Priorid;
	<code>
	/* warning: while processing the 
	first record PriorId is not assigned
	so be aware*/
	/* at the end of the data step
	set the PriorId to the value of the 
	current id*/
	Priorid=id;
run;

When you create the PriorID variable you should make sure that it has the same properties such as length and type as Id.

 

sasuser_sk
Quartz | Level 8
Yes my data is sorted. here is an example. This is referring to the PRIOR concept I am wanting to use in SAS EG.
Client id 4578 occurs twice. For first instance it is keeping it and then on the second occurrence i want to add ignore. it is identifying duplicate for me as well. Two purposes.

client id NEW FIELD I WANT TO CREATE
2376 KEEP
2489 KEEP
4578 KEEP
4578 IGNORE
5478 KEEP
5784 KEEP
5784 IGNORE
5784 IGNORE
7069 KEEP
7069 IGNORE
567834 KEEP
Reeza
Super User
data want;
set have;
by id;
if first.id then status='Keep';
else status='Ignore';
run;

@sasuser_sk wrote:
Yes my data is sorted. here is an example. This is referring to the PRIOR concept I am wanting to use in SAS EG.
Client id 4578 occurs twice. For first instance it is keeping it and then on the second occurrence i want to add ignore. it is identifying duplicate for me as well. Two purposes.

client id NEW FIELD I WANT TO CREATE
2376 KEEP
2489 KEEP
4578 KEEP
4578 IGNORE
5478 KEEP
5784 KEEP
5784 IGNORE
5784 IGNORE
7069 KEEP
7069 IGNORE
567834 KEEP

 

sasuser_sk
Quartz | Level 8
Thanks! Can this be don en query builder in SAS EG?
Reeza
Super User

Try a RANK task and add the ranks by groups. Then you'll have 1 for the first and a number greater than one for each other record. 

 


@sasuser_sk wrote:
Thanks! Can this be don en query builder in SAS EG?

 

sasuser_sk
Quartz | Level 8

Thank you everyone for your suggestions.

Reeza
Super User

Look at the SORT task instead and the options. It will identify duplicates or unique records. Very specifically, GROUP BY processing is a unique feature in SAS and you should leverage it. SQL/Query Builder isn't a great way to accomplish this task. A data step is another approach.

 

proc sort data=have;
by id;
run;

data duplicates unique;
set have;
by;
if not (first.id and last.id) then output duplicates;
else output unique;
run;

@sasuser_sk wrote:

Hi: I am a new SAS EG user and struggling here.

I want to compare ID from row 2 to ID in row 1 for identifying duplicates purposes and c=some other comparison needs. For example:

case when(id = prior(id)) then "Ignore" else "Keep it" end

OR 

case when(id = next(id)) then "Ignore" else "Keep it" end

 

Here prior and next are comparing the id field in row 1 and in row 2.

 

Can someone please make suggestions?


Thank you!

 


 

novinosrin
Tourmaline | Level 20

Hi @sasuser_sk  I believe the query builder generates a PROC SQL query and I am afraid the parititon -by feature is unavailable in proc sql and you may have to resort to MONOTONIC(). If query builder allows monotonic in subquery and that we could flag the MIN and others as Keep and Ignore, it's easy and possible. However, I can't confirm if that idea can be plugged in query builder. 

 

 

novinosrin
Tourmaline | Level 20

What I meant was

proc sql;
create table want as
select *,ifc(min(rn)=rn,'KEEP','IGNORE') as new_variable
from
(select *,monotonic() as rn from have)
group by clientid
order by rn;
quit;
sasuser_sk
Quartz | Level 8
THANK YOU! I'll give it a try.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2928 views
  • 4 likes
  • 5 in conversation