Solved: Re: ID Comparison

sasuser_sk · Posted 03-31-2020 04:01 PM

Hi: I am a new SAS EG user and struggling here.

I want to compare ID from row 2 to ID in row 1 for identifying duplicates purposes and c=some other comparison needs. For example:

case when(id = prior(id)) then "Ignore" else "Keep it" end

OR

case when(id = next(id)) then "Ignore" else "Keep it" end

Here prior and next are comparing the id field in row 1 and in row 2.

Can someone please make suggestions?

Thank you!

Reeza · Posted 03-31-2020 04:44 PM

data want;
set have;
by id;
if first.id then status='Keep';
else status='Ignore';
run;

@sasuser_sk wrote:
Yes my data is sorted. here is an example. This is referring to the PRIOR concept I am wanting to use in SAS EG.
Client id 4578 occurs twice. For first instance it is keeping it and then on the second occurrence i want to add ignore. it is identifying duplicate for me as well. Two purposes.

client id NEW FIELD I WANT TO CREATE
2376 KEEP
2489 KEEP
4578 KEEP
4578 IGNORE
5478 KEEP
5784 KEEP
5784 IGNORE
5784 IGNORE
7069 KEEP
7069 IGNORE
567834 KEEP

View solution in original post

PeterClemmensen · Posted 03-31-2020 04:04 PM

You say that this is for identifying duplicates. In that case, there are better ways than to manually compare each id with the id from the previous row. What does your data look like? And what does your desired result look like?

The DATA to DATA Step Macro
Blog: SASnrd

sasuser_sk · Posted 03-31-2020 04:20 PM

Thanks draycut. My desired result would be to create another column that would let me put filter on it for further analysis. I am aware of the method to identify duplicates by:
Tasks....>Data...>Sort data and then generating a different dataset that has duplicates in it.

I want to keep my existing data set and make a new field that compares the id field.

Hope this helps.

ballardw · Posted 03-31-2020 04:33 PM

@sasuser_sk wrote:
Thanks draycut. My desired result would be to create another column that would let me put filter on it for further analysis. I am aware of the method to identify duplicates by:
Tasks....>Data...>Sort data and then generating a different dataset that has duplicates in it.

I want to keep my existing data set and make a new field that compares the id field.

Hope this helps.

Adding a new field to your data will create a new data set. You would replace it. It might have the same name but it would not be your "existing data set".

What @PeterClemmensen meant was show use an example of your current data. It doesn't have to be your data but look like it. Then show what the resulting data set with the added column would look like. Best would be to include the example data as data step code. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

The logic for the created field would also help.

Is your data currently sorted in the order that it needs to be processed?

There may be two different approaches, depending on your data. The RETAIN function to create a variable like "PriorID" to keep the value from the last record, or the LAG function which can reference the values of a variable in previous observations. Caution: this frequently does not work like you may expect if you use IF and LAG in the same statement.

A simple example of Retain:

Data want;
   set have;
	retain Priorid;
	<code>
	/* warning: while processing the 
	first record PriorId is not assigned
	so be aware*/
	/* at the end of the data step
	set the PriorId to the value of the 
	current id*/
	Priorid=id;
run;

When you create the PriorID variable you should make sure that it has the same properties such as length and type as Id.

sasuser_sk · Posted 03-31-2020 04:42 PM

Yes my data is sorted. here is an example. This is referring to the PRIOR concept I am wanting to use in SAS EG.
Client id 4578 occurs twice. For first instance it is keeping it and then on the second occurrence i want to add ignore. it is identifying duplicate for me as well. Two purposes.

client id NEW FIELD I WANT TO CREATE
2376 KEEP
2489 KEEP
4578 KEEP
4578 IGNORE
5478 KEEP
5784 KEEP
5784 IGNORE
5784 IGNORE
7069 KEEP
7069 IGNORE
567834 KEEP

Reeza · Posted 03-31-2020 04:44 PM

data want;
set have;
by id;
if first.id then status='Keep';
else status='Ignore';
run;

@sasuser_sk wrote:
Yes my data is sorted. here is an example. This is referring to the PRIOR concept I am wanting to use in SAS EG.
Client id 4578 occurs twice. For first instance it is keeping it and then on the second occurrence i want to add ignore. it is identifying duplicate for me as well. Two purposes.

client id NEW FIELD I WANT TO CREATE
2376 KEEP
2489 KEEP
4578 KEEP
4578 IGNORE
5478 KEEP
5784 KEEP
5784 IGNORE
5784 IGNORE
7069 KEEP
7069 IGNORE
567834 KEEP

sasuser_sk · Posted 03-31-2020 04:46 PM

Thanks! Can this be don en query builder in SAS EG?

Reeza · Posted 03-31-2020 05:00 PM

Try a RANK task and add the ranks by groups. Then you'll have 1 for the first and a number greater than one for each other record.

@sasuser_sk wrote:
Thanks! Can this be don en query builder in SAS EG?

sasuser_sk · Posted 03-31-2020 06:25 PM

Thank you everyone for your suggestions.

Reeza · Posted 03-31-2020 04:35 PM

Look at the SORT task instead and the options. It will identify duplicates or unique records. Very specifically, GROUP BY processing is a unique feature in SAS and you should leverage it. SQL/Query Builder isn't a great way to accomplish this task. A data step is another approach.

proc sort data=have;
by id;
run;

data duplicates unique;
set have;
by;
if not (first.id and last.id) then output duplicates;
else output unique;
run;

@sasuser_sk wrote:

Hi: I am a new SAS EG user and struggling here.

I want to compare ID from row 2 to ID in row 1 for identifying duplicates purposes and c=some other comparison needs. For example:

case when(id = prior(id)) then "Ignore" else "Keep it" end

OR

case when(id = next(id)) then "Ignore" else "Keep it" end

Here prior and next are comparing the id field in row 1 and in row 2.

Can someone please make suggestions?

Thank you!

novinosrin · Posted 03-31-2020 04:53 PM

Hi @sasuser_sk I believe the query builder generates a PROC SQL query and I am afraid the parititon -by feature is unavailable in proc sql and you may have to resort to MONOTONIC(). If query builder allows monotonic in subquery and that we could flag the MIN and others as Keep and Ignore, it's easy and possible. However, I can't confirm if that idea can be plugged in query builder.

novinosrin · Posted 03-31-2020 05:55 PM

What I meant was

proc sql;
create table want as
select *,ifc(min(rn)=rn,'KEEP','IGNORE') as new_variable
from
(select *,monotonic() as rn from have)
group by clientid
order by rn;
quit;

sasuser_sk · Posted 03-31-2020 06:25 PM

THANK YOU! I'll give it a try.

Registration is open

SAS Training: Just a Click Away