Solved: Adding a new column to find if the value is repeated again.

Sandeep77 · Posted 04-19-2023 10:07 AM

Hi Experts,

I have a dataset with repeated reference_number values. I do not want to delete anything; I just want to add a new column that shows 1 if the reference_number is repeated and 0 otherwise. Can you please suggest how I can do that? I have used the code below. It flags the first occurrence of a reference_number as 0 and the repeats as 1. What I want is 0 if the reference_number is never repeated and 1 if it is repeated.

data repeated_accounts;
set Equifax_files;
if lag(reference_number)=reference_number then repeat_flag=1;
else repeat_flag=0;
run;

Quentin · Posted 04-19-2023 10:20 AM

It sounds like your data is sorted by reference_number. If I'm understanding your goal, you can do it like (untested):

data repeated_accounts;
  set Equifax_files;
  by reference_number ;
  flag = NOT (first.reference_number and last.reference_number) ;
run;
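
If the data is not already sorted, one alternative (not from this thread, so treat it as a sketch) is to count the rows per reference_number with PROC SQL. The output name repeated_accounts and the column name repeat_flag are reused from the original question; SAS remerges the group count back onto every row, so this assumes the default remerge behaviour is enabled.

```sas
proc sql;
  /* count(*) is computed per reference_number and remerged onto each row; */
  /* the comparison evaluates to 1 when the group has more than one row    */
  create table repeated_accounts as
  select *,
         (count(*) > 1) as repeat_flag
  from Equifax_files
  group by reference_number;
quit;
```

No prior PROC SORT is needed here, but the row order of the result is not guaranteed to match the input.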

yabwon · Posted 04-19-2023 10:22 AM

Try this:

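/* data: read one character at a time from the single data line */
data Equifax_files;
input reference_number $ 1. @@;
if not missing(reference_number);  /* drop the trailing blanks */
cards;
123456673399822267234
;
run;
proc print;
run;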

/*process*/

proc sort data=Equifax_files;
  by reference_number;
run;

data Equifax_files;
  set Equifax_files;
  by reference_number;

  marker = not first.reference_number;
run;
proc print;
run;

Bart

Sandeep77 · Posted 04-19-2023 11:02 AM

Thank you, but this gives the same result I was getting: the first occurrence of a reference number shows 0 and the repeats show 1. I want 1 whenever the same reference number appears more than once, and 0 when it is unique.

yabwon · Posted 04-19-2023 11:23 AM

@Tom's answer seems to be doing the job, i.e.:

marker = not (first.reference_number and last.reference_number);

Bart

ballardw · Posted 04-19-2023 10:48 AM

Please show example data before and the result desired.

Your LAG attempt doesn't behave as you want because of the nature of LAG, and it would only have a chance of working if the data is sorted by the reference number:

data example;
   input ref;
datalines;
1
1
1
2
3
3
4
; 


data want;
  set example;
  by ref;
  flag= not(first.ref);
run;

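LAG and DIF are queue functions. When a LAG call executes only conditionally (for example, in the THEN clause of an IF), the "lagged" value is the one stored the last time that call executed, not the value from the previous record.

To use LAG you would call it on every observation and store the result, something like: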

data repeated_accounts;
set Equifax_files;
Lref = lag(Reference_number);  /* call LAG unconditionally on every observation */
if Lref=reference_number then repeat_flag=1;
else repeat_flag=0;
drop lref;
run;
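
To make the queue behaviour concrete, here is a minimal sketch with made-up values (the dataset lag_demo and the variables x, prev_x and prev_even are hypothetical). It compares a LAG call that runs on every observation with one that runs only when a condition is true.

```sas
data lag_demo;
  input x;
  /* Unconditional call: runs on every observation, */
  /* so it returns x from the previous record       */
  prev_x = lag(x);
  /* Conditional call: its queue is updated only when x is even, */
  /* so it returns x from the last even observation              */
  if mod(x, 2) = 0 then prev_even = lag(x);
datalines;
1
2
3
4
;

proc print data=lag_demo;
run;
```

On the last observation (x=4), prev_x is 3 but prev_even is 2, because the conditional call last executed when x was 2.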

Sandeep77 · Posted 04-19-2023 11:06 AM

Thank you, but it still shows the first occurrence of a reference_number as 0 and the repeats as 1. I want 1 if the reference_number is repeated and 0 otherwise, so that I can filter on 0 and treat those rows as the unique, non-repeated reference_numbers in the data.

Tom · Posted 04-19-2023 11:18 AM

If you want a flag that indicates if the reference number is UNIQUE (appears only in one observation) and the data is sorted then you need to test both the FIRST. and the LAST. flags.

So this data step will create a flag variable UNIQUE that will be 1 (TRUE) when this is the only observation with that value and 0 (FALSE) when it is any of multiple observations with the same value.

data want;
  set have;
   by ref_no;
   unique = first.ref_no and last.ref_no ;
run;
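
As a quick check of this logic, here is a minimal sketch on made-up sample values. The dataset names have and want follow the code above; the variable repeat_flag is an illustrative name. It negates the uniqueness test to get the flag asked for in the question: 1 when the value occurs more than once, 0 when it occurs exactly once.

```sas
/* Hypothetical sample data: ref_no 1 and 3 repeat, 2 and 4 occur once */
data have;
  input ref_no;
datalines;
1
1
1
2
3
3
4
;

/* BY-group processing requires the data to be sorted by ref_no */
proc sort data=have;
  by ref_no;
run;

data want;
  set have;
  by ref_no;
  /* 1 when the BY group holds more than one observation, else 0 */
  repeat_flag = not (first.ref_no and last.ref_no);
run;

proc print data=want;
run;
```

With these values, every row for ref_no 1 and 3 gets repeat_flag=1, and the single rows for ref_no 2 and 4 get repeat_flag=0.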

Sandeep77 · Posted 04-19-2023 11:30 AM

Thank you all!

ballardw · Posted 04-19-2023 11:51 AM

@Sandeep77 wrote:
Thank you, but it still shows the first occurrence of a reference_number as 0 and the repeats as 1. I want 1 if the reference_number is repeated and 0 otherwise, so that I can filter on 0 and treat those rows as the unique, non-repeated reference_numbers in the data.

I submit that your descriptions need to include example DATA and desired result.
