BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Sandeep77
Lapis Lazuli | Level 10

Hi Experts,

I have a dataset which has repeated reference_number. I do not want to delete anything but just want to add a new column which can show 1 if the reference_number is repeaded or else 0. Can you please suggest how can I do that. I have used the below code. This highlight the first time reference_number as 0 and the repeated reference_number as 1. I want if the reference_number is not repeated then 0 and if repeated then 1.

data repeated_accounts;
set Equifax_files;
if lag(reference_number)=reference_number then repeat_flag=1;
else repeat_flag=0;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you want a flag that indicates if the reference number is UNIQUE (appears only in one observation) and the data is sorted then you need to test both the FIRST. and the LAST. flags.

 

So this data step will create a flag variable UNIQUE that will be 1 (TRUE) when this is the only observation with that value and 0 (FALSE) when it is any of multiple observations with the same value.

data want;
  set have;
   by ref_no;
   unique = first.ref_no and last.ref_no ;
run;

View solution in original post

9 REPLIES 9
Quentin
Super User

It sounds like your data is sorted by reference_number.  If I'm understanding your goal, you can do it like (untested):

data repeated_accounts;
  set Equifax_files;
  by reference_number ;
  flag = NOT (first.reference_number and last.reference_number) ;
run;
The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
yabwon
Onyx | Level 15

Try this:

/* data */
data Equifax_files;
input reference_number $ 1. @@;
if reference_number;
cards;
123456673399822267234
;
run;
proc print;
run;

/*process*/

proc sort data=Equifax_files;
  by reference_number;
run;

data Equifax_files;
  set Equifax_files;
  by reference_number;

  marker = not first.reference_number;
run;
proc print;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Sandeep77
Lapis Lazuli | Level 10
Thank you but this is also showing in the same way I was getting. First ref_number as 0 and when it was repeated it was showing as 1. I want if there are more than 1 same reference number then 1 and if it is unique then 0.
yabwon
Onyx | Level 15

@Tom 's answer seems to be doing the job, i.e.:

marker = not (first.reference_number and last.reference_number);

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



ballardw
Super User

Example before and result desired.

Since your LAG attempt, which doesn't behave as you want because of the nature of LAG, would only have a chance of working if the reference is sorted:

 

data example;
   input ref;
datalines;
1
1
1
2
3
3
4
; 


data want;
  set example;
  by ref;
  flag= not(first.ref);
run;

LAG, and DIF, are queue functions. So when you use IF lag() the "lagged" value is the last time the IF was true, not the previous record.

To use Lag you would do something like:

data repeated_accounts;
set Equifax_files;
Lref = lag(Reference_number);
if lfref=reference_number then repeat_flag=1;
else repeat_flag=0;
drop lref;
run;

 

Sandeep77
Lapis Lazuli | Level 10
Thank you but it still shows as the first reference_number as 0 and the repeated reference_number as 1. I want if the reference_number is repeated then put it as 1 or else 0. So that I can filter out 0 and conclude them as they are not repeated and unique reference_numbers from the data.
Tom
Super User Tom
Super User

If you want a flag that indicates if the reference number is UNIQUE (appears only in one observation) and the data is sorted then you need to test both the FIRST. and the LAST. flags.

 

So this data step will create a flag variable UNIQUE that will be 1 (TRUE) when this is the only observation with that value and 0 (FALSE) when it is any of multiple observations with the same value.

data want;
  set have;
   by ref_no;
   unique = first.ref_no and last.ref_no ;
run;
Sandeep77
Lapis Lazuli | Level 10

Thank you all!

ballardw
Super User

@Sandeep77 wrote:
Thank you but it still shows as the first reference_number as 0 and the repeated reference_number as 1. I want if the reference_number is repeated then put it as 1 or else 0. So that I can filter out 0 and conclude them as they are not repeated and unique reference_numbers from the data.

I submit that your descriptions need to include example DATA and desired result.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1052 views
  • 2 likes
  • 5 in conversation