BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Sandeep77
Lapis Lazuli | Level 10

Hi Experts,

I have a dataset which has repeated reference_number. I do not want to delete anything but just want to add a new column which can show 1 if the reference_number is repeaded or else 0. Can you please suggest how can I do that. I have used the below code. This highlight the first time reference_number as 0 and the repeated reference_number as 1. I want if the reference_number is not repeated then 0 and if repeated then 1.

data repeated_accounts;
set Equifax_files;
if lag(reference_number)=reference_number then repeat_flag=1;
else repeat_flag=0;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you want a flag that indicates if the reference number is UNIQUE (appears only in one observation) and the data is sorted then you need to test both the FIRST. and the LAST. flags.

 

So this data step will create a flag variable UNIQUE that will be 1 (TRUE) when this is the only observation with that value and 0 (FALSE) when it is any of multiple observations with the same value.

data want;
  set have;
   by ref_no;
   unique = first.ref_no and last.ref_no ;
run;

View solution in original post

9 REPLIES 9
Quentin
PROC Star

It sounds like your data is sorted by reference_number.  If I'm understanding your goal, you can do it like (untested):

data repeated_accounts;
  set Equifax_files;
  by reference_number ;
  flag = NOT (first.reference_number and last.reference_number) ;
run;
Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.
yabwon
Meteorite | Level 14

Try this:

/* data */
data Equifax_files;
input reference_number $ 1. @@;
if reference_number;
cards;
123456673399822267234
;
run;
proc print;
run;

/*process*/

proc sort data=Equifax_files;
  by reference_number;
run;

data Equifax_files;
  set Equifax_files;
  by reference_number;

  marker = not first.reference_number;
run;
proc print;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Sandeep77
Lapis Lazuli | Level 10
Thank you but this is also showing in the same way I was getting. First ref_number as 0 and when it was repeated it was showing as 1. I want if there are more than 1 same reference number then 1 and if it is unique then 0.
yabwon
Meteorite | Level 14

@Tom 's answer seems to be doing the job, i.e.:

marker = not (first.reference_number and last.reference_number);

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



ballardw
Super User

Example before and result desired.

Since your LAG attempt, which doesn't behave as you want because of the nature of LAG, would only have a chance of working if the reference is sorted:

 

data example;
   input ref;
datalines;
1
1
1
2
3
3
4
; 


data want;
  set example;
  by ref;
  flag= not(first.ref);
run;

LAG, and DIF, are queue functions. So when you use IF lag() the "lagged" value is the last time the IF was true, not the previous record.

To use Lag you would do something like:

data repeated_accounts;
set Equifax_files;
Lref = lag(Reference_number);
if lfref=reference_number then repeat_flag=1;
else repeat_flag=0;
drop lref;
run;

 

Sandeep77
Lapis Lazuli | Level 10
Thank you but it still shows as the first reference_number as 0 and the repeated reference_number as 1. I want if the reference_number is repeated then put it as 1 or else 0. So that I can filter out 0 and conclude them as they are not repeated and unique reference_numbers from the data.
Tom
Super User Tom
Super User

If you want a flag that indicates if the reference number is UNIQUE (appears only in one observation) and the data is sorted then you need to test both the FIRST. and the LAST. flags.

 

So this data step will create a flag variable UNIQUE that will be 1 (TRUE) when this is the only observation with that value and 0 (FALSE) when it is any of multiple observations with the same value.

data want;
  set have;
   by ref_no;
   unique = first.ref_no and last.ref_no ;
run;
Sandeep77
Lapis Lazuli | Level 10

Thank you all!

ballardw
Super User

@Sandeep77 wrote:
Thank you but it still shows as the first reference_number as 0 and the repeated reference_number as 1. I want if the reference_number is repeated then put it as 1 or else 0. So that I can filter out 0 and conclude them as they are not repeated and unique reference_numbers from the data.

I submit that your descriptions need to include example DATA and desired result.

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

From SAS Users blog
Want more? Visit our blog for more articles like these.
5 Steps to Your First Analytics Project Using SAS

For SAS newbies, this video is a great way to get started. James Harroun walks through the process using SAS Studio for SAS OnDemand for Academics, but the same steps apply to any analytics project.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 304 views
  • 2 likes
  • 5 in conversation