BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
HitmonTran
Pyrite | Level 9

Hi All,

 

I have a screenshot of the data where there is a duplicate record.  My goal is to keep LBBLFL='Y' for only the last record (second record in this case) .   I tried using last.visit, but didn't work. thanks

 

Capture.JPG

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Actual example data as a data step is helpful as I am not going to type in data.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

Likely you used 1) the wrong variable visit_num and the wrong test.

In the following example the variable B plays the role of visit_num, a is the llbblfl and c is the lbdtc.

 

data example;
   input a b c;
datalines;
1 2 3
1 2 5
. 3 2
. 3 4
;

data want;
   set example;
   by b c;
   if not(last.b) then call missing(a);
run;

if there are other variables involved, such as an ID variable or more you would sort by those plus visit_num and LBDTC.

 

Why not to use visit? To use by processing and first and last the data needs to be sorted. And believe it or not DAY50 comes before DAY8.

data example;
   input a $;
datalines;
DAY-1
DAY8
DAY50
;

proc sort data=example ;
   by a;
run;

proc print; run;

Why? Character comparisons for order stop with the first different character and 5<8.

 

You can get around this with proc sort but the data step doesn't really understand the difference and getting NOTSORTED in the right place may be tricky depending on the actual data involved.

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

So here you want to delete the first observation only and keep the rest in the data set, correct?

novinosrin
Tourmaline | Level 20

" I tried using last.visit, but didn't work"

 

Why is that?

 

What was the result?

ballardw
Super User

Actual example data as a data step is helpful as I am not going to type in data.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

Likely you used 1) the wrong variable visit_num and the wrong test.

In the following example the variable B plays the role of visit_num, a is the llbblfl and c is the lbdtc.

 

data example;
   input a b c;
datalines;
1 2 3
1 2 5
. 3 2
. 3 4
;

data want;
   set example;
   by b c;
   if not(last.b) then call missing(a);
run;

if there are other variables involved, such as an ID variable or more you would sort by those plus visit_num and LBDTC.

 

Why not to use visit? To use by processing and first and last the data needs to be sorted. And believe it or not DAY50 comes before DAY8.

data example;
   input a $;
datalines;
DAY-1
DAY8
DAY50
;

proc sort data=example ;
   by a;
run;

proc print; run;

Why? Character comparisons for order stop with the first different character and 5<8.

 

You can get around this with proc sort but the data step doesn't really understand the difference and getting NOTSORTED in the right place may be tricky depending on the actual data involved.

Shivam
Calcite | Level 5

Hi,

 

I assume that in your data you can have visit day as different for same visit?(obviously it doesn't make sense, but since you have highlighted the visitnum and visit both the columns, i assume visit day can be different for same visitnum)

in that case you can use the following code

 

data have;
input visitnum visit $;
cards;
1 day1
1 day1
2 day4
1 day1
2 day4
2 day4
3 day7
2 day4
3 day5
4 day6
4 day8
;
run;
proc sort data=have;
by visitnum visit;
run;

data want;
set have;
by visitnum visit;
if last.visit then flag='Y'
run;

 

Shivam
Calcite | Level 5

also in continuity of my previous reply.

if you are looking to remove the duplicates completely from your record then you can use nodupkey in proc sort, as below

proc sort data=have out=want nodupkey;
by visitnum visit;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1573 views
  • 0 likes
  • 5 in conversation