Hi All,
I have a dataset with duplicate records that I'm trying to study. The data looks like the sample below, and I want to create a dup variable.
dup should number each new combination of device_name and time, in the order the rows appear, restarting at 1 for every new session_id. Rows that are identical in session_id, device_name and time get the same number.
session_id  device_name  time  dup
1           desktop      12    1
1           desktop      12    1
1           desktop      12    1
1           tablet       10    2
2           tablet       11    1
2           tablet       11    1
2           mobile       10    2
2           desktop      10    3
3           desktop      10    1
3           desktop      10    1
I'd appreciate any help.
You can do something like this, comparing each row with the previous one using LAG:
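/* sample data from the question */
data have;
  input session_id device_name $ time;
  datalines;
1 desktop 12
1 desktop 12
1 desktop 12
1 tablet 10
2 tablet 11
2 tablet 11
2 mobile 10
2 desktop 10
3 desktop 10
3 desktop 10
;
run;

data want(drop=_:);
  set have;
  /* NOTSORTED applies to every BY variable: each run of identical consecutive values is a group */
  by session_id device_name time notsorted;
  /* values from the previous row; LAG must execute on every row, so keep these unconditional */
  _session_id=lag1(session_id);
  _device_name=lag1(device_name);
  _time=lag1(time);
  /* any change from the previous row starts a new group */
  if session_id ne _session_id or device_name ne _device_name or time ne _time then dup+1;
  /* restart the counter for each session */
  if first.session_id then dup=1;
run;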
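Yes, or just:
data want;
  set have;
  /* NOTSORTED applies to every BY variable: each run of identical consecutive values is a group */
  by session_id device_name time notsorted;
  /* restart the counter at the start of each session */
  if first.session_id then dup=0;
  /* FIRST.TIME is also 1 whenever session_id or device_name changes */
  if first.time then dup+1;
run;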
- Cheers -
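The shorter version relies on how the BY statement sets FIRST. variables: when a BY variable changes, SAS also starts a new group for every BY variable listed to its right, so first.time catches a change in any of the three columns. A minimal sketch to make this visible, using the sample data from the question; first_session, first_device and first_time are hypothetical names introduced only so the temporary FIRST. flags (which are never written to the output data set) show up in the printout.

```sas
/* sample data from the question */
data have;
  input session_id device_name $ time;
  datalines;
1 desktop 12
1 desktop 12
1 desktop 12
1 tablet 10
2 tablet 11
2 tablet 11
2 mobile 10
2 desktop 10
3 desktop 10
3 desktop 10
;
run;

/* FIRST. variables are temporary, so copy them into regular variables to print them */
data flags;
  set have;
  by session_id device_name time notsorted;
  first_session = first.session_id;
  first_device  = first.device_name;
  first_time    = first.time;
run;

proc print data=flags noobs;
run;
```

first_time should be 1 on rows 1, 4, 5, 7, 8 and 9. Row 9 (3 desktop 10) is the telling case: device_name and time match row 8, but session_id changed, so first.time is still 1 and the counter, already reset by first.session_id, starts again at 1.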