Hi All,
I have a dataset with duplicate records that I'd like to study. The dataset looks like the following, and I'm trying to create the dup variable.
dup should number the groups defined by device_name and time within each session_id: it restarts at 1 for every new session_id, records with identical device_name and time share the same number, and each new combination gets the next number.
session_id  device_name  time  dup
1           desktop      12    1
1           desktop      12    1
1           desktop      12    1
1           tablet       10    2
2           tablet       11    1
2           tablet       11    1
2           mobile       10    2
2           desktop      10    3
3           desktop      10    1
3           desktop      10    1
I'd appreciate any help.
You can do something like this:
data have;
   input session_id device_name $ time;
   datalines;
1 desktop 12
1 desktop 12
1 desktop 12
1 tablet 10
2 tablet 11
2 tablet 11
2 mobile 10
2 desktop 10
3 desktop 10
3 desktop 10
;
run;
data want(drop=_:);
   set have;
   by session_id device_name notsorted time notsorted;
   /* Previous row's values. LAG is called on every row so its queue stays in step. */
   _session_id=lag1(session_id);
   _device_name=lag1(device_name);
   _time=lag1(time);
   /* A change in any of the three keys starts a new group. */
   if session_id ne _session_id | device_name ne _device_name | time ne _time then do;
      dup+1;
   end;
   /* Restart the numbering for each new session. */
   if first.session_id then dup=1;
run;
Yes, or just:
data want;
   set have;
   by session_id device_name notsorted time notsorted;
   /* Reset at the start of each session, then count each new device_name/time group. */
   if first.session_id then dup=0;
   if first.time then dup+1;
run;
- Cheers -
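To see why first.time alone is enough, it may help to write the FIRST. flags to the log. The step below is only a sketch: the dataset name demo is made up for illustration and holds just the session_id 2 rows from the question.

data demo;
   input session_id device_name $ time;
   datalines;
2 tablet 11
2 tablet 11
2 mobile 10
2 desktop 10
;
run;
data _null_;
   set demo;
   by session_id device_name notsorted time notsorted;
   /* Write each row's key values and its FIRST. flags to the log. */
   put session_id= device_name= time=
       first.session_id= first.device_name= first.time=;
run;

Because time is the last BY variable, first.time should be 1 whenever time or any BY variable before it changes, so it also flags the mobile and desktop rows even though both have time 10.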