Solved: A specific way to define duplicates by groups

lydiawawa · Posted 02-26-2019 01:19 AM

Hi All,

I have a dataset that contains duplicate records, and I want to study them. The dataset is shaped as shown below, and I'm trying to create the dup variable.

The dup counter is defined over session_id, device_name and time: it restarts at 1 within each session_id, goes up by 1 for each new combination of device_name and time, and repeats the same number for rows where that combination is identical.

session_id  device_name    time   dup
    1          desktop       12    1
    1          desktop       12    1
    1          desktop       12    1
    1          tablet        10    2
    2          tablet        11    1
    2          tablet        11    1
    2          mobile        10    2
    2          desktop       10    3
    3          desktop       10    1
    3          desktop       10    1

I'd appreciate any help.

PeterClemmensen · Posted 02-26-2019 01:35 AM

You can do something like this:

data have;
input session_id device_name $ time;
datalines;
1 desktop 12
1 desktop 12
1 desktop 12
1 tablet 10
2 tablet 11
2 tablet 11
2 mobile 10
2 desktop 10
3 desktop 10
3 desktop 10
;

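data want(drop=_:);
   set have;
   by session_id device_name notsorted time notsorted;

   /* Values from the previous row (LAG runs on every row, so the queue stays in sync) */
   _session_id=lag1(session_id);
   _device_name=lag1(device_name);
   _time=lag1(time);

   /* Any change from the previous row starts a new group; dup+1 also retains dup */
   if session_id ne _session_id | device_name ne _device_name | time ne _time then do;
      dup+1;
   end;

   /* Restart the counter on the first row of each session */
   if first.session_id then dup=1;
run;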
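
To confirm that the result matches the dup column in the question, one option is to key in the expected values and compare them with PROC COMPARE. This is only a sketch: the data set name expected is made up for illustration, and it assumes want was created by the step above.

```sas
/* Expected values copied from the table in the question ("expected" is a hypothetical name) */
data expected;
input session_id device_name $ time dup;
datalines;
1 desktop 12 1
1 desktop 12 1
1 desktop 12 1
1 tablet 10 2
2 tablet 11 1
2 tablet 11 1
2 mobile 10 2
2 desktop 10 3
3 desktop 10 1
3 desktop 10 1
;

/* Row-by-row comparison of variables with matching names */
proc compare base=expected compare=want;
run;
```

If the two data sets agree, the PROC COMPARE report should show no unequal values.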


Oligolas · Posted 02-26-2019 01:50 AM

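Yes, or just:

data want;
   set have;
   by session_id device_name notsorted time notsorted;

   /* Restart the counter at the start of each session */
   if first.session_id then dup=0;
   /* first.time is also set when session_id or device_name changes */
   if first.time then dup+1;
run;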
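
To see why testing first.time alone is enough, it may help to print the FIRST. flags next to the data. The sketch below assumes the have data set from the earlier reply; the names flags, f_session, f_device and f_time are made up for illustration.

```sas
data flags;
   set have;
   by session_id device_name time notsorted;

   /* FIRST. variables are automatic and are not written to the output,
      so copy them into ordinary variables (hypothetical names) */
   f_session = first.session_id;
   f_device  = first.device_name;
   f_time    = first.time;
run;

proc print data=flags noobs;
run;
```

Because time is the last BY variable, first.time is set to 1 whenever time changes or any BY variable to its left changes, so one test covers a change in session_id, device_name or time. Note that NOTSORTED groups only adjacent rows: if the same device_name and time combination came back later in a session after a different one, it would get a new dup number.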

________________________
- Cheers -
