A specific way to define duplicates by groups

lydiawawa — Tue, 26 Feb 2019 06:21:16 GMT

Hi All,

I have a SAS dataset with duplicates, and I'm trying to study the duplicated records. The dataset is shaped as follows, and I'm trying to create the dup variable.

The dup count is grouped by session_id, device_name and time. It restarts at 1 for each session_id, and consecutive rows share the same number when their device_name and time are identical within that session_id.

session_id  device_name    time   dup
    1          desktop       12    1
    1          desktop       12    1
    1          desktop       12    1
    1          tablet        10    2
    2          tablet        11    1
    2          tablet        11    1
    2          mobile        10    2
    2          desktop       10    3
    3          desktop       10    1
    3          desktop       10    1

I'd appreciate any help.

Re: A specific way to define duplicates by groups

PeterClemmensen — Tue, 26 Feb 2019 06:35:42 GMT

You can do something like this:

data have;
input session_id device_name $ time;
datalines;
1 desktop 12
1 desktop 12
1 desktop 12
1 tablet 10
2 tablet 11
2 tablet 11
2 mobile 10
2 desktop 10
3 desktop 10
3 desktop 10
;

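data want(drop=_:);
   set have;
   /* NOTSORTED applies to the whole BY statement, so it is only needed once */
   by session_id device_name time notsorted;

   /* Previous row's values (missing on the first row) */
   _session_id=lag1(session_id);
   _device_name=lag1(device_name);
   _time=lag1(time);

   /* Any change from the previous row starts a new group */
   if session_id ne _session_id | device_name ne _device_name | time ne _time then do;
      dup+1;
   end;

   /* Restart the numbering at the first row of each session */
   if first.session_id then dup=1;
run;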

Re: A specific way to define duplicates by groups

Oligolas — Tue, 26 Feb 2019 06:50:41 GMT

Yes, or just:

data want;
   set have;
   by session_id device_name time notsorted;

   /* Restart at each session, then count each new device_name/time group */
   if first.session_id then dup=0;
   if first.time then dup+1;
run;
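
The shorter step relies on the fact that FIRST.time is set to 1 whenever time changes or any BY variable listed before it (session_id, device_name) changes. A rough way to check this is to copy the automatic FIRST. flags into ordinary variables and print them. This is only a sketch: it assumes the have dataset created earlier in the thread, and the names check, f_session, f_device and f_time are made up for illustration.

```sas
/* Sketch only: keep the FIRST. flags so they can be inspected.
   Assumes the HAVE dataset from earlier in the thread; CHECK and the
   f_* variable names are illustrative, not part of the original posts. */
data check;
   set have;
   by session_id device_name time notsorted;

   /* FIRST. variables are dropped automatically, so copy them */
   f_session = first.session_id;
   f_device  = first.device_name;
   f_time    = first.time;

   /* Same counting logic as the short version above */
   if first.session_id then dup = 0;
   if first.time then dup + 1;
run;

proc print data=check noobs;
run;
```

Note that with NOTSORTED the groups are runs of adjacent rows. If the same device_name and time values reappear later in a session after a different row, they start a new group and get a new dup number.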