Generate a sequence number (a variable) by ID with condition

chimei0403 · Posted 01-11-2024 01:40 PM

I need a quick help.

Wish to generate a variable as sequence number by ID but with a condition (variable named difference is greater than 10 and by id and sequence of the date)

Thank you so much!

data have;
input ID$ DATE$ DIFFERENCE$;
datalines;
A 4/22/19 .
A 4/24/19 2
A 4/25/19 1
A 5/6/19 11
A 5/17/19 11
A 5/22/19 5
B 2/4/20 .
B 2/12/20 8
B 3/4/20 21
B 3/10/20 6
RUN;

Data Wish something like below (i.e. create a variable named list which is a sequence number by ID but with the condition that the variable difference greater than 10 and by id's date order)

ID	DATE	difference	list
A	4/22/19	.	.
A	4/24/19	2	.
A	4/25/19	1	.
A	5/6/19	11	1
A	5/17/19	11	2
A	5/22/19	5	.
B	2/4/20	.	.
B	2/12/20	8	.
B	3/4/20	21	1
B	3/10/20	6	.

yabwon · Posted 01-11-2024 01:46 PM

data want;
set have;
by ID;
if first.ID then L=1; drop L;

if DIFFERENCE >=10 then
  do;
    list = L;
    L+1;
  end;
run;

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

yabwon · Posted 01-11-2024 01:48 PM

BTW. in your data step it should be:

input ID$ DATE$ DIFFERENCE;

since the "difference" is a number.

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

chimei0403 · Posted 01-11-2024 01:52 PM

Hello @yabwon ,

Thank you so much for the solution/advice promptly. I truly appreciate the help!!!

Tom · Posted 01-11-2024 01:53 PM

Just use a retained variable to count the number of cases where differences is larger than 10.

To get that strange result where the count is MISSING on some observations you will need a second variable.

data have;
  input ID $ DATE :yymmdd. DIFFERENCE ;
  format date yymmdd10. ;
datalines;
A 2019-04-22 .
A 2019-04-24 2
A 2019-04-25 1
A 2019-05-06 11
A 2019-05-17 11
A 2019-05-22 5
B 2020-02-04 .
B 2020-02-12 8
B 2020-03-04 21
B 2020-03-10 6
;

data want;
  set have ;
  by id date ;
  if first.id then count=0;
  if difference > 10 then do;
    count+1;
    list=count;
  end;
run;

Result

Obs    ID          DATE    DIFFERENCE    count    list

  1    A     2019-04-22         .          0        .
  2    A     2019-04-24         2          0        .
  3    A     2019-04-25         1          0        .
  4    A     2019-05-06        11          1        1
  5    A     2019-05-17        11          2        2
  6    A     2019-05-22         5          2        .
  7    B     2020-02-04         .          0        .
  8    B     2020-02-12         8          0        .
  9    B     2020-03-04        21          1        1
 10    B     2020-03-10         6          1        .

If you don't already have the DIFERENCES variable you can calculate it on the fly using the DIF() function. Just remember to replace the result when you start a new BY group.

data have;
  input ID $ DATE :yymmdd. ;;
  format date yymmdd10. ;
datalines;
A 2019-04-22
A 2019-04-24
A 2019-04-25
A 2019-05-06
A 2019-05-17
A 2019-05-22
B 2020-02-04
B 2020-02-12
B 2020-03-04
B 2020-03-10
;

data want;
  set have ;
  by id date ;
  difference = dif(date);
  if first.id then difference=.;
  if first.id then count=0;
  if difference > 10 then do; 
    count+1;
    list=count;
  end;
run;

PS Displaying dates without the century will cause confusion. Displaying dates in ether MDY or DMY order will confuse half of your audience.

chimei0403 · Posted 01-11-2024 02:23 PM

@Tom

Thank you for the advice as well! I will practice a bit and see how those output go!

mkeintz · Posted 01-12-2024 05:02 PM

Your variable DIFFERENCE appears to be DATE-lag(DATE) [otherwise known as DIF(date)], ... except for the first obs of each ID, when it takes a missing values. So with access to the DIF function, you could:

data have;
  input ID $ DATE :yymmdd. DIFFERENCE ;
  format date yymmdd10. ;
datalines;
A 2019-04-22 .
A 2019-04-24 2
A 2019-04-25 1
A 2019-05-06 11
A 2019-05-17 11
A 2019-05-22 5
B 2020-02-04 .
B 2020-02-12 8
B 2020-03-04 21
B 2020-03-10 6
;
data want (drop=_:);
  set have ;
  by id;
  retain _x;  /*Edited addition */
  _x=ifn(first.id,0,sum(_x,dif(date)>10));
  if dif(date)>10 and first.id=0 then list=_x;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Re: Generate a sequence number (a variable) by ID with condition

Registration is open

SAS Training: Just a Click Away