BookmarkSubscribeRSS Feed
chimei0403
Obsidian | Level 7

I need a quick help.

Wish to generate a variable as sequence number by ID but with a condition (variable named difference is greater than 10 and by id and sequence of the date)

 Thank you so much!

 

data have;
input ID$ DATE$ DIFFERENCE$;
datalines;
A 4/22/19 .
A 4/24/19 2
A 4/25/19 1
A 5/6/19 11
A 5/17/19 11
A 5/22/19 5
B 2/4/20 .
B 2/12/20 8
B 3/4/20 21
B 3/10/20 6
RUN;

 

Data Wish something like below (i.e. create a variable named list which is a sequence number by ID but with the condition that the variable difference greater than 10 and by id's date order)

 

ID DATE difference list
A 4/22/19 . .
A 4/24/19 2 .
A 4/25/19 1 .
A 5/6/19 11 1
A 5/17/19 11 2
A 5/22/19 5 .
B 2/4/20 . .
B 2/12/20 8 .
B 3/4/20 21 1
B 3/10/20 6 .
6 REPLIES 6
yabwon
Onyx | Level 15
data want;
set have;
by ID;
if first.ID then L=1; drop L;

if DIFFERENCE >=10 then
  do;
    list = L;
    L+1;
  end;
run;
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

BTW. in your data step it should be:

input ID$ DATE$ DIFFERENCE;

since the "difference" is a number. 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



chimei0403
Obsidian | Level 7

Hello @yabwon ,

Thank you so much for the solution/advice  promptly. I truly appreciate the help!!!

Tom
Super User Tom
Super User

Just use a retained variable to count the number of cases where differences is larger than 10.

To get that strange result where the count is MISSING on some observations you will need a second variable.

data have;
  input ID $ DATE :yymmdd. DIFFERENCE ;
  format date yymmdd10. ;
datalines;
A 2019-04-22 .
A 2019-04-24 2
A 2019-04-25 1
A 2019-05-06 11
A 2019-05-17 11
A 2019-05-22 5
B 2020-02-04 .
B 2020-02-12 8
B 2020-03-04 21
B 2020-03-10 6
;

data want;
  set have ;
  by id date ;
  if first.id then count=0;
  if difference > 10 then do;
    count+1;
    list=count;
  end;
run;

Result

Obs    ID          DATE    DIFFERENCE    count    list

  1    A     2019-04-22         .          0        .
  2    A     2019-04-24         2          0        .
  3    A     2019-04-25         1          0        .
  4    A     2019-05-06        11          1        1
  5    A     2019-05-17        11          2        2
  6    A     2019-05-22         5          2        .
  7    B     2020-02-04         .          0        .
  8    B     2020-02-12         8          0        .
  9    B     2020-03-04        21          1        1
 10    B     2020-03-10         6          1        .

If you don't already have the DIFERENCES variable you can calculate it on the fly using the DIF() function.  Just remember to replace the result when you start a new BY group.

data have;
  input ID $ DATE :yymmdd. ;;
  format date yymmdd10. ;
datalines;
A 2019-04-22
A 2019-04-24
A 2019-04-25
A 2019-05-06
A 2019-05-17
A 2019-05-22
B 2020-02-04
B 2020-02-12
B 2020-03-04
B 2020-03-10
;

data want;
  set have ;
  by id date ;
  difference = dif(date);
  if first.id then difference=.;
  if first.id then count=0;
  if difference > 10 then do; 
    count+1;
    list=count;
  end;
run;

PS Displaying dates without the century will cause confusion.  Displaying dates in ether MDY or DMY order will confuse half of your audience.

 

chimei0403
Obsidian | Level 7

@Tom 

Thank you for the advice as well! I will practice a bit and see how those output go!

mkeintz
PROC Star

Your variable DIFFERENCE appears to be DATE-lag(DATE)  [otherwise known as DIF(date)], ... except for the first obs of each ID, when it takes a missing values.  So with access to the DIF function, you could:

 

data have;
  input ID $ DATE :yymmdd. DIFFERENCE ;
  format date yymmdd10. ;
datalines;
A 2019-04-22 .
A 2019-04-24 2
A 2019-04-25 1
A 2019-05-06 11
A 2019-05-17 11
A 2019-05-22 5
B 2020-02-04 .
B 2020-02-12 8
B 2020-03-04 21
B 2020-03-10 6
;
data want (drop=_:);
  set have ;
  by id;
  retain _x;  /*Edited addition */
  _x=ifn(first.id,0,sum(_x,dif(date)>10));
  if dif(date)>10 and first.id=0 then list=_x;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1198 views
  • 4 likes
  • 4 in conversation