I need a quick help.
Wish to generate a variable as sequence number by ID but with a condition (variable named difference is greater than 10 and by id and sequence of the date)
Thank you so much!
data have;
input ID$ DATE$ DIFFERENCE$;
datalines;
A 4/22/19 .
A 4/24/19 2
A 4/25/19 1
A 5/6/19 11
A 5/17/19 11
A 5/22/19 5
B 2/4/20 .
B 2/12/20 8
B 3/4/20 21
B 3/10/20 6
RUN;
Data Wish something like below (i.e. create a variable named list which is a sequence number by ID but with the condition that the variable difference greater than 10 and by id's date order)
ID | DATE | difference | list |
A | 4/22/19 | . | . |
A | 4/24/19 | 2 | . |
A | 4/25/19 | 1 | . |
A | 5/6/19 | 11 | 1 |
A | 5/17/19 | 11 | 2 |
A | 5/22/19 | 5 | . |
B | 2/4/20 | . | . |
B | 2/12/20 | 8 | . |
B | 3/4/20 | 21 | 1 |
B | 3/10/20 | 6 | . |
data want;
set have;
by ID;
if first.ID then L=1; drop L;
if DIFFERENCE >=10 then
do;
list = L;
L+1;
end;
run;
BTW. in your data step it should be:
input ID$ DATE$ DIFFERENCE;
since the "difference" is a number.
Bart
Hello @yabwon ,
Thank you so much for the solution/advice promptly. I truly appreciate the help!!!
Just use a retained variable to count the number of cases where differences is larger than 10.
To get that strange result where the count is MISSING on some observations you will need a second variable.
data have;
input ID $ DATE :yymmdd. DIFFERENCE ;
format date yymmdd10. ;
datalines;
A 2019-04-22 .
A 2019-04-24 2
A 2019-04-25 1
A 2019-05-06 11
A 2019-05-17 11
A 2019-05-22 5
B 2020-02-04 .
B 2020-02-12 8
B 2020-03-04 21
B 2020-03-10 6
;
data want;
set have ;
by id date ;
if first.id then count=0;
if difference > 10 then do;
count+1;
list=count;
end;
run;
Result
Obs ID DATE DIFFERENCE count list 1 A 2019-04-22 . 0 . 2 A 2019-04-24 2 0 . 3 A 2019-04-25 1 0 . 4 A 2019-05-06 11 1 1 5 A 2019-05-17 11 2 2 6 A 2019-05-22 5 2 . 7 B 2020-02-04 . 0 . 8 B 2020-02-12 8 0 . 9 B 2020-03-04 21 1 1 10 B 2020-03-10 6 1 .
If you don't already have the DIFERENCES variable you can calculate it on the fly using the DIF() function. Just remember to replace the result when you start a new BY group.
data have;
input ID $ DATE :yymmdd. ;;
format date yymmdd10. ;
datalines;
A 2019-04-22
A 2019-04-24
A 2019-04-25
A 2019-05-06
A 2019-05-17
A 2019-05-22
B 2020-02-04
B 2020-02-12
B 2020-03-04
B 2020-03-10
;
data want;
set have ;
by id date ;
difference = dif(date);
if first.id then difference=.;
if first.id then count=0;
if difference > 10 then do;
count+1;
list=count;
end;
run;
PS Displaying dates without the century will cause confusion. Displaying dates in ether MDY or DMY order will confuse half of your audience.
Thank you for the advice as well! I will practice a bit and see how those output go!
Your variable DIFFERENCE appears to be DATE-lag(DATE) [otherwise known as DIF(date)], ... except for the first obs of each ID, when it takes a missing values. So with access to the DIF function, you could:
data have;
input ID $ DATE :yymmdd. DIFFERENCE ;
format date yymmdd10. ;
datalines;
A 2019-04-22 .
A 2019-04-24 2
A 2019-04-25 1
A 2019-05-06 11
A 2019-05-17 11
A 2019-05-22 5
B 2020-02-04 .
B 2020-02-12 8
B 2020-03-04 21
B 2020-03-10 6
;
data want (drop=_:);
set have ;
by id;
retain _x; /*Edited addition */
_x=ifn(first.id,0,sum(_x,dif(date)>10));
if dif(date)>10 and first.id=0 then list=_x;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.