I need a quick help.
Wish to generate a variable as sequence number by ID but with a condition (variable named difference is greater than 10 and by id and sequence of the date)
Thank you so much!
data have;
input ID$ DATE$ DIFFERENCE$;
datalines;
A 4/22/19 .
A 4/24/19 2
A 4/25/19 1
A 5/6/19 11
A 5/17/19 11
A 5/22/19 5
B 2/4/20 .
B 2/12/20 8
B 3/4/20 21
B 3/10/20 6
RUN;
Data Wish something like below (i.e. create a variable named list which is a sequence number by ID but with the condition that the variable difference greater than 10 and by id's date order)
ID | DATE | difference | list |
A | 4/22/19 | . | . |
A | 4/24/19 | 2 | . |
A | 4/25/19 | 1 | . |
A | 5/6/19 | 11 | 1 |
A | 5/17/19 | 11 | 2 |
A | 5/22/19 | 5 | . |
B | 2/4/20 | . | . |
B | 2/12/20 | 8 | . |
B | 3/4/20 | 21 | 1 |
B | 3/10/20 | 6 | . |
data want;
set have;
by ID;
if first.ID then L=1; drop L;
if DIFFERENCE >=10 then
do;
list = L;
L+1;
end;
run;
BTW. in your data step it should be:
input ID$ DATE$ DIFFERENCE;
since the "difference" is a number.
Bart
Hello @yabwon ,
Thank you so much for the solution/advice promptly. I truly appreciate the help!!!
Just use a retained variable to count the number of cases where differences is larger than 10.
To get that strange result where the count is MISSING on some observations you will need a second variable.
data have;
input ID $ DATE :yymmdd. DIFFERENCE ;
format date yymmdd10. ;
datalines;
A 2019-04-22 .
A 2019-04-24 2
A 2019-04-25 1
A 2019-05-06 11
A 2019-05-17 11
A 2019-05-22 5
B 2020-02-04 .
B 2020-02-12 8
B 2020-03-04 21
B 2020-03-10 6
;
data want;
set have ;
by id date ;
if first.id then count=0;
if difference > 10 then do;
count+1;
list=count;
end;
run;
Result
Obs ID DATE DIFFERENCE count list 1 A 2019-04-22 . 0 . 2 A 2019-04-24 2 0 . 3 A 2019-04-25 1 0 . 4 A 2019-05-06 11 1 1 5 A 2019-05-17 11 2 2 6 A 2019-05-22 5 2 . 7 B 2020-02-04 . 0 . 8 B 2020-02-12 8 0 . 9 B 2020-03-04 21 1 1 10 B 2020-03-10 6 1 .
If you don't already have the DIFERENCES variable you can calculate it on the fly using the DIF() function. Just remember to replace the result when you start a new BY group.
data have;
input ID $ DATE :yymmdd. ;;
format date yymmdd10. ;
datalines;
A 2019-04-22
A 2019-04-24
A 2019-04-25
A 2019-05-06
A 2019-05-17
A 2019-05-22
B 2020-02-04
B 2020-02-12
B 2020-03-04
B 2020-03-10
;
data want;
set have ;
by id date ;
difference = dif(date);
if first.id then difference=.;
if first.id then count=0;
if difference > 10 then do;
count+1;
list=count;
end;
run;
PS Displaying dates without the century will cause confusion. Displaying dates in ether MDY or DMY order will confuse half of your audience.
Thank you for the advice as well! I will practice a bit and see how those output go!
Your variable DIFFERENCE appears to be DATE-lag(DATE) [otherwise known as DIF(date)], ... except for the first obs of each ID, when it takes a missing values. So with access to the DIF function, you could:
data have;
input ID $ DATE :yymmdd. DIFFERENCE ;
format date yymmdd10. ;
datalines;
A 2019-04-22 .
A 2019-04-24 2
A 2019-04-25 1
A 2019-05-06 11
A 2019-05-17 11
A 2019-05-22 5
B 2020-02-04 .
B 2020-02-12 8
B 2020-03-04 21
B 2020-03-10 6
;
data want (drop=_:);
set have ;
by id;
retain _x; /*Edited addition */
_x=ifn(first.id,0,sum(_x,dif(date)>10));
if dif(date)>10 and first.id=0 then list=_x;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.