BookmarkSubscribeRSS Feed
chimei0403
Obsidian | Level 7

I need a quick help.

Wish to generate a variable as sequence number by ID but with a condition (variable named difference is greater than 10 and by id and sequence of the date)

 Thank you so much!

 

data have;
input ID$ DATE$ DIFFERENCE$;
datalines;
A 4/22/19 .
A 4/24/19 2
A 4/25/19 1
A 5/6/19 11
A 5/17/19 11
A 5/22/19 5
B 2/4/20 .
B 2/12/20 8
B 3/4/20 21
B 3/10/20 6
RUN;

 

Data Wish something like below (i.e. create a variable named list which is a sequence number by ID but with the condition that the variable difference greater than 10 and by id's date order)

 

ID DATE difference list
A 4/22/19 . .
A 4/24/19 2 .
A 4/25/19 1 .
A 5/6/19 11 1
A 5/17/19 11 2
A 5/22/19 5 .
B 2/4/20 . .
B 2/12/20 8 .
B 3/4/20 21 1
B 3/10/20 6 .
6 REPLIES 6
yabwon
Onyx | Level 15
data want;
set have;
by ID;
if first.ID then L=1; drop L;

if DIFFERENCE >=10 then
  do;
    list = L;
    L+1;
  end;
run;
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

BTW. in your data step it should be:

input ID$ DATE$ DIFFERENCE;

since the "difference" is a number. 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



chimei0403
Obsidian | Level 7

Hello @yabwon ,

Thank you so much for the solution/advice  promptly. I truly appreciate the help!!!

Tom
Super User Tom
Super User

Just use a retained variable to count the number of cases where differences is larger than 10.

To get that strange result where the count is MISSING on some observations you will need a second variable.

data have;
  input ID $ DATE :yymmdd. DIFFERENCE ;
  format date yymmdd10. ;
datalines;
A 2019-04-22 .
A 2019-04-24 2
A 2019-04-25 1
A 2019-05-06 11
A 2019-05-17 11
A 2019-05-22 5
B 2020-02-04 .
B 2020-02-12 8
B 2020-03-04 21
B 2020-03-10 6
;

data want;
  set have ;
  by id date ;
  if first.id then count=0;
  if difference > 10 then do;
    count+1;
    list=count;
  end;
run;

Result

Obs    ID          DATE    DIFFERENCE    count    list

  1    A     2019-04-22         .          0        .
  2    A     2019-04-24         2          0        .
  3    A     2019-04-25         1          0        .
  4    A     2019-05-06        11          1        1
  5    A     2019-05-17        11          2        2
  6    A     2019-05-22         5          2        .
  7    B     2020-02-04         .          0        .
  8    B     2020-02-12         8          0        .
  9    B     2020-03-04        21          1        1
 10    B     2020-03-10         6          1        .

If you don't already have the DIFERENCES variable you can calculate it on the fly using the DIF() function.  Just remember to replace the result when you start a new BY group.

data have;
  input ID $ DATE :yymmdd. ;;
  format date yymmdd10. ;
datalines;
A 2019-04-22
A 2019-04-24
A 2019-04-25
A 2019-05-06
A 2019-05-17
A 2019-05-22
B 2020-02-04
B 2020-02-12
B 2020-03-04
B 2020-03-10
;

data want;
  set have ;
  by id date ;
  difference = dif(date);
  if first.id then difference=.;
  if first.id then count=0;
  if difference > 10 then do; 
    count+1;
    list=count;
  end;
run;

PS Displaying dates without the century will cause confusion.  Displaying dates in ether MDY or DMY order will confuse half of your audience.

 

chimei0403
Obsidian | Level 7

@Tom 

Thank you for the advice as well! I will practice a bit and see how those output go!

mkeintz
PROC Star

Your variable DIFFERENCE appears to be DATE-lag(DATE)  [otherwise known as DIF(date)], ... except for the first obs of each ID, when it takes a missing values.  So with access to the DIF function, you could:

 

data have;
  input ID $ DATE :yymmdd. DIFFERENCE ;
  format date yymmdd10. ;
datalines;
A 2019-04-22 .
A 2019-04-24 2
A 2019-04-25 1
A 2019-05-06 11
A 2019-05-17 11
A 2019-05-22 5
B 2020-02-04 .
B 2020-02-12 8
B 2020-03-04 21
B 2020-03-10 6
;
data want (drop=_:);
  set have ;
  by id;
  retain _x;  /*Edited addition */
  _x=ifn(first.id,0,sum(_x,dif(date)>10));
  if dif(date)>10 and first.id=0 then list=_x;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 580 views
  • 4 likes
  • 4 in conversation