huhuhu
Obsidian | Level 7

Hello,

 

I am stuck on calculating the difference between two dates under a specific requirement. My data looks like the table below (Diff and Count are the variables I would like to create):

ID   Date       FirstPositive   Diff   Count
A    1/1/2020   .               .      .
A    1/3/2020   1               .      1
A    1/4/2020   .               1      2
A    1/5/2020   .               1      3
B    1/4/2020   1               .      1
B    1/5/2020   .               1      2

I want to calculate the difference between consecutive dates within each ID, but only starting from the row where FirstPositive = 1. I would also like a Count variable that counts the rows for each ID, again starting from the row where FirstPositive = 1.

 

Any advice would be appreciated!

Thank you!

1 ACCEPTED SOLUTION

ballardw
Super User

See if this gets you started.

data have;
   input ID $  Date :mmddyy10.  FirstPositive;
   format date mmddyy10.;
datalines;
A      1/1/2020        .    
A      1/3/2020      1     
A      1/4/2020       .    
A      1/5/2020       .    
B      1/4/2020      1     
B      1/5/2020       .    
;

data want;
   set have;
   by id;
   retain count  posflag  ;
   difdate= dif(date);
   if first.id then call missing(count,posflag);
   if posflag then diff=difdate;
   if firstPositive=1 then posflag=1;
   if posflag then count+1;
   drop posflag difdate;
run;

You may have to sort your data set by ID and Date before running the data step that creates Want.
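If the rows are not already in order, a sort step along these lines could be run first. This is only a sketch: it assumes the data set is named have, as in the example above, and that Date holds numeric SAS date values. Note that it sorts have in place.

```sas
/* Order rows by ID, then chronologically within each ID, so that
   BY-group processing and DIF() see the dates in sequence. */
proc sort data=have;
   by ID Date;
run;
```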

If you have not seen these functions before:

Retain keeps variable values from one iteration of the data step to the next.

DIF is a function that returns the current value of a variable minus the previous value.
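As a small illustration of DIF, here is a sketch with a made-up data set (dif_demo and the variable x are hypothetical names, not part of the question). dif(x) gives the same result as x - lag(x).

```sas
data dif_demo;
   input x;
   d = dif(x);   /* x minus the value x had on the previous call to DIF */
datalines;
10
13
14
;

proc print data=dif_demo;
run;
/* d is missing on the first row, then 3, then 1 */
```

DIF keeps its own record of the previous value and only updates it when the function actually executes, which is why the solution above calls it on every row rather than inside an IF condition.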

When you use a BY statement, SAS creates automatic variables First. and Last. that indicate whether the current observation is the first or last one for that level of a BY variable.
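Because First. and Last. variables are not written to the output data set, one way to look at them is to copy them into ordinary variables. The names flags, is_first and is_last below are made up for this sketch, and have is assumed to be grouped by ID already.

```sas
data flags;
   set have;
   by ID;
   is_first = first.ID;   /* 1 on the first row of each ID, otherwise 0 */
   is_last  = last.ID;    /* 1 on the last row of each ID, otherwise 0 */
run;
```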

Call missing is a call routine that can set a number of variables to missing values in one statement.
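A quick way to see what CALL MISSING does is a throwaway step like the one below (the variables a, b and name are made up for illustration). It accepts numeric and character variables in the same call.

```sas
data _null_;
   a = 1;
   b = 2;
   name = 'abc';
   call missing(a, b, name);   /* numeric becomes ., character becomes blank */
   put a= b= name=;            /* writes the values to the log */
run;
```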

The main part of this problem is the timing of the calculations: when the diff value is set in relation to when the flag and the count are updated.
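To illustrate the timing point, here is a sketch of the same step with the flag set before the diff test instead of after it. The data set name wrong_order is made up; everything else follows the code above.

```sas
data wrong_order;
   set have;
   by id;
   retain posflag;
   difdate = dif(date);
   if first.id then call missing(posflag);
   if firstPositive=1 then posflag=1;   /* flag is set BEFORE the diff test ... */
   if posflag then diff=difdate;        /* ... so the first positive row also gets a diff */
   drop posflag difdate;
run;
```

With the sample data, the 1/3/2020 row for A would get diff=2 and the first row for B would get diff=-1 (its date minus the last date of A), neither of which is wanted. Testing posflag before updating it means the diff only starts on the row after the first positive.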


6 REPLIES
Jagadishkatam
Amethyst | Level 16

Please try the below code

 

data have;
input ID$ Date:mmddyy10. FirstPositive;
cards;
A 1/1/2020 .         
A 1/3/2020 1          
A 1/4/2020 .         
A 1/5/2020 .         
B 1/4/2020 1          
B 1/5/2020 .        
;
 
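data want;
set have;
by id notsorted;
retain FirstPositive2;
/* call LAG on every row; a conditional call would return the date
   from the last row on which LAG actually ran */
prevdate=lag(date);
if first.id then do;FirstPositive2=.;count=.;end;
/* diff only for rows after the first positive of the current ID */
if FirstPositive2 ne . then diff=Date-prevdate;
if FirstPositive ne . then FirstPositive2=FirstPositive;
count+FirstPositive2;
drop prevdate;
run;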
Thanks,
Jag
huhuhu
Obsidian | Level 7
Thank you so much. That's exactly what I'm looking for. I also appreciate your detailed explanation of the functions used here. So helpful!
smantha
Lapis Lazuli | Level 10
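data sample(Drop = diff count);
informat ID $3. Date mmddyy10. FirstPositive Diff Count 3.;
/* list input; the INFORMAT statement above supplies the informats
   (ID $3. in the INPUT statement would read three fixed columns) */
input ID Date FirstPositive Diff Count;
format date mmddyy10.;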
datalines;
A 1/1/2020 . . .
A 1/3/2020 1 . 1 
A 1/4/2020 . 1 2
A 1/5/2020 . 1 3
B 1/4/2020 1 . 1
B 1/5/2020 . 1 2
;
proc sort; by ID Date; run;

data sample(drop=prv_date);
set sample;
by id;
retain count Diff prv_date 0;
if first.id then do;
count=.; Diff=.;
prv_date = .;
end;

if FirstPositive = 1 then do;
count=1;
prv_date = Date;
end;
else if count > 0 then do;
count = count+1;
Diff = Date-prv_date;
prv_date=date;
end;
run;
novinosrin
Tourmaline | Level 20

Hi @huhuhu, your case is a nice scenario for yet another "Dorfmanism", i.e. making use of automatic variables:

data have;
   input ID $  Date :mmddyy10.  FirstPositive;
   format date mmddyy10.;
datalines;
A      1/1/2020        .    
A      1/3/2020      1     
A      1/4/2020       .    
A      1/5/2020       .    
B      1/4/2020      1     
B      1/5/2020       .    
;

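data want;
 /* DOW loop: one data step iteration reads a whole ID group, so
    count (not retained) starts as missing for each ID, and _n_
    (the iteration number, never 0 at the start) can serve as a flag */
 do until(last.id);
  set have;
  by id;
  diff=dif(date);
  if _n_ then diff=.;                  /* no positive seen before this row */
  if FirstPositive then _n_=0;         /* flag: first positive reached */
  if _n_=0  then  count=sum(count,1);
  output;
 end;
run;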
r_behata
Barite | Level 11
data have;
input ID  $ Date : mmddyy10.  FirstPositive;
format date mmddyy10.;
cards;
A 1/1/2020 .
A 1/3/2020 1
A 1/4/2020 .
A 1/5/2020 .
B 1/4/2020 1
B 1/5/2020 .
;
run;



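data want;
	Set have;
	by ID FirstPositive notsorted;

	retain FirstPositive_;

	/* reset the counter and the retained flag at the start of each ID,
	   so a flag from the previous ID cannot carry over */
	if first.id then do; count=.; FirstPositive_=.; end;

	if FirstPositive=1 then FirstPositive_=FirstPositive;

	if FirstPositive_=1 then count+1;
	else count=.;

	/* IFN evaluates all of its arguments, so DIF runs on every row */
	diff=ifn(first.ID=0 and count > FirstPositive_ ,dif(date),.);

	drop FirstPositive_ ;
run;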
