Hello,
I am stuck trying to calculate the difference between two dates under a specific requirement. My data looks like the table below (Diff and Count are the variables I want to create):
ID  Date      FirstPositive  Diff  Count
A   1/1/2020  .              .     .
A   1/3/2020  1              .     1
A   1/4/2020  .              1     2
A   1/5/2020  .              1     3
B   1/4/2020  1              .     1
B   1/5/2020  .              1     2
I want to calculate the difference between consecutive dates within each ID, but only starting from the row where FirstPositive = 1. I would also like a Count variable that counts the rows for each ID, starting from the row where FirstPositive = 1.
Any advice would be appreciated!
Thank you!
See if this gets you started.
data have;
   input ID $ Date :mmddyy10. FirstPositive;
   format date mmddyy10.;
datalines;
A 1/1/2020 .
A 1/3/2020 1
A 1/4/2020 .
A 1/5/2020 .
B 1/4/2020 1
B 1/5/2020 .
;

data want;
   set have;
   by id;
   retain count posflag;
   difdate = dif(date);
   if first.id then call missing(count, posflag);
   if posflag then diff = difdate;
   if firstPositive = 1 then posflag = 1;
   if posflag then count + 1;
   drop posflag difdate;
run;
You may have to sort your data set by ID and Date before running the data step that creates Want.
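A minimal sketch of that sort step. The data set names have_unsorted and have_sorted, and the scrambled row order, are made up for illustration:

```sas
/* Hypothetical unsorted copy of the sample data */
data have_unsorted;
   input ID $ Date :mmddyy10. FirstPositive;
   format Date mmddyy10.;
datalines;
B 1/5/2020 .
A 1/4/2020 .
A 1/1/2020 .
B 1/4/2020 1
A 1/5/2020 .
A 1/3/2020 1
;

/* Group the rows by ID and order them by Date within each ID, */
/* so that BY ID works and DIF compares consecutive dates      */
proc sort data=have_unsorted out=have_sorted;
   by ID Date;
run;
```

Without the sort (or the NOTSORTED option on the BY statement), a data step with BY ID stops with an error when it meets an out-of-order ID.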
If you have not seen these statements and functions before:
RETAIN is a statement that keeps a variable's value from one iteration of the data step to the next.
DIF is a function that returns the current value of a variable minus the value it had the last time the function executed (normally the previous row).
When you use a BY statement, SAS creates automatic variables First. and Last. that indicate whether the current row is the first or last row of a level of the BY variable.
CALL MISSING is a routine that sets one or more variables to missing.
The timing of the calculations is the main part of this problem: diff is assigned before posflag is set, so the FirstPositive row itself gets no difference, while count is incremented after posflag is set, so that row is counted as 1.
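One timing detail may be worth a small illustration. DIF and LAG keep a queue that is updated only when the function actually executes, so a call placed inside IF ... THEN can return a difference against an older row than the previous one. The sketch below contrasts the two cases. The data set names demo and compare and the variables x, flag, d_always and d_cond are made up for this example:

```sas
/* Hypothetical four-row data set */
data demo;
   input x flag;
datalines;
10 0
20 1
30 0
40 1
;

data compare;
   set demo;
   /* Unconditional call: the queue is updated on every row */
   d_always = dif(x);
   /* Conditional call: the queue is updated only on rows where flag=1 */
   if flag then d_cond = dif(x);
run;

proc print data=compare noobs;
run;
```

On the last row d_always is 10 (40 - 30) while d_cond is 20 (40 - 20), because the conditional call never saw the value 30. That is why the code above calls DIF on every row and only decides afterwards whether to keep the result.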
Please try the code below.
data have;
input ID$ Date:mmddyy10. FirstPositive;
cards;
A 1/1/2020 .         
A 1/3/2020 1          
A 1/4/2020 .         
A 1/5/2020 .         
B 1/4/2020 1          
B 1/5/2020 .        
;
 
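data want;
set have;
by id notsorted;
retain FirstPositive2;
/* Call LAG on every row; a conditional call would leave stale values in its queue */
prev_date=lag(date);
if first.id then do;FirstPositive2=.;count=.;end;
if FirstPositive ne . then FirstPositive2=FirstPositive;
count+FirstPositive2;
/* Diff starts on the row after the first positive, so that row itself stays missing */
if count>1 then diff=Date-prev_date;
drop FirstPositive2 prev_date;
run;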
data sample(Drop = diff count);
informat ID $3. Date mmddyy10. FirstPositive Diff Count 3.;
/* List input, so the informats above apply; "ID $3." here would read three fixed columns ("A 1") */
input ID $ Date FirstPositive Diff Count;
format date mmddyy10.;
datalines;
A 1/1/2020 . . .
A 1/3/2020 1 . 1 
A 1/4/2020 . 1 2
A 1/5/2020 . 1 3
B 1/4/2020 1 . 1
B 1/5/2020 . 1 2
;
proc sort; by ID Date; run;
data sample(drop=prv_date);
set sample;
by id;
retain count Diff prv_date 0;
if first.id then do;
count=.; Diff=.;
prv_date = .;
end;
if FirstPositive = 1 then do;
count=1;
prv_date = Date;
end;
else if count > 0 then do;
count = count+1;
Diff = Date-prv_date;
prv_date=date;
end;
run;
Hi @huhuhu, your case is a nice scenario for yet another "Dorfmanism", i.e. a use of automatic variables (here _N_ is reused as a flag inside a DO UNTIL(LAST.ID) loop):
data have;
   input ID $  Date :mmddyy10.  FirstPositive;
   format date mmddyy10.;
datalines;
A      1/1/2020        .    
A      1/3/2020      1     
A      1/4/2020       .    
A      1/5/2020       .    
B      1/4/2020      1     
B      1/5/2020       .    
;
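data want;
 do until(last.id);
  set have;
  by id;
  diff=dif(date);
  /* _n_ starts each ID as the (nonzero) data step iteration number, so diff is blanked until the flag below is set */
  if _n_ then diff=.;
  /* reuse _n_ as a flag: 0 means the first positive has been seen for this ID */
  if FirstPositive then _n_=0;
  if _n_=0  then  count=sum(count,1);
  output;
 end;
run;
data have;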
input ID  $ Date : mmddyy10.  FirstPositive;
format date mmddyy10.;
cards;
A 1/1/2020 .
A 1/3/2020 1
A 1/4/2020 .
A 1/5/2020 .
B 1/4/2020 1
B 1/5/2020 .
;
run;
data want;
	set have;
	by ID notsorted;
	retain FirstPositive_;
	/* reset the counter and the carried-forward flag at the start of each ID */
	if first.ID then call missing(count, FirstPositive_);
	if FirstPositive=1 then FirstPositive_=FirstPositive;
	if FirstPositive_=1 then count+1;
	/* IFN evaluates all of its arguments, so DIF runs on every row and its queue stays current */
	diff=ifn(first.ID=0 and count>1, dif(date), .);
	drop FirstPositive_;
run;
