Please forgive my ignorance, but I need to measure days between two dates on different observations of the same individual. Perhaps an example of my data would be helpful:
ID# VAR1 VAR2 VAR1dummy DATE
1 . X 0 30April2015
1 9May2015 . 1 31May2015
1 . . 0 30June2015
2 . X 0 30April2015
2 . . 0 31May2015
2 28June2015 . 1 30June2015
3 3April2015 . 1 30April2015
3 . . 0 31May2015
3 . X 0 30June2015
In the example above, I am attempting to measure the days between when Var1 has a date and when Var2 has a value of 'X' for each individual ID# (I created a dummy variable for when VAR1 has a date). For example, ID#1 has 9 days between when VAR1 appears and when VAR2 has an 'X', ID#2 has 59 and ID#3 has 88
Any help would be much appreciated!
Thanks
Please post an example of your expected output.
Your date format is a bit awkward to turn into SAS date values that would be the first step.
The following reads your example data, creates SAS date values and then does the difference calculation.
data have;
informat id best4. var1text $25. var2 $1. var1dummy best1. datetext $25.;
input ID var1text VAR2 VAR1dummy datetext;
length monname $ 9;
array mon (12) $ 9 _temporary_ ('January','February','March','April','May','June','July','August','September','October','November','December');
if not missing(var1text) then do;
day= input(scan(var1text,1,,'A'),best2.);
monname= scan(var1text,1,,'D');
month = whichc(monname,of mon(*));
year= input(scan(var1text,2,,'A'),best4.);
Var1=mdy(month,day,year);
end;
if not missing(datetext) then do;
day= input(scan(datetext,1,,'A'),best2.);
monname= scan(datetext,1,,'D');
month = whichc(monname,of mon(*));
year= input(scan(datetext,2,,'A'),best4.);
date=mdy(month,day,year);
end;
format var1 date date9.;
drop var1text datetext day monname month year;
datalines;
1 . X 0 30April2015
1 9May2015 . 1 31May2015
1 . . 0 30June2015
2 . X 0 30April2015
2 . . 0 31May2015
2 28June2015 . 1 30June2015
3 3April2015 . 1 30April2015
3 . . 0 31May2015
3 . X 0 30June2015
;
run;
proc sql;
create table want as
select a.*, case
when not missing(a.var1) then abs(b.date-a.var1)
else .
end as datedifference
from have as a left join (select id,date from have where var2='X') as b
on a.id =b.id
;
quit;
Assumptions have been made about the possible values of month in your dates for other months that do not appear in your data. Modify the list in the array statement. Note that the order of the elements in the array is critical.
data T;
D=input('30April2015',anydtdte32.);
put D= date9.;
run;
D=30APR2015
my install, 9.2 i wasn't getting results with the anydtdte but I didn't try with longer than 25.
I always use a length 32 for my informats when the data is not well bound and controlled, it avoids problems.
Related: I don't know why sas decided to use a lower default length, even for the best. informat. That's a dumb decision imho.
Here is a technique that uses PROC SUMMARY to reduce the data.
proc summary data = in nway missing; class id var2;
var var1 date;
output out = reduced( drop = _type_ _freq_ ) min=;
run;
data deltaDays;
set reduced( where = ( var2 = 'X' ) );
deltaDays = date - var1;
run;
data have;
informat id best4. var1text date9. var2 $1. var1dummy best1. datetext date9.;
input ID var1text VAR2 VAR1dummy datetext;
format var1text datetext date9.;
datalines;
1 . X 0 30Apr2015
1 9May2015 . 1 31May2015
1 . . 0 30Jun2015
2 . X 0 30Apr2015
2 . . 0 31May2015
2 28Jun2015 . 1 30Jun2015
3 3Apr2015 . 1 30Apr2015
3 . . 0 31May2015
3 . X 0 30Jun2015
;
run;
data want;
do until(last.id);
set have;
by id;
if not missing(var1text) then x=var1text;
if not missing(VAR2) then y=datetext;
end;
dif=abs(x-y);
keep id dif;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.