BookmarkSubscribeRSS Feed
ccarlyle26
Calcite | Level 5

Please forgive my ignorance, but I need to measure days between two dates on different observations of the same individual. Perhaps an example of my data would be helpful:

 

ID#     VAR1         VAR2     VAR1dummy     DATE

1           .                 X                0              30April2015

1      9May2015        .               1               31May2015

1          .                   .                0              30June2015

2          .                  X                0               30April2015

2          .                   .                0               31May2015

2    28June2015      .                1               30June2015

3      3April2015       .                1              30April2015

3          .                   .                0               31May2015

3          .                  X                0               30June2015

 

In the example above, I am attempting to measure the days between when Var1 has a date and when Var2 has a value of 'X' for each individual ID# (I created a dummy variable for when VAR1 has a date). For example, ID#1 has 9 days between when VAR1 appears and when VAR2 has an 'X', ID#2 has 59 and ID#3 has 88

Any help would be much appreciated!

 

Thanks

7 REPLIES 7
Reeza
Super User

Please post an example of your expected output. 

ballardw
Super User

Your date format is a bit awkward to turn into SAS date values that would be the first step.

The following reads your example data, creates SAS date values and then does the difference calculation.

data have;
    informat id best4. var1text $25. var2 $1. var1dummy best1. datetext $25.;
    input ID  var1text  VAR2  VAR1dummy  datetext;
    length monname $ 9;
    array mon (12) $ 9 _temporary_ ('January','February','March','April','May','June','July','August','September','October','November','December');
    if not missing(var1text) then do;
       day= input(scan(var1text,1,,'A'),best2.);
       monname= scan(var1text,1,,'D');
       month = whichc(monname,of mon(*));
       year= input(scan(var1text,2,,'A'),best4.);
       Var1=mdy(month,day,year);
    end;
    if not missing(datetext) then do;
       day= input(scan(datetext,1,,'A'),best2.);
       monname= scan(datetext,1,,'D');
       month = whichc(monname,of mon(*));
       year= input(scan(datetext,2,,'A'),best4.);
       date=mdy(month,day,year);
    end;
    format var1 date date9.;
    drop var1text datetext day monname month year;

datalines;
1  .  X  0  30April2015
1  9May2015  .  1  31May2015
1  .  .  0  30June2015
2  .  X  0  30April2015
2  .  .  0  31May2015
2  28June2015  .  1  30June2015
3  3April2015  .  1  30April2015
3  .  .  0  31May2015
3  .  X  0  30June2015
;
run;

proc sql;
   create table want as
   select a.*, case 
                  when not missing(a.var1) then abs(b.date-a.var1)
                  else .
               end as datedifference
   from have as a left join (select id,date from have where var2='X') as b
      on a.id =b.id
   ;
quit;

Assumptions have been made about the possible values of month in your dates for other months that do not appear in your data. Modify the list in the array statement. Note that the order of the elements in the array is critical.

 

ChrisNZ
Tourmaline | Level 20

@ballardw

 


data T;
  D=input('30April2015',anydtdte32.); 
  put D= date9.; 
run;

D=30APR2015
ballardw
Super User

my install, 9.2 i wasn't getting results with the anydtdte but I didn't try with longer than 25.

ChrisNZ
Tourmaline | Level 20

I always use a length 32 for my informats when the data is not well bound and controlled, it avoids problems.
Related: I don't know why sas decided to use a lower default length, even for the best. informat. That's a dumb decision imho.

sh0e
Obsidian | Level 7

Here is a technique that uses PROC SUMMARY to reduce the data.

 

proc summary data = in nway missing;
  class id var2;
var var1 date;
output out = reduced( drop = _type_ _freq_ ) min=;
run;

data deltaDays;
set reduced( where = ( var2 = 'X' ) );
deltaDays = date - var1;
run;
Ksharp
Super User
data have;
    informat id best4. var1text date9. var2 $1. var1dummy best1. datetext date9.;
    input ID  var1text  VAR2  VAR1dummy  datetext;
format var1text  datetext date9.;
datalines;
1  .  X  0  30Apr2015
1  9May2015  .  1  31May2015
1  .  .  0  30Jun2015
2  .  X  0  30Apr2015
2  .  .  0  31May2015
2  28Jun2015  .  1  30Jun2015
3  3Apr2015  .  1  30Apr2015
3  .  .  0  31May2015
3  .  X  0  30Jun2015
;
run;
data want;
 do until(last.id);
  set have;
  by id;
  if not missing(var1text) then x=var1text;
  if not missing(VAR2) then y=datetext;
 end;
 dif=abs(x-y);
keep id dif;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1235 views
  • 0 likes
  • 6 in conversation