BookmarkSubscribeRSS Feed
rsva
Fluorite | Level 6

I have a data similar to one below

IDDate

1

11jul2009
106jul2010
101jul2012
217jul2008
201jul2001
306jul2001
311jul2005
301jul2011
417jul2011
401jul2012
511jul2006
601jul2000
617jul2003
601jul2009
717jul2004

I am trying to create a data that looks like below

IDDt1Dt2
1

11jul2009

01jul2012
217jul200801jul2011
306jul200101jul2011
417jul201101jul2012
511jul200611jul2006
601jul200001jul2009
717jul200417jul2004

I short I am trying to summarize the minimum and maximum dates by id. Note: the number of times an id repeats is not same. ID1 has 3 rows, ID2 has 2 rows, ID5 has 1 row. I tried the lag function but it's not giving the expected results.

Thanks.

9 REPLIES 9
MikeZdeb
Rhodochrosite | Level 12

hi ... maybe SQL ...

data x;

input id date :date9. @@;

format date date9.;

datalines;

1 11jul2009 1 06jul2010 1 01jul2012

2 17jul2008 2 01jul2001

3 06jul2001 3 11jul2005 3 01jul2011

4 17jul2011 4 01jul2012

5 11jul2006

6 01jul2000 6 17jul2003 6 01jul2009

7 17jul2004

;

proc sql;

create table minmax as

select id, min(date) as dt1 format=date9., max(date) as dt2 format=date9.

from x

group by id;

quit;

rsva
Fluorite | Level 6

Thanks. But my data has dates in rows not in multiple columns.

MikeZdeb
Rhodochrosite | Level 12

Hi ... my solution creates a data set with dates in one variable named DATE (the way the raw data looks in the data step has no relationship to the way the data set looks)

(curious ... do you know how  an INPUT statement that ends with "@@"  works)

also, MISSING values would present no problem with the SQL solution I posted

data x;

input id date :date9. @@;

format date date9.;

datalines;

1 11jul2009 1 06jul2010 1 01jul2012

2 17jul2008 2 01jul2001

3 06jul2001 3 11jul2005 3 01jul2011

4 17jul2011 4 01jul2012

5 11jul2006

6 01jul2000 6 17jul2003 6 01jul2009

7 17jul2004

;


proc print data=x;

run;

Obs    id         date

  1     1    11JUL2009

  2     1    06JUL2010

  3     1    01JUL2012

  4     2    17JUL2008

  5     2    01JUL2001

  6     3    06JUL2001

  7     3    11JUL2005

  8     3    01JUL2011

  9     4    17JUL2011

10     4    01JUL2012

11     5    11JUL2006

12     6    01JUL2000

13     6    17JUL2003

14     6    01JUL2009

15     7    17JUL2004

rsva
Fluorite | Level 6

MikeZdeb,

My mistake. I overlooked the @@ in the INPUT statement. It worked.

mkeintz
PROC Star

Since you're using the lag function, I presume you are using a DATA step, so in that vein here is the classic DATA step solution ("classic" in the sense that it's fhe first technique I became acquainted with):

data want (drop=date);

  set have ;

  by id;

  retain mindate;

  if first.id then mindate=date;

  if last.id then do;

    maxdate=date;

    output;

  end;

run;

Note this assumes there are no missing date values and the data is sorted by ID DATE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rsva
Fluorite | Level 6

It worked. Lukcily I do not have missing data. Any suggestons if my data has missing values? Just curious?

Thanks.

joehinson
Calcite | Level 5

Certainly. Works with missing numbers too:

data have;

input id date :date9.;

datalines;

1 11jul2009

1 06jul2010

1 01jul2012

2 17jul2008

2 01jul2001

3 .

3 .

3 .

4 17jul2011

4 .

5 11jul2006

6 01jul2000

6 .

6 .

7 17jul2004

;

run;

RESULTS:

111JUL200901JUL2012
217JUL200801JUL2001
3..
417JUL2011.
511JUL200611JUL2006
601JUL2000.
717JUL200417JUL2004

mkeintz
PROC Star

If you have a missing value for date, and if the data are sorted by ID then DATE, it would mean that the first date(s) is missing (all missing values compare as lower than the most negative valid number).

So you could change the single line

    if first.id then mindate=date;

to two lines:

   if first.id then mindate=date;

   if mindate=. then mindate=date;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
joehinson
Calcite | Level 5

...and a Hash solution:

data _null_;

     if (1=2) then set have;

     format date1 date2 date9.; call missing(date1,date2);

     declare hash dt (ordered:'a');

          dt.defineKey("id");

          dt.defineData("id","date1","date2");

          dt.defineDone();

     do until(done);

          set have end=done;

          by id notsorted;

          if first.id then do;

               date1=date;

               rc=dt.replace();

          end;

          if last.id then do;

               date2=date;

               rc=dt.replace();

          end;

     end;

     dt.output(dataset:"want");

stop;

run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2652 views
  • 0 likes
  • 4 in conversation