BookmarkSubscribeRSS Feed
CathyVI
Lapis Lazuli | Level 10

Hello,

Please how can I keep the first_date and last date on the same row. Here is my code but didn't work. Thanks

data have;
input ID $6. event_dt  First_dt Last_dt  ;
format event_dt  First_dt  Last_dt  date9.;
informat event_dt  First_dt  Last_dt  date9.;
datalines;
011396   14SEP2023  .  14SEP2023   
011396   02AUG2023  .  .   
011396   18AUG2021  18AUG2021  .   
011396   23JUN2023  .  .   
034627   01DEC2009  01DEC2009  .   
034627   24JUN2023  .  24JUN2023   
034627   11DEC2019  .  .   
011427   11SEP2010  .  .  
011427   09AUG2010  09AUG2010  . 
011427   11SEP2012  .  .  
011427   11JUN2013  .  11JUN2013  
;
run;

proc sort data=have; by id event_dt;run;

data want;
set have;
by id;
format event_dt  First_dt  Last_dt  date9.;
if First_dt ne . or Last_dt ne . then last_date=Last_dt;
run;

Desired output

ID

event_dt

First_dt

Last_date

011396

18AUG2021

18AUG2021

14SEP2023

034627

01DEC2009

01DEC2009

24JUN2023

011427

09AUG2010

09AUG2010

11JUN2013

8 REPLIES 8
PaigeMiller
Diamond | Level 26

This is a job for PROC SUMMARY

 

proc summary data=have nway;
    class id;
    var event_dt first_dt last_dt;
    output out=want min(event_dt first_dt)= max(last_dt)=;
run;
--
Paige Miller
CathyVI
Lapis Lazuli | Level 10

@PaigeMiller Thanks alot this works. I have alot of variables in the var statement. I want to keep all so I used the _all_ but didn't work. Any suggestions so I don't have to type out all the variables?

PaigeMiller
Diamond | Level 26

Show us the code you used, and show us what didn't work (if there are errors, show us the ENTIRE log from PROC SUMMARY; if you get unexpected output show us the unexpected output and then explain what the correct output should be). Please provide a data set that better represents the actual problem.

--
Paige Miller
CathyVI
Lapis Lazuli | Level 10

@PaigeMiller  I mean how can I add other variables in the var statement without typing all the variables. When I used _all_ this is the error I got.  

proc summary data=have nway;
    class id;
    var _all_;
    output out=want min(event_dt first_dt)= max(last_dt)=;
run;

ERROR: Variable LAST_NAME in list does not match type prescribed for this list.
ERROR: Variable FIRST_NAME in list does not match type prescribed for this list.
ERROR: Variable DISPOSITION in list does not match type prescribed for this list.
ERROR: Variable dx1 in list does not match type prescribed for this list.
ERROR: Variable dx2 in list does not match type prescribed for this list.
ERROR: Variable dx3 in list does not match type prescribed for this list.
ERROR: Variable dx4 in list does not match type prescribed for this list.
ERROR: Variable dx5 in list does not match type prescribed for this list.
ERROR: Variable dx6 in list does not match type prescribed for this list.
ERROR: Variable dx7 in list does not match type prescribed for this list.
ERROR: Variable dx8 in list does not match type prescribed for this list.
ERROR: Variable dx9 in list does not match type prescribed for this list.
ERROR: Variable source in list does not match type prescribed for this list.
PaigeMiller
Diamond | Level 26

You have switched from numeric variables to character variables. In that case, PROC SUMMARY will not get the job done. Please provide a realistic data set that illustrates the problem, and also show us the desired output.

--
Paige Miller
ballardw
Super User

Variables on the VAR statement in Proc Summary/Means (and quite a few other procedures) must be numeric.

If you want all of the numeric variables on the Var statement use the _NUMERIC_  list key word:

 

proc summary data=have nway;
    class id;
    var _numeric_;
    output out=want min(event_dt first_dt)= max(last_dt)=;
run;

Do also note that any numeric variable without specific statistics listed will not actually appear in the output data set when one or more other variables has statistics specified. So I'm not sure you are going to save much typing as the OUTPUT statement will be quite a bit longer. So some specifics of what you are actually expecting to see

Tom
Super User Tom
Super User

I cannot make much sense of this original request. And the update to also use character variables totally eliminates any possibility of understanding what you want.

 

Is the goal to collapse multiple observations into one observations?

If so then it sounds like you want to convert each varaible into two new variables. So if you start with 2 original variables, say DATE  and DX then you want to collapse to one observations with four variables that replace the original two variables.  And which ones are the "first" and which are the "last"?  Do you mean the multiple observations are already sorted and you want to keep only the first observation and the last observation for each BY group?  If that is case BY group processing sounds like the way to go.  Perhaps something like this?

 

data first last ;
  set have;
  by id date ;
  if first.id then output first;
  else if last.id then output last;
run;

data want;
   merge first(rename=(dx=first_dx date=first_date))
         last(rename=(dx=last_dx date=last_date))
   ;
   by id;
run;

So if we have data like this:

data have;
  input id date :date. dx :$10. ;
  format date date9.;
cards;
1 01jan2020 ABC
1 01feb2020 DEF
1 01mar2020 XYZ
2 01apr2020 DX1
3 01jun2020 DX2
3 02jun2020 dx3
;

You will get this result:

Tom_0-1740703529273.png

You could do something similar with PROC SUMMARY.

proc summary data=have;
  by id;
  var date ;
  output out=want(drop=_type_ _freq_) 
    idgroup (min(date) out(date dx)=first_date first_dx)
    idgroup (max(date) out(date dx)=last_date last_dx)
  ;
run;

But for the groups that have only one observation its values will appear as both the first and the last.

Tom_0-1740704545420.png

 

 

Astounding
PROC Star

Here's a variation that gives you all the data you might ever want ... but lets you postpone the typing of all the variable names until you know the numbers you want.  There is no way to avoid naming the variables you want however.  Apologies for stealing bits of code here and there from earlier posts.

proc summary data=have nway;
   class id;
   var _numeric_;
   output out=minimums min=;
   output out=maxiimums max=;
run;

This gives you two new data sets (I'll describe just one).  MINIMUMS contains one observation per ID, with the smallest value of each numeric variable.  You don't have to name any of them, because this program re-uses the same variable name.  In MINIMUMS, there is one observation per ID, and EVENT_DT holds the minimum value of EVENT_DT from the original data.  So the same variable name gets re-used without any typing or inventing names on your part.

 

You can always decide later which of the output pieces you want to use in your report.  The price you pay for this simplicity is on the back end when it is more complex to select the pieces that you want.  Here are the hoops you might have to jump through when using the data:

data want;
    merge have 
          minimums (keep=id event_dt rename=(event_dt=firstdt))
          maximums (keep=id event_dt rename=(event_dt=lastdt));
   by id;
run;

You start with simplicity, as well as great flexibility.  But you can't totally get around your responsibility to name the variables you want to use.

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1390 views
  • 3 likes
  • 5 in conversation