Hello,
Please how can I keep the first_date and last date on the same row. Here is my code but didn't work. Thanks
data have;
input ID $6. event_dt First_dt Last_dt ;
format event_dt First_dt Last_dt date9.;
informat event_dt First_dt Last_dt date9.;
datalines;
011396 14SEP2023 . 14SEP2023
011396 02AUG2023 . .
011396 18AUG2021 18AUG2021 .
011396 23JUN2023 . .
034627 01DEC2009 01DEC2009 .
034627 24JUN2023 . 24JUN2023
034627 11DEC2019 . .
011427 11SEP2010 . .
011427 09AUG2010 09AUG2010 .
011427 11SEP2012 . .
011427 11JUN2013 . 11JUN2013
;
run;
proc sort data=have; by id event_dt;run;
data want;
set have;
by id;
format event_dt First_dt Last_dt date9.;
if First_dt ne . or Last_dt ne . then last_date=Last_dt;
run;
Desired output
ID | event_dt | First_dt | Last_date |
011396 | 18AUG2021 | 18AUG2021 | 14SEP2023 |
034627 | 01DEC2009 | 01DEC2009 | 24JUN2023 |
011427 | 09AUG2010 | 09AUG2010 | 11JUN2013 |
This is a job for PROC SUMMARY
proc summary data=have nway;
class id;
var event_dt first_dt last_dt;
output out=want min(event_dt first_dt)= max(last_dt)=;
run;
@PaigeMiller Thanks alot this works. I have alot of variables in the var statement. I want to keep all so I used the _all_ but didn't work. Any suggestions so I don't have to type out all the variables?
Show us the code you used, and show us what didn't work (if there are errors, show us the ENTIRE log from PROC SUMMARY; if you get unexpected output show us the unexpected output and then explain what the correct output should be). Please provide a data set that better represents the actual problem.
@PaigeMiller I mean how can I add other variables in the var statement without typing all the variables. When I used _all_ this is the error I got.
proc summary data=have nway;
class id;
var _all_;
output out=want min(event_dt first_dt)= max(last_dt)=;
run;
ERROR: Variable LAST_NAME in list does not match type prescribed for this list.
ERROR: Variable FIRST_NAME in list does not match type prescribed for this list.
ERROR: Variable DISPOSITION in list does not match type prescribed for this list.
ERROR: Variable dx1 in list does not match type prescribed for this list.
ERROR: Variable dx2 in list does not match type prescribed for this list.
ERROR: Variable dx3 in list does not match type prescribed for this list.
ERROR: Variable dx4 in list does not match type prescribed for this list.
ERROR: Variable dx5 in list does not match type prescribed for this list.
ERROR: Variable dx6 in list does not match type prescribed for this list.
ERROR: Variable dx7 in list does not match type prescribed for this list.
ERROR: Variable dx8 in list does not match type prescribed for this list.
ERROR: Variable dx9 in list does not match type prescribed for this list.
ERROR: Variable source in list does not match type prescribed for this list.
You have switched from numeric variables to character variables. In that case, PROC SUMMARY will not get the job done. Please provide a realistic data set that illustrates the problem, and also show us the desired output.
Variables on the VAR statement in Proc Summary/Means (and quite a few other procedures) must be numeric.
If you want all of the numeric variables on the Var statement use the _NUMERIC_ list key word:
proc summary data=have nway; class id; var _numeric_; output out=want min(event_dt first_dt)= max(last_dt)=; run;
Do also note that any numeric variable without specific statistics listed will not actually appear in the output data set when one or more other variables has statistics specified. So I'm not sure you are going to save much typing as the OUTPUT statement will be quite a bit longer. So some specifics of what you are actually expecting to see
I cannot make much sense of this original request. And the update to also use character variables totally eliminates any possibility of understanding what you want.
Is the goal to collapse multiple observations into one observations?
If so then it sounds like you want to convert each varaible into two new variables. So if you start with 2 original variables, say DATE and DX then you want to collapse to one observations with four variables that replace the original two variables. And which ones are the "first" and which are the "last"? Do you mean the multiple observations are already sorted and you want to keep only the first observation and the last observation for each BY group? If that is case BY group processing sounds like the way to go. Perhaps something like this?
data first last ;
set have;
by id date ;
if first.id then output first;
else if last.id then output last;
run;
data want;
merge first(rename=(dx=first_dx date=first_date))
last(rename=(dx=last_dx date=last_date))
;
by id;
run;
So if we have data like this:
data have;
input id date :date. dx :$10. ;
format date date9.;
cards;
1 01jan2020 ABC
1 01feb2020 DEF
1 01mar2020 XYZ
2 01apr2020 DX1
3 01jun2020 DX2
3 02jun2020 dx3
;
You will get this result:
You could do something similar with PROC SUMMARY.
proc summary data=have;
by id;
var date ;
output out=want(drop=_type_ _freq_)
idgroup (min(date) out(date dx)=first_date first_dx)
idgroup (max(date) out(date dx)=last_date last_dx)
;
run;
But for the groups that have only one observation its values will appear as both the first and the last.
Here's a variation that gives you all the data you might ever want ... but lets you postpone the typing of all the variable names until you know the numbers you want. There is no way to avoid naming the variables you want however. Apologies for stealing bits of code here and there from earlier posts.
proc summary data=have nway;
class id;
var _numeric_;
output out=minimums min=;
output out=maxiimums max=;
run;
This gives you two new data sets (I'll describe just one). MINIMUMS contains one observation per ID, with the smallest value of each numeric variable. You don't have to name any of them, because this program re-uses the same variable name. In MINIMUMS, there is one observation per ID, and EVENT_DT holds the minimum value of EVENT_DT from the original data. So the same variable name gets re-used without any typing or inventing names on your part.
You can always decide later which of the output pieces you want to use in your report. The price you pay for this simplicity is on the back end when it is more complex to select the pieces that you want. Here are the hoops you might have to jump through when using the data:
data want;
merge have
minimums (keep=id event_dt rename=(event_dt=firstdt))
maximums (keep=id event_dt rename=(event_dt=lastdt));
by id;
run;
You start with simplicity, as well as great flexibility. But you can't totally get around your responsibility to name the variables you want to use.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.