I have a dataset that has variables ID, Date, and Value. For each ID that has more than one Value, I want to output the earliest observation into a new column 'First', and the latest observation into a new column 'Last'. For IDs that only have one Value, I want the observation to be ignored. The final aim is to do a scatter plot of 'First' vs 'Last'.
Any help would be appreciated. Preferably using the UI in SAS EG, but if that isn't possible then a script will do as well.
proc sql;
create table one as
select * from have
order by id, date, value;
quit;
data want (keep=id earliest last) ;
set one;
by id;
if first.id and last.id then delete;
retain earliest;
if first.id then earliest=value;
if last.id then last=value;
if last.id;
run;
First, you need to sort the dataset by ID and Date.
Then you do
data want (keep=ID first_date last_date first_value last_value);
set have;
by ID;
retain first_date last_date first_value last_value counter;
if first.ID
then do;
counter = 0;
first_date = Date;
first_value = Value;
end;
counter + 1;
if last.ID and counter > 1
then do;
last_date = Date;
last_value = Value;
output;
end;
run;
proc sql;
create table first as
select * from have group by id having date=min(date) and count(*)>1;
create table last as
select * from have group by id having date=max(date) and count(*)>1;
quit;
proc sql;
create table one as
select * from have
order by id, date, value;
quit;
data want (keep=id earliest last) ;
set one;
by id;
if first.id and last.id then delete;
retain earliest;
if first.id then earliest=value;
if last.id then last=value;
if last.id;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.