I have a dataset that has account change information, and I want to build a summary table that allows me to see what those changes were over time.
Current Table looks like this:
ID Date email_address
josh 1/1/2015 josh1@gmail.com
josh 1/2/2015 josh2@yahoo.com
josh 1/3/2015 josh3@yahoo.com
mary 1/1/2015 mary123@aol.com
mary 1/15/2015 mars@blah.com
I want the output to look like this:
ID Most_recent_date email_address1 email_address2 email_address3
josh 1/3/2015 josh3@yahoo.com josh2@yahoo.com josh1@gmail.com
mary 1/15/2015 mars@blah.com mary123@aol.com .
Has anyone had experience working with this?
This works for your example data
data have;
informat id $6. date mmddyy. email_address $25.;
format date mmddyy10.;
input ID Date email_address;
datalines;
josh 1/1/2015 josh1@gmail.com
josh 1/2/2015 josh2@yahoo.com
josh 1/3/2015 josh3@yahoo.com
mary 1/1/2015 mary123@aol.com
mary 1/15/2015 mars@blah.com
run;
proc sort data=have; by id date;run;
proc transpose data=have out=temp (drop=_name_ )
prefix=email_address;
by id;
Var email_address;
run;
proc sql;
create table want as
select a.*, b.Most_recent_date
from temp as a left join
(select id, max(date) as Most_recent_date format=mmddyy10. from have
group by id) as b
on a.id = b.id;
quit;
Minor trickyness involved because of wanting the max date and order of naming the emails.
Long to wide via Proc transpose or an array.
I swear this gets asked multiple times a day here 🙂
Search here, google, or someone else will probably answer it.
If you don't know the max amount of email addresses per person, proc transpose is more dynamic.
Knowing where to look is half the battle sometimes (:
Appreciate the input, and I'll take a look at Proc Transpose.
Jason
This works for your example data
data have;
informat id $6. date mmddyy. email_address $25.;
format date mmddyy10.;
input ID Date email_address;
datalines;
josh 1/1/2015 josh1@gmail.com
josh 1/2/2015 josh2@yahoo.com
josh 1/3/2015 josh3@yahoo.com
mary 1/1/2015 mary123@aol.com
mary 1/15/2015 mars@blah.com
run;
proc sort data=have; by id date;run;
proc transpose data=have out=temp (drop=_name_ )
prefix=email_address;
by id;
Var email_address;
run;
proc sql;
create table want as
select a.*, b.Most_recent_date
from temp as a left join
(select id, max(date) as Most_recent_date format=mmddyy10. from have
group by id) as b
on a.id = b.id;
quit;
Minor trickyness involved because of wanting the max date and order of naming the emails.
This is literally EXACTLY the correct solution.
I'd gone down a dark path and tried to use:
data want;
set have;
by id;
if have.id then seq_id=0;
seq_id+1;
run;
... and then build out a separate table for each one of the variables, but this Proc Transpose example you've provided is fantastic.
Thank you so much.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.