I want to transform data that looks like the table below.
It has the following fields: patid, the date on which a medical team was added or removed, the type of team, and the action.
Input data:
patid | date | Team | action
1111 | jan-25-2016 | medical | added
1111 | jan-27-2016 | surgical | added
1111 | 29-Jan | medical | removed
1111 | 31-Jan | obgyn | added
The output I need should look like the table below: for each date, all of the patient's active teams appear in one row, and dates that are missing from the input are filled in.
What is the most efficient way to do this in Base SAS? Any help will be highly appreciated.
Thanks, guys.
Output data:
patid | date | team1 | team2 | team3
1111 | jan-25-2016 | medical | |
1111 | 26-Jan | medical | |
1111 | 27-Jan | medical | surgical |
1111 | 28-Jan | medical | surgical |
1111 | 29-Jan | surgical | |
1111 | 30-Jan | surgical | |
1111 | 31-Jan | surgical | obgyn |
Your output doesn't look right.
1111 | 29-Jan | surgical | |
1111 | 30-Jan | surgical | obgyn |
31-Jan is after 30-Jan. How can you put obgyn into 30-Jan?
data have;
  input patid date : date11. Team $ action $;
  format date date11.;
  cards;
1111 25-jan-2016 medical added
1111 27-jan-2016 surgical added
1111 29-Jan-2016 medical removed
1111 31-Jan-2016 obgyn added
;
run;
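data temp;
  /* Look-ahead merge: _patid and _date hold the values from the next row */
  merge have have(keep=patid date rename=(patid=_patid date=_date) firstobs=2);
  if _n_ eq 1 then do;
    /* Hash h holds the teams currently assigned; hi iterates over it */
    declare hash h();
    declare hiter hi('h');
    h.definekey('team');
    h.definedata('team');
    h.definedone();
  end;
  if patid ne _patid then h.clear();
  if action='added' then rc=h.add();
  if action='removed' then rc=h.remove();
  if patid=_patid then do;
    /* One row per active team for each day up to the day before the next event */
    do i=date to _date-1;
      date=i;
      do while(hi.next()=0);
        output;
      end;
    end;
  end;
  drop rc i _: action;
run;
proc sort data=temp; by patid date Team; run;
/* Turn the one-row-per-team layout into team1, team2, ... columns */
proc transpose data=temp out=want(drop=_name_) prefix=team;
  by patid date;
  var team;
run;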
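The look-ahead in the DATA step above comes from merging have with itself, offset by one row through firstobs=2. A minimal sketch of that idiom on its own may make it easier to follow. The dataset name lookahead and the variable gap_days are made up for illustration and are not part of the posted solution.

```sas
data have;
  input patid date : date11. team $ action $;
  format date date11.;
  cards;
1111 25-jan-2016 medical added
1111 27-jan-2016 surgical added
1111 29-jan-2016 medical removed
1111 31-jan-2016 obgyn added
;
run;

data lookahead;
  /* The second copy of have starts at row 2, so each row also sees the next row */
  merge have
        have(keep=patid date rename=(patid=_patid date=_date) firstobs=2);
  /* gap_days (hypothetical name) stays missing when there is no next row for the patient */
  if patid = _patid then gap_days = _date - date;
  format _date date11.;
run;

proc print data=lookahead noobs;
run;
```

On the last row the second copy is exhausted, so _patid and _date are missing. That is why the posted code needs separate handling for a patient's final observation.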
Do not post the same question on multiple communities.
Here is a single data step solution, assuming your input dataset is called have and your date is a SAS date:
data want;
  set have; by patid;
  prevDate = lag(date);
  /* Initial values make the array elements retained across rows */
  array teams{*} $12 teams1-teams5 ("");
  if first.patid then do;
    call missing (of teams{*});
    prevDate = date;
  end;
  else do;
    /* Fill the gap: repeat the current team list for each missing day */
    prevDate = intnx("DAY", prevDate, 1);
    do while(prevDate < date);
      output;
      prevDate = intnx("DAY", prevDate, 1);
    end;
  end;
  select (action);
    when ("added") do;
      if team not in teams then do;
        /* Find the first empty slot */
        do pos = 1 by 1 until(missing(teams{pos})); end;
        teams{pos} = team;
      end;
    end;
    when ("removed") do;
      pos = whichc(team, of teams{*});
      if pos > 0 then do;
        /* Shift the later teams left, then clear the last slot */
        do i = pos+1 to dim(teams);
          teams{i-1} = teams{i};
        end;
        teams{dim(teams)} = "";
      end;
    end;
    otherwise;
  end;
  output;
  drop pos i date team action;
  format prevDate yymmdd10.;
  rename prevDate=date;
run;
I am sorry, it got posted three times by mistake because I wasn't sure which category was the right one to post in. I will avoid duplicate submissions in the future. Thanks for the response and for sharing your approach to this data management problem.
Very innovative approach, especially the way it handles the dates.
With the hash code, the row for 31 Jan (the last row) is not showing up in the output dataset; the output stops at 30 Jan. The logic is correct, but the code gives the right results only up to the second-to-last row.
Yeah. But your output didn't show the last one.
data have;
  input patid date : date11. Team $ action $;
  format date date11.;
  cards;
1111 25-jan-2016 medical added
1111 27-jan-2016 surgical added
1111 29-Jan-2016 medical removed
1111 31-Jan-2016 obgyn added
;
run;
data temp;
  /* Look-ahead merge: _patid and _date hold the values from the next row */
  merge have have(keep=patid date rename=(patid=_patid date=_date) firstobs=2);
  if _n_ eq 1 then do;
    declare hash h();
    declare hiter hi('h');
    h.definekey('team');
    h.definedata('team');
    h.definedone();
  end;
  if action='added' then rc=h.add();
  if action='removed' then rc=h.remove();
  if patid=_patid then do;
    do i=date to _date-1;
      date=i;
      do while(hi.next()=0);
        output;
      end;
    end;
  end;
  else do;
    /* Last row of a patient: output the teams for that date, then reset the hash */
    do while(hi.next()=0);
      output;
    end;
    h.clear();
  end;
  drop rc i _: action;
run;
proc sort data=temp; by patid date Team; run;
proc transpose data=temp out=want(drop=_name_) prefix=team;
  by patid date;
  var team;
run;
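For readers new to the hash object, the add, remove and iterate calls used above can be tried on their own. This is only a minimal sketch: the dataset name remaining and the hard-coded team values are made up for illustration.

```sas
data remaining(keep=team);
  length team $8;
  /* team is both the key and the data item, as in the code above */
  declare hash h();
  declare hiter hi('h');
  h.definekey('team');
  h.definedata('team');
  h.definedone();

  /* add() and remove() act on the current value of the key variable team */
  team = 'medical';  rc = h.add();
  team = 'surgical'; rc = h.add();
  team = 'medical';  rc = h.remove();

  /* next() returns 0 while there is another item and copies it into team */
  do while (hi.next() = 0);
    output;
  end;
run;

proc print data=remaining noobs;
run;
```

After the removal, surgical should be the only team left in the hash, so remaining should contain a single row.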
Thanks so much, this solution is very innovative. The DECLARE statement and hash objects are so useful in SAS programs.
Another interesting situation:
What if the data looks like this, where the first row has no team data but I want to fill it based on the information in the next row?
How would the hash logic change? Example below:
HAVE-->
patid date team action
1111 24-jan
1111 25-jan-2016 medical added
1111 27-jan-2016 surgical added
1111 29-Jan-2016 medical removed
1111 31-Jan-2016 surgical removed
1111 04-feb
WANT -->
patid date team1 team2
1111 24-jan medical (Populating this row based on next non missing observation from 25th jan data)
1111 25-jan medical
1111 26-jan medical
1111 27-jan medical surgical
1111 28-jan medical surgical
1111 29-jan surgical
1111 30-jan surgical
1111 31-jan
1111 01-feb
1111 02-feb
1111 03-feb
1111 04-feb
The last rows have no team or action information, so they are filled in based on the 31 Jan row, which leaves no teams. Is that possible?
Thanks
Oh, no. That logic is getting complicated. What if several observations at the start have a missing team? What are you going to do then?
patid date team action
111 23-jan
1111 24-jan
1111 25-jan-2016 medical added
data have;
  input patid date : date11. Team $ action $;
  format date date11.;
  cards;
1111 24-jan-2016 . .
1111 25-jan-2016 medical added
1111 27-jan-2016 surgical added
1111 29-Jan-2016 medical removed
1111 31-Jan-2016 surgical removed
1111 04-feb-2016 . .
;
run;
data temp;
  /* Look-ahead merge: _patid, _date and _team hold the values from the next row */
  merge have have(keep=patid date team rename=(patid=_patid date=_date team=_team) firstobs=2);
  if _n_ eq 1 then do;
    declare hash h();
    declare hiter hi('h');
    h.definekey('team');
    h.definedata('team');
    h.definedone();
  end;
  if action='added' then rc=h.add();
  if action='removed' then rc=h.remove();
  lag_patid=lag(patid);
  if patid=_patid then do;
    do i=date to _date-1;
      date=i;
      do while(hi.next()=0);
        output;
      end;
      /* No active teams: on a patient's first row borrow the team from the
         next row; otherwise output the date with a blank team */
      if h.num_items=0 then do;
        if lag_patid ne patid then do; team=_team; output; end;
        else do; call missing(team); output; end;
      end;
    end;
  end;
  else do;
    /* Last row of a patient: output the remaining teams, or a blank row if none */
    do while(hi.next()=0);
      output;
    end;
    if h.num_items=0 then do; call missing(team); output; end;
    h.clear();
  end;
  drop rc i _: action;
run;
proc sort data=temp; by patid date Team; run;
proc transpose data=temp out=want(drop=_name_) prefix=team;
  by patid date;
  var team;
run;
I am not trying to make this more complicated, just trying to learn the different scenarios in which a data problem like this can be addressed.
Thanks so much for sharing your approach to this problem.
Yes, obgyn should actually be in the 31 Jan row along with surgical. You're right. Thanks for noticing it, and thanks for your useful, innovative response.