Hello,
I have spent a few days reading articles about how to reshape data but cannot seem to figure out what to do. My dataset consists of emergency and hospital data with >18,000 observations. Variables: studyID, visit_type (ED or Hospital visit), visitdatetime (date of visit in datetime format). I also want to create indicator variables between every 2 visits to determine whether the ED visit precedes the hospital visit (0 =No, 1 = Yes). Some subjects had more than 100 visits over the course of 3 years (2018-2021). The data are sorted by studyID and visitdatetime because I need to reshape it so that visitdatetimes are in ascending order for each studyID. This requirement has me confused, so that's why I've turned to this community for help. Thank you.
Thank you for your reply @Reeza . My goal is to exclude an ED visit if it's within 48 hours, 7 days, or 14 days before a hospitalization for each subject. I think I might be able to create the 3 additional variables (within_48hr, within_7, within_14) after I get the data in the right format. I have reinserted photos of the data because I can't see them in my original post. Sorry I'm very new to this forum. I appreciate you help.
@Sam20001 wrote:
Thank you for your reply @Reeza . My goal is to exclude an ED visit if it's within 48 hours, 7 days, or 14 days before a hospitalization for each subject.
That's much easier to do via SQL than this approach. Please repost your data as text and we can help you get that coded. No need to flip it to that structure.
Thank you @Reeza . I have included text data below.
Current data (sorted by studyID and visitdatetime)
studyID visit_type visitdatetime
1 ED 05-22-2018 10:10:00
1 Hospital 06-27-2019 11:00:19
2 Hospital 01-16-2020 12:17:34
2 ED 06-24-2020 08:14:36
3 ED 09-24-2020 09:30:41
3 Hospital 09-24-2020 11:47:12
3 ED 01-10-2021 12:00:01
4 ED 07-16-2019 04:45:36
4 ED 12-22-2020 08:17:40
5 Hospital 06-19-2021 09:12:22
" want to create indicator variables between every 2 visits" should likely be clarified.
Do you mean every sequential pair?
Other wise if you have 6 visits then you are looking a 5*4*3*2*1 combinations to get "every pair". Which by the time you play with 100 visits is an extremely large number of combinations (look up (n-1)! , ! meaning factorial Hint: 50! is on the order of 3.0414093E64 )
You might consider exactly what you are saying with that indicator variable. If you have a Hospital record that has 0 for the indicator then you say the Hospital visit does NOT follow an ED visit when in effect it does. What about 2 ED visits followed by Hospital? Would only the second indicator for the ED be set to 1? What if both of the ED visits are on the same date?
Can you provide an example of the type(s) of reports you expect to write with the new data set?
You can sequentially process data and look at previous values using either the LAG functions or Retain values from record to record.
Hi @ballardw . Yes. I mean every sequential pair. The indicator variable suppose to identify an ED visit that comes before a hospitalization (1 if the ED comes before the hospital, 0 if the ED visit doesn't come before the hospital visit. I think I entered them backwards).
"What about 2 ED visits followed by Hospital? Would only the second indicator for the ED be set to 1?" You are correct. If a subject has 2 ED visits followed by hospital then only the second indicator would be set to 1.
"What if both of the ED visits are on the same date?" The dataset is ordered by studyID and visitdatetime. If the visitdatetime of the ED is before the visitdatetime of the hospital then the indicator would be set to 1.
Thanks for recommending the lag function. I actually tried using lag in the long file so that visitdatetimes of ED and hospital visits would be on the same row for each subject. I thought it worked until my supervisor reviewed the data and stated that the lag function in some cases filled in a visitdatetime from the previous record when it was for a different subject. I have not tried the retain function. I hope this makes sense.
data have;
input studyID visit_type $ visitdatetime $40.;
cards;
1 ED 05-22-2018 10:10:00
1 Hospital 06-27-2019 11:00:19
2 Hospital 01-16-2020 12:17:34
2 ED 06-24-2020 08:14:36
3 ED 09-24-2020 09:30:41
3 Hospital 09-24-2020 11:47:12
3 ED 01-10-2021 12:00:01
3 Hospital 09-24-2022 11:47:12
4 ED 07-16-2019 04:45:36
4 ED 12-22-2020 08:17:40
5 Hospital 06-19-2021 09:12:22
;
proc sql noprint;
select max(n) into :n separated by ' '
from (select count(*) as n from have group by studyID);
select ceil(max(n)/2) into :n2 separated by ' '
from (select count(*) as n from have group by studyID);
quit;
proc summary data=have ;
by studyID;
output out=temp(drop=_:) idgroup(out[&n.] (visit_type visitdatetime)=);
run;
data want;
set temp;
array x{*} visit_type:;
array y{*} ED_before_Hosp1-ED_before_Hosp&n2. ;
n=0;
do i=1 to dim(x) by 2;
n+1;
if i+1<=dim(x) then do;
if x{i}='ED' and x{i+1}='Hospital' then y{n}=1;
else y{n}=0;
end;
end;
drop n i;
run;
Thank you very much for your help @Ksharp . However when I run the PROC Summary portion of the code, I get this error the following error message: "the parameter value 206 is not within the required range of 1 and 100." The full log is below. Is there another option? Thank you.
proc sql noprint;
select max(n) into :n separated by ' '
from (select count(*) as n from test group by studyID);
select ceil(max(n)/2) into :n2 separated by ' '
from (select count(*) as n from test group by studyID);
quit;
proc summary data=test;
by studyID;
output out=temp(drop=_:) idgroup(out[&n.] (visit_type visitdatetime)=);
run;
/*Log*/
137 proc sql noprint;
138 select max(n) into :n separated by ' '
139 from (select count(*) as n from test group by studyID);
140 select ceil(max(n)/2) into :n2 separated by ' '
141 from (select count(*) as n from test group by studyID);
142 quit;
/*NOTE: PROCEDURE SQL used (Total process time):*/
/* real time 28.14 seconds*/
/* cpu time 0.10 seconds*/
143 proc summary data=test ;
144 by studyID;
145 output out=temp(drop=_:) idgroup(out[&n.] (visit_type visitdatetime)=);
SYMBOLGEN: Macro variable N resolves to 206
ERROR: The parameter value 206 is not within the required range of 1 and 100.
146 run;
Sure. Of course.
data have;
input studyID visit_type $ visitdatetime $40.;
cards;
1 ED 05-22-2018 10:10:00
1 Hospital 06-27-2019 11:00:19
2 Hospital 01-16-2020 12:17:34
2 ED 06-24-2020 08:14:36
3 ED 09-24-2020 09:30:41
3 Hospital 09-24-2020 11:47:12
3 ED 01-10-2021 12:00:01
3 Hospital 09-24-2022 11:47:12
4 ED 07-16-2019 04:45:36
4 ED 12-22-2020 08:17:40
5 Hospital 06-19-2021 09:12:22
;
data have;
set have;
by studyID;
if first.studyID then n=0;
n+1;
run;
proc sql noprint;
select max(n) into :n separated by ' '
from (select count(*) as n from have group by studyID);
select ceil(max(n)/2) into :n2 separated by ' '
from (select count(*) as n from have group by studyID);
select distinct catt('have(where=(n=',n,')
rename=(visit_type=visit_type_',n,' visitdatetime=visitdatetime_',n,'))')
into :merge separated by ' '
from have ;
quit;
data temp;
merge &merge.;
by studyID;
drop n;
run;
data want;
set temp;
array x{*} visit_type:;
array y{*} ED_before_Hosp1-ED_before_Hosp&n2. ;
n=0;
do i=1 to dim(x) by 2;
n+1;
if i+1<=dim(x) then do;
if x{i}='ED' and x{i+1}='Hospital' then y{n}=1;
else y{n}=0;
end;
end;
drop n i;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.