BookmarkSubscribeRSS Feed
Sam20001
Fluorite | Level 6

Hello,

I have spent a few days reading articles about how to reshape data but cannot seem to figure out what to do. My dataset consists of emergency and hospital data with >18,000 observations. Variables: studyID, visit_type (ED or Hospital visit), visitdatetime (date of visit in datetime format). I also want to create indicator variables between every 2 visits to determine whether the ED visit precedes the hospital visit (0 =No, 1 = Yes).  Some subjects had more than 100 visits over the course of 3 years (2018-2021). The data are sorted by studyID and visitdatetime because I need to reshape it so that visitdatetimes are in ascending order for each studyID. This requirement has me confused, so that's why I've turned to this community for help.  Thank you.

Sam20001_1-1659648262645.png

 

Sam20001_2-1659649044513.png

 

 

 

 

 

 

10 REPLIES 10
Reeza
Super User
Is this your final goal, or is this something you think you need to get to a final result?
What's the final summary you're looking for? This data structure is possible but cumbersome, I suspect there's an easier to get what you want.
Sam20001
Fluorite | Level 6

Thank you for your reply @Reeza . My goal is to exclude an ED visit if it's within 48 hours, 7 days, or 14 days before a hospitalization for each subject. I think I might be able to create the 3 additional variables (within_48hr, within_7, within_14) after I get the data in the right format. I have reinserted photos of the data because I can't see them in my original post. Sorry I'm very new to this forum. I appreciate you help.  

 

Data Have SAS Forum .pngData Wanted SAS Forum .png

Reeza
Super User

@Sam20001 wrote:

Thank you for your reply @Reeza . My goal is to exclude an ED visit if it's within 48 hours, 7 days, or 14 days before a hospitalization for each subject. 

 

 


That's much easier to do via SQL than this approach. Please repost your data as text and we can help you get that coded. No need to flip it to that structure. 

Sam20001
Fluorite | Level 6

Thank you @Reeza . I have included text data below.  

 

Current data (sorted by studyID and visitdatetime)

studyID          visit_type          visitdatetime

1                     ED                    05-22-2018 10:10:00

1                     Hospital            06-27-2019 11:00:19

2                     Hospital            01-16-2020 12:17:34

2                     ED                    06-24-2020 08:14:36

3                     ED                    09-24-2020 09:30:41

3                     Hospital            09-24-2020 11:47:12

3                     ED                    01-10-2021 12:00:01

4                     ED                    07-16-2019  04:45:36

4                     ED                   12-22-2020 08:17:40

5                     Hospital           06-19-2021 09:12:22

 

ballardw
Super User

" want to create indicator variables between every 2 visits" should likely be clarified.

Do you mean every sequential pair?

Other wise if you have 6 visits then you are looking a 5*4*3*2*1 combinations to get "every pair". Which by the time you play with 100 visits is an extremely large number of combinations (look up (n-1)! , ! meaning factorial  Hint: 50! is on the order of 3.0414093E64 )

 

You might consider exactly what you are saying with that indicator variable. If you have a Hospital record that has 0 for the indicator then you say the Hospital visit does NOT follow an ED visit when in effect it does. What about 2 ED visits followed by Hospital? Would only the second indicator for the ED be set to 1? What if both of the ED visits are on the same date?

 

Can you provide an example of the type(s) of reports you expect to write with the new data set?

 

You can sequentially process data and look at previous values using either the LAG functions or Retain values from record to record.

Sam20001
Fluorite | Level 6

Hi @ballardw . Yes. I mean every sequential pair. The indicator variable suppose to identify an ED visit that comes before a hospitalization (1 if the ED comes before the hospital, 0 if the ED visit doesn't come before the hospital visit. I think I entered them backwards). 

"What about 2 ED visits followed by Hospital? Would only the second indicator for the ED be set to 1?" You are correct. If a subject has 2 ED visits followed by hospital then only the second indicator would be set to 1. 

 

"What if both of the ED visits are on the same date?" The dataset is ordered by studyID and visitdatetime. If the visitdatetime of the ED is before the visitdatetime of the hospital then the indicator would be set to 1. 

 

Thanks for recommending the lag function. I actually tried using lag in the long file so that visitdatetimes of ED and hospital visits would be on the same row for each subject. I thought it worked until my supervisor reviewed the data and stated that the lag function in some cases filled in a visitdatetime from the previous record when it was for a different subject. I have not tried the retain function. I hope this makes sense. 

Reeza
Super User
PS. If you do need help with programming, its very helpful to post your data as text not images. To work with images we have to either type it in or pass it through a text extractor which can always be problematic and more work.
Ksharp
Super User
data have;
input studyID          visit_type  $         visitdatetime $40.;
cards;
1                     ED                    05-22-2018 10:10:00
1                     Hospital            06-27-2019 11:00:19
2                     Hospital            01-16-2020 12:17:34
2                     ED                    06-24-2020 08:14:36
3                     ED                    09-24-2020 09:30:41
3                     Hospital            09-24-2020 11:47:12
3                     ED                    01-10-2021 12:00:01
3                     Hospital            09-24-2022 11:47:12
4                     ED                    07-16-2019  04:45:36
4                     ED                   12-22-2020 08:17:40
5                     Hospital           06-19-2021 09:12:22
;

proc sql noprint;
select max(n) into :n separated by ' '
 from (select count(*) as n from have group by studyID);

select ceil(max(n)/2) into :n2 separated by ' '
 from (select count(*) as n from have group by studyID);
quit;

proc summary data=have ;
by studyID;
output out=temp(drop=_:) idgroup(out[&n.] (visit_type visitdatetime)=);
run;

data want;
 set temp;
 array x{*} visit_type:;
 array y{*} ED_before_Hosp1-ED_before_Hosp&n2. ;
 n=0;
 do i=1 to dim(x) by 2;
   n+1;
   if i+1<=dim(x) then do;
     if x{i}='ED' and x{i+1}='Hospital' then y{n}=1;
       else  y{n}=0;
   end;
 end;
 drop n i;
run;
Sam20001
Fluorite | Level 6

Thank you very much for your help @Ksharp . However when I run the PROC Summary portion of the code, I get this error the following error message: "the parameter value 206 is not within the required range of 1 and 100." The full log is below.  Is there another option? Thank you. 

proc sql noprint;
select max(n) into :n separated by ' '
from (select count(*) as n from test group by studyID);

select ceil(max(n)/2) into :n2 separated by ' '
from (select count(*) as n from test group by studyID);
quit;

proc summary data=test;
by studyID;
output out=temp(drop=_:) idgroup(out[&n.] (visit_type visitdatetime)=);
run;

/*Log*/
137 proc sql noprint;
138 select max(n) into :n separated by ' '
139 from (select count(*) as n from test group by studyID);
140 select ceil(max(n)/2) into :n2 separated by ' '
141 from (select count(*) as n from test group by studyID);
142 quit;
/*NOTE: PROCEDURE SQL used (Total process time):*/
/* real time 28.14 seconds*/
/* cpu time 0.10 seconds*/

143 proc summary data=test ;
144 by studyID;
145 output out=temp(drop=_:) idgroup(out[&n.] (visit_type visitdatetime)=);
SYMBOLGEN: Macro variable N resolves to 206
ERROR: The parameter value 206 is not within the required range of 1 and 100.
146 run;

 

Ksharp
Super User

Sure. Of course.

 

data have;
input studyID          visit_type  $         visitdatetime $40.;
cards;
1                     ED                    05-22-2018 10:10:00
1                     Hospital            06-27-2019 11:00:19
2                     Hospital            01-16-2020 12:17:34
2                     ED                    06-24-2020 08:14:36
3                     ED                    09-24-2020 09:30:41
3                     Hospital            09-24-2020 11:47:12
3                     ED                    01-10-2021 12:00:01
3                     Hospital            09-24-2022 11:47:12
4                     ED                    07-16-2019  04:45:36
4                     ED                   12-22-2020 08:17:40
5                     Hospital           06-19-2021 09:12:22
;




data have;
 set have;
 by studyID;
 if first.studyID then n=0;
 n+1;
run;

proc sql noprint;
select max(n) into :n separated by ' '
 from (select count(*) as n from have group by studyID);

select ceil(max(n)/2) into :n2 separated by ' '
 from (select count(*) as n from have group by studyID);

select distinct catt('have(where=(n=',n,') 
 rename=(visit_type=visit_type_',n,' visitdatetime=visitdatetime_',n,'))')
into :merge separated by ' ' 
 from have ;
quit;

data temp;
 merge &merge.;
 by studyID;
 drop n;
run;


data want;
 set temp;
 array x{*} visit_type:;
 array y{*} ED_before_Hosp1-ED_before_Hosp&n2. ;
 n=0;
 do i=1 to dim(x) by 2;
   n+1;
   if i+1<=dim(x) then do;
     if x{i}='ED' and x{i+1}='Hospital' then y{n}=1;
       else  y{n}=0;
   end;
 end;
 drop n i;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 967 views
  • 3 likes
  • 4 in conversation