I used array to create a wide dataset from the long one using following statement:
data hospb;
set hospa;
array oldfirst(4)fst_dt1-fst_dt4;
array oldlast(4)lst_dt1-lst_dt4;
array hospit(4) hosp_1-hosp_4;
retain fst_dt1-fst_dt4 lst_dt1-lst_dt4 hosp_1-hosp_4;
*array oldlast(*)lst_dt;
*array new(*) admdt;
by id;
if first.id then do;
do i=1 to 4;
oldfirst(i)=.;
oldlast(i)=.;
hospit(i)=.;
end;
end;
oldfirst(count)=fst_dt;
oldlast(count)=lst_dt;
hospit(count)=hosp;
if last.id then output;
run;
THis is the how the data looks like
Obs
id fst_dt1 fst_dt2 fst_dt3 fst_dt4 lst_dt1 lst_dt2 lst_dt3 lst_dt4 hosp_1 hosp_2 hosp_3 hosp_4
1 1 01JAN2010 02JAN2010 03JAN2010 05FEB2011 02JAN2010 03JAN2010 05JAN2010 06FEB2011 1 1 1 0
2 2 03FEB2011 09FEB2011 . . 07FEB2011 09FEB2011 . . 1 0 . .
3 3 03FEB2011 04MAR2011 . . 03FEB2011 05MAR2011 . . 0 1 . .
What I need now is to look for some patterns: if hospitalization is =0 and then next one is =1 then that date is assigned to the admission date.
Likewise where 1 is followed by 0 it is assigned to discharge date.
I was pretty successful with that step using following program:
array admissiondt(4) fst_dt1-fst_dt4;
array dichargedt(4) lst_dt1-lst_dt4;
array hospitA (4) hosp_1-hosp_4;
do i =1 to 3;
if hospitA=0 and hospitA[i+1]=1 then adm_dt=admissiondt(i);
if hospitA=1 and hospitA[i+1]=0 then disch_dt=dichargedt(i);
end;
but where I am running into problem is assigning admission date to the first record if there are more than one record with hosp=1.
For the record 1 I am getting admission rate assigned to the third observation.
Any help is appreciated.
You have to tell us more. I'm not sure what you mean by "if there are more than one record with hosp=1".
From your example it looks like there is only one record per id and the code appears to be assigning values correctly according to your specifications. When I ran your code, the first record only got disch_dt assigned and it was assigned, per specs, to the 3rd observation.
In short, we must be missing something. Rather than provide how you created the file, it would help to see a datastep with the data you are now working with, along with the full code (rather than just a snippet) showing how you are trying to modify it.
Thank you for your reply.
Here is the datastep for the original dataset:
data
hosp;
INPUT id pos fst_dt mmddyy11. lst_dt mmddyy11.
drg;
datalines
;
001 2 01/01/2010 01/02/2010 001
001 3 01/02/2010 01/03/2010 001
001 2 01/03/2010 01/05/2010 002
001 2 02/05/2011 02/06/2011 .
002 3 02/03/2011 02/07/2011 003
002 4 02/09/2011 02/09/2011 .
003 4 02/03/2011 02/03/2011 .
003 1 03/04/2011 03/05/2011 003
;
run;
Then I created hospitalization flag.
data want;
set hosp;
format fst_dt lst_dt admdt dichdt date9.;
array hospitalization(*)hosp;
array newq(*)dichdt;
array oldfirst(*)fst_dt;
array oldlast(*)lst_dt;
array new(*)admdt;
do i=1 to dim(new);
if pos in (1 2 3) and drg ^=. then hospitalization(i)=1;
else hospitalization(i)=0;
retain hosp;
end;
proc print;
format fst_dt lst_dt admdt date9.;
run;
run;
When the data is transposed using arrays (in my original post), the first person will have four records, where the first three hosp=1.
when I add
" if hospitA=1 and hospitA[i+1]=1 then adm_dt=admissiondt(i); "
I am getting the third date instead of first one.
Hopefully I made myself clear?
Need to see a complete set of code that runs. In the code you initially posted, to do the transposition, it uses at least one variable (count) that doesn't exist.
Sorry about this, I am new to this community.
Here is the entire code
data hosp;
INPUT id pos fst_dt mmddyy11. lst_dt mmddyy11. drg;
datalines ;
001 2 01/01/2010 01/02/2010 001
001 3 01/02/2010 01/03/2010 001
001 2 01/03/2010 01/05/2010 002
001 2 02/05/2011 02/06/2011 .
002 3 02/03/2011 02/07/2011 003
002 4 02/09/2011 02/09/2011 .
003 4 02/03/2011 02/03/2011 .
003 1 03/04/2011 03/05/2011 003
;
run;
data want;
set hosp;
format fst_dt lst_dt admdt dichdt date9.;
array hospitalization(*)hosp;
array newq(*)dichdt;
array oldfirst(*)fst_dt;
array oldlast(*)lst_dt;
array new(*)admdt;
do i=1 to dim(new);
if pos in (1 2 3) and drg ^=. then hospitalization(i)=1;
else hospitalization(i)=0;
retain hosp;
end;
proc print;
format fst_dt lst_dt admdt date9.;
run;
run;
data hospa;
set want;
retain count;
by id;
if first.id then count = 0;
count = count + 1;
proc print;
format fst_dt lst_dt admdt date9.;
run;
data hospb;
set hospa;
array oldfirst(4)fst_dt1-fst_dt4;
array oldlast(4)lst_dt1-lst_dt4;
array hospit(4) hosp_1-hosp_4;
retain fst_dt1-fst_dt4 lst_dt1-lst_dt4 hosp_1-hosp_4;
by id;
if first.id then do;
do i=1 to 4;
oldfirst(i)=.;
oldlast(i)=.;
hospit(i)=.;
end;
end;
oldfirst(count)=fst_dt;
oldlast(count)=lst_dt;
hospit(count)=hosp;
if last.id then output;
run;
data adm_dich;
set hospb;
array admissiondt(4) fst_dt1-fst_dt4;
array dichargedt(4) lst_dt1-lst_dt4;
array hospitA (4) hosp_1-hosp_4;
do i =1 to 3;
if hospitA=1 and hospitA[i+1]=1 then adm_dt=admissiondt(i);
if hospitA=0 and hospitA[i+1]=1 then adm_dt=admissiondt(i);
if hospitA=1 and hospitA[i+1]=0 then disch_dt=dichargedt(i);
end;
proc print;
format fst_dt1-fst_dt4 lst_dt1-lst_dt4 date0 adm_dt disch_dt date9.;
run;
First, welcome to the forum. Most of us are just SAS users like you and it really helps to see the whole problem up front.
You have conflicting assignments. Why not just have 3 adm and disch dates? BTW, I added else conditions to your if statements. Doesn't affect the logic, but does speed up performance a bit. E.g.,
data adm_dich;
set hospb;
array admissiondt(4) fst_dt1-fst_dt4;
array dichargedt(4) lst_dt1-lst_dt4;
array hospitA (4) hosp_1-hosp_4;
array adm_dt(3);
array disch_dt(3);
format adm_dt: disch_dt: date9.;
do i =1 to 3;
if hospitA=1 and hospitA[i+1]=1 then adm_dt(i)=admissiondt(i);
else if hospitA=0 and hospitA[i+1]=1 then adm_dt(i)=admissiondt(i);
else if hospitA=1 and hospitA[i+1]=0 then disch_dt(i)=dichargedt(i);
end;
run;
proc print;
format fst_dt1-fst_dt4 lst_dt1-lst_dt4 date0 adm_dt disch_dt date9.;
run;
I just noticed that you only wanted the first one. Do you only want the first one of either adm or discharge, or the first of both. The following gets the first of both:
data adm_dich;
set hospb;
array admissiondt(4) fst_dt1-fst_dt4;
array dichargedt(4) lst_dt1-lst_dt4;
array hospitA (4) hosp_1-hosp_4;
format adm_dt: disch_dt: date9.;
do i =1 to 3;
if missing(adm_dt) then do;
if hospitA=1 and hospitA[i+1]=1 then adm_dt=admissiondt(i);
else if hospitA=0 and hospitA[i+1]=1 then adm_dt=admissiondt(i);
end;
if missing(disch_dt) then do;
if hospitA=1 and hospitA[i+1]=0 then disch_dt=dichargedt(i);
end;
end;
run;
Great, that actually was my next question. I really appreciate your help.
Great thanks to you for all you help. I do have another quick question. In case where 1 is preceded by 0 I need to assign the second date to the admission date, but I think having it set to missing prevent it from doing so.
Below is the updated code for your review.
data adm_dich;
set hospb;
array admissiondt(4) fst_dt1-fst_dt4;
array dichargedt(4) lst_dt1-lst_dt4;
array hospitA (4) hosp_1-hosp_4;
format adm_dt: disch_dt: date9.;
do i =1 to 3;
if missing(adm_dt) then do;
if hospitA=0 and hospitA[i+1]=1 then adm_dt=admissiondt(i);
else if hospitA=1 and hospitA[i+1]=1 then adm_dt=admissiondt(i);
else if hospitA=1 and hospitA[i+1]=0 then adm_dt=admissiondt(i);
end;
if missing(disch_dt) then do;
if hospitA=1 and hospitA[i+1]=0 then disch_dt=dichargedt(i);
else if hospitA=1 and hospitA[i+1]=. then disch_dt=dichargedt(i);
end;
end;
run;
proc print;
format fst_dt1-fst_dt4 lst_dt1-lst_dt4 date0 adm_dt disch_dt date9.;
run;
I'm not sure what you are asking. Your question concerned a 0 then 1 combination, but your added code is looking for a 1 then missing combo. You also said that you wanted the second date, but never asked for dischargedt(i+1).
Your statements could be combined:
if hospitA=1 and (hospitA[i+1]=0 or missing(hospitA[i+1])) then disch_dt=dichargedt(i);
However, again, I'm not sure what you are asking.
Thank you for you continuing support. Everything seems to be working great, except the case for admission date where 0 is followed by 1.
if hospitA=0 and hospitA[i+1]=1 then adm_dt=admissiondt(i);
In this instance admission date is assigned to the one corresponding to 0 and not to 1.
I replied to that yesterday. If you want the 2nd date to be assigned then you have to state it in your code. i.e.:
if hospitA=0 and hospitA[i+1]=1 then adm_dt=admissiondt(i+1);
If that isn't what you mean, then I don't understand what you are asking.
Thanks, I will try that.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.