Solved: Re: Collapsing observations with update

JJ_211 · Posted 10-06-2020 05:38 AM

Hi!

I have a dataset with multiple dates of admission per person. I want to split them into admissions and readmissions (defined as another admission within 90 days of discharge date of the previous admission). I used the lag function to define those then I created a count (M) for each admission and readmission, and it looks like this:

Data have:

ID admission date readmission date count (M)

1 1/1/2000 . 1

1 . 1/2/2000 1

1 1/1/2010 . 2

1 . 20/1/2010 2

1 . 1/2/2010 2

I want to collapse these observations so it looks like this:

Data want:

ID admission date readmission date count (M)

1 1/1/2000 1/2/2000 1

1 1/1/2010 20/1/2010 2

I tried doing: data want; update have(obs=0) have; by id M; run;

That works well, except that it takes the last date per M, so it looks like this:

ID admission date readmission date count (M)

1 1/1/2000 1/2/2000 1

1 1/1/2010 1/2/2010 2

Note the date bolded in this above dataset is the latest of the two dates that followed the admission date on 1/1/2010, and I want it to take the earliest (want 20/1/2010 and NOT 1/2/2010). Is there a way to do that?

Many thanks in advance!

Janet

Ksharp · Posted 10-06-2020 06:59 AM

proc sql;
select id,count,min( admission_date ) as admission_date,
min( readmission_date ) as readmission_date
from have
group by id,count;
quit;

View solution in original post

Kurt_Bremser · Posted 10-06-2020 06:37 AM

See this:

data have;
infile datalines missover;
input ID (admission_date readmission_date) (:ddmmyy10.) count;
format admission_date readmission_date yymmdd10.;
datalines;
1      1/1/2000                .                               1
1       .                            1/2/2000                  1
1      1/1/2010                 .                              2
1       .                           20/1/2010                 2 
1       .                           1/2/2010                   2
;

data want;
set have;
by id;
retain _ad _re;
if first.id
then do;
  _ad = .;
  _re = .;
end;
_ad = coalesce(_ad,admission_date);
_re = coalesce(_re,readmission_date);
if _ad and _re
then do;
  admission_date = _ad;
  readmission_date = _re;
  output;
  _ad = .;
  _re = .;
end;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

JJ_211 · Posted 10-06-2020 07:01 AM

Thank you!

Do I need to do the first part with the infile and datalines for the rest to work?

My dataset contains some millions of lines 🙂

I tried the second part of the code (from data want) and it for some reason deletes some admission dates, and still takes the latest of the readmission dates.

Janet

Kurt_Bremser · Posted 10-06-2020 07:41 AM

The DATALINES step is just there to create some example data to work with. By using a data step with DATALINES, it is extremely easy for everybody else to recreate the dataset as is, so that we are all talking about the same thing. All that is needed is copy/paste and submit.

Please supply example data in the same way and include the cases that don't come out as intended; with the data you posted, my code works, and I can only test code against usable data.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 10-06-2020 06:59 AM

proc sql;
select id,count,min( admission_date ) as admission_date,
min( readmission_date ) as readmission_date
from have
group by id,count;
quit;

JJ_211 · Posted 10-07-2020 05:00 AM

Hi Ksharp. Many thanks for the sql code! Works as it should (Note to any future user of the code: I added after proc sql; create table want as...), as without this my SAS ran forever and would end up not responding.

Ksharp, a follow up question: If I need further variables to be kept from the original dataset (have), where do I specify that within this code?

I tried adding some variables in "select" but end up with repeated lines (e.g. I also want to keep the discharge date for each admission date, and some other variables). The keep statement doesn't seem to love sql either.

I appreciate your help!

Janet

Ksharp · Posted 10-07-2020 07:11 AM

Better post some sample data to explain your question.
If the variables you want added were also GROUP variables ,need include them in GROUP BY clause.

proc sql;
select id,count, xx,
min( admission_date ) as admission_date,
min( readmission_date ) as readmission_date
from have
group by id,count , xx ;
quit;

JJ_211 · Posted 10-07-2020 07:38 AM

Here a sample:

HAVE data:

Id count sex admission discharge readmission

1 1 M 1/1/03 15/1/03 .

1 1 M . . 15/3/03

.

After suggested proc sql, I want to also include the discharge date and sex (examples, there are many other variables like death date etc). If I include sex and discharge date in the GROUP BY clause, I end up getting the same as data HAVE.

WANT data:

Id count sex admission discharge readmission

1 1 M 1/1/03 15/1/03 15/3/03

.

THANKS 🙂

Kurt_Bremser · Posted 10-07-2020 07:45 AM

Use a summary function for discharge also:

data have;
input Id $ count sex $ (admission discharge readmission) (:ddmmyy10.);
format admission discharge readmission ddmmyy10.;
datalines;
1      1          M         1/1/03         15/1/03           . 
1      1          M          .                   .                   15/3/03
;

proc sql;
create table want as
  select
    id,
    count,
    sex,
    min(admission) as admission format=ddmmyy10.,
    min(discharge) as discharge format=ddmmyy10.,
    min(readmission) as readmission format=ddmmyy10.
  from have
  group by id, count, sex
;
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

JJ_211 · Posted 10-07-2020 07:56 AM

of course! That does work 🙂 Many, many thanks!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away