In sas 9.4 i have a dataset looking something like this:
id | startdate | enddate | indate | outdate | owner |
498106 | 22.01.2019 | 09.11.2019 | 29.01.2018 | . | 803 |
498106 | 22.01.2019 | 09.11.2019 | 01.01.2019 | 26.05.2023 | 584 |
499547 | 23.01.2019 | 09.11.2019 | 13.02.2018 | . | 803 |
499547 | 23.01.2019 | 09.11.2019 | 01.01.2019 | . | 584 |
where . is missing outdate.
I want to manipulate this in two ways:
1. In cases where the outdate is missing, and it is not the last id record, I want to use the indate for the record after as the outdate.
2. In cases where the the outdate is missing, and it is the last record for that id, i want to put 09.11.2023.
The code should produce this dataset:
animal_id | startdate | enddate | indate | outdate | owner |
498106 | 22.01.2019 | 09.11.2019 | 29.01.2018 | 01.01.2019 | 803 |
498106 | 22.01.2019 | 09.11.2019 | 01.01.2019 | 26.05.2023 | 584 |
499547 | 23.01.2019 | 09.11.2019 | 13.02.2018 | 01.01.2019 | 803 |
499547 | 23.01.2019 | 09.11.2019 | 01.01.2019 | 09.11.2023 | 584 |
Note that id's can have more that two rows each in the real dataset.
So far I have this code,
proc sort data=have; by id indate outdate; run;
data want;
set have;
by id indate outdate;
if id = lag(id) and lag(outdate) = . then outdate2 = indate;
if last.id and outdate = . then outdate = 23323;
run;
which produce the acquired results for problem 2, and store the correct indate for use in problem 1, but I do not know how to insert it in the row above.
Any help is appreciated.
We'll need to use the LAG function to get values from other rows. Because LAG only accesses values from previous rows of data, I'll sort the data in reverse chronological order for processing. Here's the code:
data have;
infile datalines dsd dlm='|';
input id startdate:ddmmyy10. enddate:ddmmyy10.
indate:ddmmyy10. outdate:ddmmyy10. owner;
format startdate -- outdate ddmmyyp10.;
datalines;
498106|22.01.2019|09.11.2019|29.01.2018|.|803
498106|22.01.2019|09.11.2019|01.01.2019|26.05.2023|584
499547|23.01.2019|09.11.2019|13.02.2018|.|803
499547|23.01.2019|09.11.2019|01.01.2019|.|584
499547|23.01.2020|09.11.2020|13.02.2010|.|803
499547|23.01.2021|09.11.2021|01.01.2021|.|584
;
title "Have";
proc print data=have;
run;
/* LAG() only reads previous records, so first sort in reverse chronological order */
proc sort data=have;
by id descending indate descending outdate;
run;
data want;
set have;
by id descending indate descending outdate;
/* The LAG() function must be executed for every row, even if we don't need every value */
_prevInDate=lag1(indate);
/* If this is the first record for the ID and outdata is missing, set 11SEP2023*/
if first.id and missing(outdate) then outdate='11SEP2023'd;
/* Otherwise, if outdata is missing, use the previous indate */
else if missing(outdate) then outdate=_prevInDate;
drop _:;
run;
/* Sort the data back into chronological order */
proc sort data=want;
by id indate outdate;
run;
title "Want";
proc print data=want;
run;
And here is the result:
Obs | id | startdate | enddate | indate | outdate | owner |
---|---|---|---|---|---|---|
1 | 498106 | 22.01.2019 | 09.11.2019 | 29.01.2018 | . | 803 |
2 | 498106 | 22.01.2019 | 09.11.2019 | 01.01.2019 | 26.05.2023 | 584 |
3 | 499547 | 23.01.2019 | 09.11.2019 | 13.02.2018 | . | 803 |
4 | 499547 | 23.01.2019 | 09.11.2019 | 01.01.2019 | . | 584 |
5 | 499547 | 23.01.2020 | 09.11.2020 | 13.02.2010 | . | 803 |
6 | 499547 | 23.01.2021 | 09.11.2021 | 01.01.2021 | . | 584 |
Obs | id | startdate | enddate | indate | outdate | owner |
---|---|---|---|---|---|---|
1 | 498106 | 22.01.2019 | 09.11.2019 | 29.01.2018 | 01.01.2019 | 803 |
2 | 498106 | 22.01.2019 | 09.11.2019 | 01.01.2019 | 26.05.2023 | 584 |
3 | 499547 | 23.01.2020 | 09.11.2020 | 13.02.2010 | 13.02.2018 | 803 |
4 | 499547 | 23.01.2019 | 09.11.2019 | 13.02.2018 | 01.01.2019 | 803 |
5 | 499547 | 23.01.2019 | 09.11.2019 | 01.01.2019 | 01.01.2021 | 584 |
6 | 499547 | 23.01.2021 | 09.11.2021 | 01.01.2021 | 11.09.2023 | 584 |
We'll need to use the LAG function to get values from other rows. Because LAG only accesses values from previous rows of data, I'll sort the data in reverse chronological order for processing. Here's the code:
data have;
infile datalines dsd dlm='|';
input id startdate:ddmmyy10. enddate:ddmmyy10.
indate:ddmmyy10. outdate:ddmmyy10. owner;
format startdate -- outdate ddmmyyp10.;
datalines;
498106|22.01.2019|09.11.2019|29.01.2018|.|803
498106|22.01.2019|09.11.2019|01.01.2019|26.05.2023|584
499547|23.01.2019|09.11.2019|13.02.2018|.|803
499547|23.01.2019|09.11.2019|01.01.2019|.|584
499547|23.01.2020|09.11.2020|13.02.2010|.|803
499547|23.01.2021|09.11.2021|01.01.2021|.|584
;
title "Have";
proc print data=have;
run;
/* LAG() only reads previous records, so first sort in reverse chronological order */
proc sort data=have;
by id descending indate descending outdate;
run;
data want;
set have;
by id descending indate descending outdate;
/* The LAG() function must be executed for every row, even if we don't need every value */
_prevInDate=lag1(indate);
/* If this is the first record for the ID and outdata is missing, set 11SEP2023*/
if first.id and missing(outdate) then outdate='11SEP2023'd;
/* Otherwise, if outdata is missing, use the previous indate */
else if missing(outdate) then outdate=_prevInDate;
drop _:;
run;
/* Sort the data back into chronological order */
proc sort data=want;
by id indate outdate;
run;
title "Want";
proc print data=want;
run;
And here is the result:
Obs | id | startdate | enddate | indate | outdate | owner |
---|---|---|---|---|---|---|
1 | 498106 | 22.01.2019 | 09.11.2019 | 29.01.2018 | . | 803 |
2 | 498106 | 22.01.2019 | 09.11.2019 | 01.01.2019 | 26.05.2023 | 584 |
3 | 499547 | 23.01.2019 | 09.11.2019 | 13.02.2018 | . | 803 |
4 | 499547 | 23.01.2019 | 09.11.2019 | 01.01.2019 | . | 584 |
5 | 499547 | 23.01.2020 | 09.11.2020 | 13.02.2010 | . | 803 |
6 | 499547 | 23.01.2021 | 09.11.2021 | 01.01.2021 | . | 584 |
Obs | id | startdate | enddate | indate | outdate | owner |
---|---|---|---|---|---|---|
1 | 498106 | 22.01.2019 | 09.11.2019 | 29.01.2018 | 01.01.2019 | 803 |
2 | 498106 | 22.01.2019 | 09.11.2019 | 01.01.2019 | 26.05.2023 | 584 |
3 | 499547 | 23.01.2020 | 09.11.2020 | 13.02.2010 | 13.02.2018 | 803 |
4 | 499547 | 23.01.2019 | 09.11.2019 | 13.02.2018 | 01.01.2019 | 803 |
5 | 499547 | 23.01.2019 | 09.11.2019 | 01.01.2019 | 01.01.2021 | 584 |
6 | 499547 | 23.01.2021 | 09.11.2021 | 01.01.2021 | 11.09.2023 | 584 |
I'm a lazy programmer - I hate to type more than necessary. So whenever I create a variable I intend to drop, I start the name with an underscore. Then, I can use simple name prefix list syntax to drop them all:
/* This statement drops all variables that have a name beginning with '_' */
drop _:;
It's an old habit that I still find useful.
May the SAS be with you!
Mark
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.