I have a dataset (attached) with column names id, TrainingDate, Score, TestDate. Each id has a single TrainingDate and multiple TestDate with score. Some TestDates are before TrainngDate and some TestDates are after TrainingDate. I would like to keep the closest TestDate before the TraingDate and the closest TestDate after the TrainingDate.
Some fun with Hashes-
data have;
input id TrainingDate :mmddyy10. Score TestDate :mmddyy10.;
format TrainingDate mmddyy10. TestDate mmddyy10.;
cards;
1 11/21/16 3 08/26/10
1 11/21/16 3 06/27/11
1 11/21/16 7 10/20/16
1 11/21/16 3 06/13/17
1 11/21/16 2.5 08/13/17
2 04/23/18 5 11/15/10
2 04/23/18 3.3 10/26/11
2 04/23/18 13.4 10/26/11
2 04/23/18 3.3 11/28/12
2 04/23/18 11.8 11/28/12
2 04/23/18 6 12/27/17
2 04/23/18 3.6 05/09/18
2 04/23/18 30 08/10/18
2 04/23/18 2.6 01/02/19
3 10/08/18 3 06/11/11
3 10/08/18 5 05/14/18
3 10/08/18 4 06/11/18
3 10/08/18 3 08/13/18
4 12/06/16 1.4 07/01/17
4 12/06/16 4.6 11/17/17
4 12/06/16 2 07/19/18
;
data want;
if _n_=1 then do;
dcl hash h(dataset:'have(obs=0)',multidata:'y');
h.definekey('id');
h.definedata(all:'y');
h.definedone();
dcl hiter hi('h');
end;
do until(last.id);
set have;
by id;
if h.num_items =2 then continue;
if TestDate<=TrainingDate then h.replace();
else if _n_ then do;
h.add();
_n_=0;
end;
end;
do while(hi.next()=0);
output;
end;
h.clear();
run;
Hi @Barkat Since you didn't post your expected output for your input sample, please see if the following is what you are after i.e. whether my understanding of your requirement is correct-
data have;
input id TrainingDate :mmddyy10. Score TestDate :mmddyy10.;
format TrainingDate mmddyy10. TestDate mmddyy10.;
cards;
1 11/21/16 3 08/26/10
1 11/21/16 3 06/27/11
1 11/21/16 7 10/20/16
1 11/21/16 3 06/13/17
1 11/21/16 2.5 08/13/17
2 04/23/18 5 11/15/10
2 04/23/18 3.3 10/26/11
2 04/23/18 13.4 10/26/11
2 04/23/18 3.3 11/28/12
2 04/23/18 11.8 11/28/12
2 04/23/18 6 12/27/17
2 04/23/18 3.6 05/09/18
2 04/23/18 30 08/10/18
2 04/23/18 2.6 01/02/19
3 10/08/18 3 06/11/11
3 10/08/18 5 05/14/18
3 10/08/18 4 06/11/18
3 10/08/18 3 08/13/18
4 12/06/16 1.4 07/01/17
4 12/06/16 4.6 11/17/17
4 12/06/16 2 07/19/18
;
proc sql;
create table want as
select *
from have
group by id
having max(ifn(TestDate<=TrainingDate,TestDate,.))=testdate or min(ifn(TestDate>TrainingDate,TestDate,.))=testdate
order by id,TestDate;
quit;
Sort by id and testdate, then you can do a "look-ahead merge" (firstobs=2) of the dataset with itself.
For code, please supply example data in usable form (as code in a data step with datalines).
Some fun with Hashes-
data have;
input id TrainingDate :mmddyy10. Score TestDate :mmddyy10.;
format TrainingDate mmddyy10. TestDate mmddyy10.;
cards;
1 11/21/16 3 08/26/10
1 11/21/16 3 06/27/11
1 11/21/16 7 10/20/16
1 11/21/16 3 06/13/17
1 11/21/16 2.5 08/13/17
2 04/23/18 5 11/15/10
2 04/23/18 3.3 10/26/11
2 04/23/18 13.4 10/26/11
2 04/23/18 3.3 11/28/12
2 04/23/18 11.8 11/28/12
2 04/23/18 6 12/27/17
2 04/23/18 3.6 05/09/18
2 04/23/18 30 08/10/18
2 04/23/18 2.6 01/02/19
3 10/08/18 3 06/11/11
3 10/08/18 5 05/14/18
3 10/08/18 4 06/11/18
3 10/08/18 3 08/13/18
4 12/06/16 1.4 07/01/17
4 12/06/16 4.6 11/17/17
4 12/06/16 2 07/19/18
;
data want;
if _n_=1 then do;
dcl hash h(dataset:'have(obs=0)',multidata:'y');
h.definekey('id');
h.definedata(all:'y');
h.definedone();
dcl hiter hi('h');
end;
do until(last.id);
set have;
by id;
if h.num_items =2 then continue;
if TestDate<=TrainingDate then h.replace();
else if _n_ then do;
h.add();
_n_=0;
end;
end;
do while(hi.next()=0);
output;
end;
h.clear();
run;
Thanks! Could you help me learn the full procedure by recommending some materials.
@Barkat Following are the books I would recommend-
1. By Ron Cody. Learning SAS® by Example: A Programmer's Guide, Second Edition-
2. Practical and Efficient SAS Programming: The Insider's Guide: Messineo, Martha
3. High-Performance SAS Coding --Graffeuille, Christian aka @ChrisNZ
4. Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study eBook: Dorfman,Paul, Henderson, Don courtesy @hashman / @DonH
Start with 1 & 2. Learn, understand & practice. This is absolute. Take your time before you hit 3 & 4 until you have a solid grasp of 1,2.
Source: I have a paid subscription with https://www.oreilly.com/ e-library. It is expensive however my life is dependent on it, so I need it. Depending on your professional needs you may choose to subscribe or buy each one of these books on Amazon or SAS press. Best!
@ShelleySessoms Mam, any thoughts or would you like to add something more or perhaps advice a prospective discount on offer?
You have a great list @novinosrin. I don't have anything additional to add at this time.
As far as discounts, you can always visit https://support.sas.com/en/books.html and see what is available. The SAS Books teams lists all discounts and offers on their website. There is always a "book of the month" and a list of free e-books.
Happy learning, @Barkat.
It appears your data are already sorted by id/testdate. If so, then:
data want;
set have;
where trainingdate>=testdate;
by id;
if last.id;
run;
The WHERE statement "outsources" the filter to keep only qualifying testdates. The DATA steps never sees the non-qualifying testdates. Therefore the last testdate seen for each id (the "by id;" and "if last.id" statements) is the most recent qualifying testdate.
The important point here is the operational difference between the where statement and the subsetting if statement. The latter filters only the records that are seen by the data step, while the former, as said above, filters prior to processing by the data step.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.