BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Barkat
Pyrite | Level 9

I have a dataset (attached) with column names id, TrainingDate, Score, TestDate. Each id has a single TrainingDate and multiple TestDate with score. Some TestDates are before TrainngDate and some TestDates are after TrainingDate. I would like to keep the closest TestDate before the TraingDate and the closest TestDate after the TrainingDate. 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Some fun with Hashes-

 

data have;
input id	TrainingDate :mmddyy10.	Score	TestDate :mmddyy10.;
format TrainingDate mmddyy10. TestDate mmddyy10.;
cards;
1	11/21/16	3	08/26/10
1	11/21/16	3	06/27/11
1	11/21/16	7	10/20/16
1	11/21/16	3	06/13/17
1	11/21/16	2.5	08/13/17
2	04/23/18	5	11/15/10
2	04/23/18	3.3	10/26/11
2	04/23/18	13.4	10/26/11
2	04/23/18	3.3	11/28/12
2	04/23/18	11.8	11/28/12
2	04/23/18	6	12/27/17
2	04/23/18	3.6	05/09/18
2	04/23/18	30	08/10/18
2	04/23/18	2.6	01/02/19
3	10/08/18	3	06/11/11
3	10/08/18	5	05/14/18
3	10/08/18	4	06/11/18
3	10/08/18	3	08/13/18
4	12/06/16	1.4	07/01/17
4	12/06/16	4.6	11/17/17
4	12/06/16	2	07/19/18
;

data want;
 if _n_=1 then do;
  dcl hash h(dataset:'have(obs=0)',multidata:'y');
  h.definekey('id');
  h.definedata(all:'y');
  h.definedone();
  dcl hiter hi('h');
 end;
 do until(last.id);
  set have;
  by id;
  if h.num_items =2 then continue;
  if TestDate<=TrainingDate then h.replace();
  else if _n_ then do;
   h.add();
   _n_=0;
  end;
 end;
 do while(hi.next()=0);
  output;
 end;
 h.clear();
run;

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

Hi @Barkat  Since you didn't post your expected output for your input sample, please see if the following is what you are after i.e. whether my understanding of your requirement is correct-


data have;
input id	TrainingDate :mmddyy10.	Score	TestDate :mmddyy10.;
format TrainingDate mmddyy10. TestDate mmddyy10.;
cards;
1	11/21/16	3	08/26/10
1	11/21/16	3	06/27/11
1	11/21/16	7	10/20/16
1	11/21/16	3	06/13/17
1	11/21/16	2.5	08/13/17
2	04/23/18	5	11/15/10
2	04/23/18	3.3	10/26/11
2	04/23/18	13.4	10/26/11
2	04/23/18	3.3	11/28/12
2	04/23/18	11.8	11/28/12
2	04/23/18	6	12/27/17
2	04/23/18	3.6	05/09/18
2	04/23/18	30	08/10/18
2	04/23/18	2.6	01/02/19
3	10/08/18	3	06/11/11
3	10/08/18	5	05/14/18
3	10/08/18	4	06/11/18
3	10/08/18	3	08/13/18
4	12/06/16	1.4	07/01/17
4	12/06/16	4.6	11/17/17
4	12/06/16	2	07/19/18
;
proc sql;
 create table want as
 select *
 from have
 group by id
 having max(ifn(TestDate<=TrainingDate,TestDate,.))=testdate or min(ifn(TestDate>TrainingDate,TestDate,.))=testdate
 order by id,TestDate;
quit;
Kurt_Bremser
Super User

Sort by id and testdate, then you can do a "look-ahead merge" (firstobs=2) of the dataset with itself.

For code, please supply example data in usable form (as code in a data step with datalines).

novinosrin
Tourmaline | Level 20

Some fun with Hashes-

 

data have;
input id	TrainingDate :mmddyy10.	Score	TestDate :mmddyy10.;
format TrainingDate mmddyy10. TestDate mmddyy10.;
cards;
1	11/21/16	3	08/26/10
1	11/21/16	3	06/27/11
1	11/21/16	7	10/20/16
1	11/21/16	3	06/13/17
1	11/21/16	2.5	08/13/17
2	04/23/18	5	11/15/10
2	04/23/18	3.3	10/26/11
2	04/23/18	13.4	10/26/11
2	04/23/18	3.3	11/28/12
2	04/23/18	11.8	11/28/12
2	04/23/18	6	12/27/17
2	04/23/18	3.6	05/09/18
2	04/23/18	30	08/10/18
2	04/23/18	2.6	01/02/19
3	10/08/18	3	06/11/11
3	10/08/18	5	05/14/18
3	10/08/18	4	06/11/18
3	10/08/18	3	08/13/18
4	12/06/16	1.4	07/01/17
4	12/06/16	4.6	11/17/17
4	12/06/16	2	07/19/18
;

data want;
 if _n_=1 then do;
  dcl hash h(dataset:'have(obs=0)',multidata:'y');
  h.definekey('id');
  h.definedata(all:'y');
  h.definedone();
  dcl hiter hi('h');
 end;
 do until(last.id);
  set have;
  by id;
  if h.num_items =2 then continue;
  if TestDate<=TrainingDate then h.replace();
  else if _n_ then do;
   h.add();
   _n_=0;
  end;
 end;
 do while(hi.next()=0);
  output;
 end;
 h.clear();
run;
Barkat
Pyrite | Level 9

Thanks! Could you help me learn the full procedure by recommending some materials.

novinosrin
Tourmaline | Level 20

@Barkat  Following are the books I would recommend-

 

1. By Ron Cody. Learning SAS® by Example: A Programmer's Guide, Second Edition- 

2. Practical and Efficient SAS Programming: The Insider's Guide: Messineo, Martha

3. High-Performance SAS Coding --Graffeuille, Christian  aka @ChrisNZ  

4. Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study eBook: Dorfman,Paul, Henderson, Don  courtesy @hashman / @DonH

 

Start with 1 & 2. Learn, understand & practice. This is absolute. Take your time before you hit 3 & 4 until you have a solid grasp of 1,2. 

 

Source: I have a paid subscription with https://www.oreilly.com/ e-library. It is expensive however my life is dependent on it, so I need it. Depending on your professional needs you may choose to subscribe or buy each one of these books on Amazon or SAS press. Best!

 

@ShelleySessoms  Mam, any thoughts or would you like to add something more or perhaps advice a prospective discount on offer?

ShelleySessoms
Community Manager

You have a great list @novinosrin. I don't have anything additional to add at this time.

 

As far as discounts, you can always visit https://support.sas.com/en/books.html and see what is available. The SAS Books teams lists all discounts and offers on their website. There is always a "book of the month" and a list of free e-books.

 

Happy learning, @Barkat.  

mkeintz
PROC Star

It appears your data are already sorted by id/testdate.  If so, then:

 

data want;
  set have;
  where trainingdate>=testdate;
  by id;
  if last.id;
run;

The WHERE statement "outsources" the filter to keep only qualifying testdates.  The DATA steps never sees the non-qualifying testdates.  Therefore the last testdate seen for each id  (the "by id;" and "if last.id" statements) is the most recent qualifying testdate.  

 

The important point here is the operational difference between the where statement and the subsetting if statement.  The latter filters only the records that are seen by the data step, while the former, as said above, filters prior to processing by the data step.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1490 views
  • 8 likes
  • 5 in conversation