11-30-2015 03:57 PM
I have two datasets. Dataset1 is at the person-level and has ID as well as some categorical variables. Dataset2 is at the person-event level and has a matching ID variable and an event date variable that is month/year.
1 abc ghi
2 def jkl
1 Apr 2012
1 May 2012
1 Jun 2012
1 Aug 2012
Some people from Dataset 1 will not be in Dataset 2 at all. What I want is to keep everyone from Dataset 1 that has events in April, May, and June 2012 in Dataset 2 (so ID=1 would stay, ID=2 would not as they were not found in Dataset 2).
Any help is much appreciated.
11-30-2015 04:40 PM
Here's a way to do it using a subquery in a SQL procedure:
data dataset1; input ID$ categ_var1$ categ_var2$ categ_var3$ categ_var4$ categ_var5$; datalines; 1 abc ghi mno st wx 2 def jkl pqr uv yz ; run; data dataset2; input ID$ event_date; informat ID $3. event_date mmddyy10.; format ID $3. event_date mmddyy10.; datalines; 1 04/01/2012 1 05/01/2012 1 06/01/2012 1 08/01/2012 ; run; proc sql; create table want as select A.* from dataset1 as A Where A.ID in (select distinct ID from dataset2 as B Where year(B.event_date)=2012 AND month(B.event_date) in (4,5,6)); quit;
11-30-2015 10:27 PM - edited 11-30-2015 10:28 PM
The exact syntax will depend on whether EVENT_DATE is text vs. a SAS date. You could use the SQL approach already suggested (but with modifications for the EVENT_DATE selection if EVENT_DATE happens to be text). It's also straightforward to sort and merge. Assuming both data sets are sorted by ID:
merge dataset1 (in=in1) dataset2 (in=in2);
if in1 and in2;
/* For a text EVENT_DATE subset as follows */
if event_date in ('Apr 2012', 'May 2012', 'Jun 2012');
/* For a SAS date version of EVENT_DATE subset as follows */
if '01apr2012'd <= event_date <= '30jun2012'd;
Things to consider: Do you want to drop EVENT_DATE from the final data set? Could there be variations in spelling, such as Apr, apr, and APR?
11-30-2015 11:27 PM
You could also use a hash table if the lookup dataset can fit into memory.
DATA WANT; IF _N_ = 1 THEN DO; DECLARE HASH H(DATASET:"WORK.DS2 (WHERE = (EVENT_DATE IN ('APRIL 2012', 'MAY 2012', 'JUNE 2012')))"); H.DEFINEKEY("ID"); H.DEFINEDONE(); END; SET WORK.DS1; RC = H.CHECK(); IF RC = 0 THEN OUTPUT; RUN;
In the above example we are subsetting the lookup data by the event_dates. This is only applicable to SAS 9.2 and above.