Hi,
I have two datasets. Dataset1 is at the person-level and has ID as well as some categorical variables. Dataset2 is at the person-event level and has a matching ID variable and an event date variable that is month/year.
Dataset 1
ID categ_var1...categ_var5
1 abc ghi
2 def jkl
Dataset 2
ID event_date
1 Apr 2012
1 May 2012
1 Jun 2012
1 Aug 2012
Some people from Dataset 1 will not be in Dataset 2 at all. What I want is to keep everyone from Dataset 1 that has events in April, May, and June 2012 in Dataset 2 (so ID=1 would stay, ID=2 would not as they were not found in Dataset 2).
Any help is much appreciated.
Here's a way to do it using a subquery in a SQL procedure:
data dataset1;
input ID$ categ_var1$ categ_var2$ categ_var3$ categ_var4$ categ_var5$;
datalines;
1 abc ghi mno st wx
2 def jkl pqr uv yz
;
run;
data dataset2;
input ID$ event_date;
informat ID $3. event_date mmddyy10.;
format ID $3. event_date mmddyy10.;
datalines;
1 04/01/2012
1 05/01/2012
1 06/01/2012
1 08/01/2012
;
run;
proc sql;
create table want as
select A.*
from dataset1 as A
Where A.ID in (select distinct ID
from dataset2 as B
Where year(B.event_date)=2012 AND month(B.event_date) in (4,5,6));
quit;
The exact syntax will depend on whether EVENT_DATE is text vs. a SAS date. You could use the SQL approach already suggested (but with modifications for the EVENT_DATE selection if EVENT_DATE happens to be text). It's also straightforward to sort and merge. Assuming both data sets are sorted by ID:
data want;
merge dataset1 (in=in1) dataset2 (in=in2);
by id;
if in1 and in2;
/* For a text EVENT_DATE subset as follows */
if event_date in ('Apr 2012', 'May 2012', 'Jun 2012');
/* For a SAS date version of EVENT_DATE subset as follows */
if '01apr2012'd <= event_date <= '30jun2012'd;
run;
Things to consider: Do you want to drop EVENT_DATE from the final data set? Could there be variations in spelling, such as Apr, apr, and APR?
You could also use a hash table if the lookup dataset can fit into memory.
DATA WANT; IF _N_ = 1 THEN DO; DECLARE HASH H(DATASET:"WORK.DS2 (WHERE = (EVENT_DATE IN ('APRIL 2012', 'MAY 2012', 'JUNE 2012')))"); H.DEFINEKEY("ID"); H.DEFINEDONE(); END; SET WORK.DS1; RC = H.CHECK(); IF RC = 0 THEN OUTPUT; RUN;
In the above example we are subsetting the lookup data by the event_dates. This is only applicable to SAS 9.2 and above.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.