BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have two datasets. Dataset1 is at the person-level and has ID as well as some categorical variables. Dataset2 is at the person-event level and has a matching ID variable and an event date variable that is month/year.

 

Dataset 1

ID  categ_var1...categ_var5

1     abc                ghi

2     def                 jkl

 

Dataset 2

ID     event_date

1       Apr 2012

1       May 2012

1       Jun 2012

1       Aug 2012

 

Some people from Dataset 1 will not be in Dataset 2 at all. What I want is to keep everyone from Dataset 1 that has events in April, May, and June 2012 in Dataset 2 (so ID=1 would stay, ID=2 would not as they were not found in Dataset 2).

 

Any help is much appreciated.

 

 

3 REPLIES 3
dcruik
Lapis Lazuli | Level 10

Here's a way to do it using a subquery in a SQL procedure:

 

 

data dataset1;
input ID$ categ_var1$ categ_var2$ categ_var3$ categ_var4$ categ_var5$;
datalines;
1 abc ghi mno st wx
2 def jkl pqr uv yz
;
run;

data dataset2;
input ID$ event_date;
informat ID $3. event_date mmddyy10.;
format ID $3. event_date mmddyy10.;
datalines;
1 04/01/2012
1 05/01/2012
1 06/01/2012
1 08/01/2012
;
run;

proc sql;
create table want as
select A.*
from dataset1 as A
Where A.ID in (select distinct ID 
	       from dataset2 as B
	       Where year(B.event_date)=2012 AND month(B.event_date) in (4,5,6));
quit;

 

Astounding
PROC Star

The exact syntax will depend on whether EVENT_DATE is text vs. a SAS date.  You could use the SQL approach already suggested (but with modifications for the EVENT_DATE selection if EVENT_DATE happens to be text).  It's also straightforward to sort and merge.  Assuming both data sets are sorted by ID:

 

data want;

merge dataset1 (in=in1) dataset2 (in=in2);

by id;

if in1 and in2;

/* For a text EVENT_DATE subset as follows */

if event_date in ('Apr 2012', 'May 2012', 'Jun 2012');

/* For a SAS date version of EVENT_DATE subset as follows */

if '01apr2012'd <= event_date <= '30jun2012'd;

run;

 

Things to consider:  Do you want to drop EVENT_DATE from the final data set?  Could there be variations in spelling, such as Apr, apr, and APR?

Scott_Mitchell
Quartz | Level 8

You could also use a hash table if the lookup dataset can fit into memory.

 

DATA WANT;
	IF _N_ = 1 THEN DO;
		DECLARE HASH H(DATASET:"WORK.DS2 (WHERE = (EVENT_DATE IN ('APRIL 2012', 'MAY 2012', 'JUNE 2012')))");
		H.DEFINEKEY("ID");
		H.DEFINEDONE();
	END;

	SET WORK.DS1;

	RC = H.CHECK();

	IF RC = 0 THEN OUTPUT;
RUN;

 

In the above example we are subsetting the lookup data by the event_dates.  This is only applicable to SAS 9.2 and above.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1054 views
  • 0 likes
  • 4 in conversation