BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have two datasets. Dataset1 is at the person-level and has ID as well as some categorical variables. Dataset2 is at the person-event level and has a matching ID variable and an event date variable that is month/year.

 

Dataset 1

ID  categ_var1...categ_var5

1     abc                ghi

2     def                 jkl

 

Dataset 2

ID     event_date

1       Apr 2012

1       May 2012

1       Jun 2012

1       Aug 2012

 

Some people from Dataset 1 will not be in Dataset 2 at all. What I want is to keep everyone from Dataset 1 that has events in April, May, and June 2012 in Dataset 2 (so ID=1 would stay, ID=2 would not as they were not found in Dataset 2).

 

Any help is much appreciated.

 

 

3 REPLIES 3
dcruik
Lapis Lazuli | Level 10

Here's a way to do it using a subquery in a SQL procedure:

 

 

data dataset1;
input ID$ categ_var1$ categ_var2$ categ_var3$ categ_var4$ categ_var5$;
datalines;
1 abc ghi mno st wx
2 def jkl pqr uv yz
;
run;

data dataset2;
input ID$ event_date;
informat ID $3. event_date mmddyy10.;
format ID $3. event_date mmddyy10.;
datalines;
1 04/01/2012
1 05/01/2012
1 06/01/2012
1 08/01/2012
;
run;

proc sql;
create table want as
select A.*
from dataset1 as A
Where A.ID in (select distinct ID 
	       from dataset2 as B
	       Where year(B.event_date)=2012 AND month(B.event_date) in (4,5,6));
quit;

 

Astounding
PROC Star

The exact syntax will depend on whether EVENT_DATE is text vs. a SAS date.  You could use the SQL approach already suggested (but with modifications for the EVENT_DATE selection if EVENT_DATE happens to be text).  It's also straightforward to sort and merge.  Assuming both data sets are sorted by ID:

 

data want;

merge dataset1 (in=in1) dataset2 (in=in2);

by id;

if in1 and in2;

/* For a text EVENT_DATE subset as follows */

if event_date in ('Apr 2012', 'May 2012', 'Jun 2012');

/* For a SAS date version of EVENT_DATE subset as follows */

if '01apr2012'd <= event_date <= '30jun2012'd;

run;

 

Things to consider:  Do you want to drop EVENT_DATE from the final data set?  Could there be variations in spelling, such as Apr, apr, and APR?

Scott_Mitchell
Quartz | Level 8

You could also use a hash table if the lookup dataset can fit into memory.

 

DATA WANT;
	IF _N_ = 1 THEN DO;
		DECLARE HASH H(DATASET:"WORK.DS2 (WHERE = (EVENT_DATE IN ('APRIL 2012', 'MAY 2012', 'JUNE 2012')))");
		H.DEFINEKEY("ID");
		H.DEFINEDONE();
	END;

	SET WORK.DS1;

	RC = H.CHECK();

	IF RC = 0 THEN OUTPUT;
RUN;

 

In the above example we are subsetting the lookup data by the event_dates.  This is only applicable to SAS 9.2 and above.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 779 views
  • 0 likes
  • 4 in conversation