Contributor
Posts: 62

# SET statement and two datasets

Hi everyone,

What's the code for using data step to do calculation on one dataset based on values from other dataset.

Here is an example:

Dataset A contains 5 observations (VAR1=1, 2, 3-5)

Dataset B contains 1000 ID, and each ID contains 10 observations (VAR2=1, 2, 3, -10) (i.e. dataset B has 10,000 records).

Using SET statement and data step , how can I read A, and B at the same time, and perform calculations on B for each VAR1 value, then keep the last observation of each ID in B for each VAR1. Final result would be a dataset with 5000 records (with 1000 ID, and 5 observations for each ID for VAR1=1,2, -5)

Hope this is clear :-/

Super User
Posts: 23,700

## Re: SET statement and two datasets

Please repost this as question rather than a discussion, preferably with sample data, input and expected output.

Contributor
Posts: 62

## Re: SET statement and two datasets

So here is a mock data.

Objective is that for each timepoint (in A) calculate how many visits occurred before that timepoint for each ID, so that expected outcome would be below. My question, is how to do that using SET statements and dataset (and not using merges or SQL).

ID    timepoint           NumVisitsBeforTimePoint

1     1                   2

1     2                   5

1     3                   8

1     4                   9

1     5                   10

2     1                   2

2     2                   5

.

.

and so forth

data A;

input timepoint \$2. refDate mmddyy10.;

format refDate mmddyy10.;

cards;

1 03/01/2010

2 06/01/2011

3 01/01/2012

4 07/01/2012

5 02/01/2013

;

run;

%let start='01jan2010'd;

data B;

format visitdate mmddyy10.;

do id = 1 to 5;

do visit=1 to 10 ;

retain start &start;

start= rand("Uniform") * 15 + 120 + start;

visitdate=start;

if visit=1 then start=&start;

output;

end;

end;

drop start;

run;

Super User
Posts: 23,700

## Re: SET statement and two datasets

Still not a question.

Indexes and point are allowed?

Contributor
Posts: 62

## Re: SET statement and two datasets

Yes Indexes and point are welcome.

I know, I was able to do it after many-to-many merge using PROC SQL then use couple of datasteps to get the required results. But so far wasn't successful to get SET statement code work correctly, which is what I'm interested in in this discussion.  Thanks all.

Super User
Posts: 9,599

## Re: SET statement and two datasets

Hi,

Why not use one SQL statement and use a subquery to get your total?

proc sql;

create table WANT as

select  A.ID,

A.TIMEPOINT,

(select COUNT(THIS.START) from HAVE_B THIS where THIS.ID=A.ID and THIS.DATE <= A.DATE) as NUMVISITSBEFORETIMEPOINT

from    HAVE_A A;

quit;

Regular Contributor
Posts: 217

## Re: SET statement and two datasets

SAS can solve problems in numerous ways.  Whatever you do, you have to compare a "current" value to prior values to see if a condition is true for the current value.  "SAS 9.3 Statements Reference" and "SAS 9.3 Language Reference Concepts" has excellent explanations of how the SET statement works. It appears, at first glance that you should place all the A table values onto the B table and process your selection logic against a single dataset.  Page 352 of the Statements Reference book has examples that might help you.  I personally would join the two tables into one and then use arrays. Maybe I would used retained values and maybe I would use LAG values. The CONCEPTS book presents some great ideas as well.

http://support.sas.com/documentation/cdl/en/lestmtsref/63323/HTML/default/viewer.htm#titlepage.htm

http://support.sas.com/documentation/cdl/en/lrcon/65287/HTML/default/viewer.htm#titlepage.htm

Discussion stats
• 6 replies
• 371 views
• 0 likes
• 4 in conversation