03-30-2014 10:50 PM
What's the code for using data step to do calculation on one dataset based on values from other dataset.
Here is an example:
Dataset A contains 5 observations (VAR1=1, 2, 3-5)
Dataset B contains 1000 ID, and each ID contains 10 observations (VAR2=1, 2, 3, -10) (i.e. dataset B has 10,000 records).
Using SET statement and data step , how can I read A, and B at the same time, and perform calculations on B for each VAR1 value, then keep the last observation of each ID in B for each VAR1. Final result would be a dataset with 5000 records (with 1000 ID, and 5 observations for each ID for VAR1=1,2, -5)
Hope this is clear :-/
03-31-2014 03:53 PM
So here is a mock data.
Objective is that for each timepoint (in A) calculate how many visits occurred before that timepoint for each ID, so that expected outcome would be below. My question, is how to do that using SET statements and dataset (and not using merges or SQL).
ID timepoint NumVisitsBeforTimePoint
1 1 2
1 2 5
1 3 8
1 4 9
1 5 10
2 1 2
2 2 5
and so forth
input timepoint $2. refDate mmddyy10.;
format refDate mmddyy10.;
format visitdate mmddyy10.;
do id = 1 to 5;
do visit=1 to 10 ;
retain start &start;
start= rand("Uniform") * 15 + 120 + start;
if visit=1 then start=&start;
03-31-2014 05:06 PM
Yes Indexes and point are welcome.
I know, I was able to do it after many-to-many merge using PROC SQL then use couple of datasteps to get the required results. But so far wasn't successful to get SET statement code work correctly, which is what I'm interested in in this discussion. Thanks all.
04-01-2014 04:42 AM
Why not use one SQL statement and use a subquery to get your total?
create table WANT as
(select COUNT(THIS.START) from HAVE_B THIS where THIS.ID=A.ID and THIS.DATE <= A.DATE) as NUMVISITSBEFORETIMEPOINT
from HAVE_A A;
03-31-2014 04:44 PM
SAS can solve problems in numerous ways. Whatever you do, you have to compare a "current" value to prior values to see if a condition is true for the current value. "SAS 9.3 Statements Reference" and "SAS 9.3 Language Reference Concepts" has excellent explanations of how the SET statement works. It appears, at first glance that you should place all the A table values onto the B table and process your selection logic against a single dataset. Page 352 of the Statements Reference book has examples that might help you. I personally would join the two tables into one and then use arrays. Maybe I would used retained values and maybe I would use LAG values. The CONCEPTS book presents some great ideas as well.