Hi everyone,
What's the code for using data step to do calculation on one dataset based on values from other dataset.
Here is an example:
Dataset A contains 5 observations (VAR1=1, 2, 3-5)
Dataset B contains 1000 ID, and each ID contains 10 observations (VAR2=1, 2, 3, -10) (i.e. dataset B has 10,000 records).
Using SET statement and data step , how can I read A, and B at the same time, and perform calculations on B for each VAR1 value, then keep the last observation of each ID in B for each VAR1. Final result would be a dataset with 5000 records (with 1000 ID, and 5 observations for each ID for VAR1=1,2, -5)
Hope this is clear 😕
Please repost this as question rather than a discussion, preferably with sample data, input and expected output.
So here is a mock data.
Objective is that for each timepoint (in A) calculate how many visits occurred before that timepoint for each ID, so that expected outcome would be below. My question, is how to do that using SET statements and dataset (and not using merges or SQL).
ID timepoint NumVisitsBeforTimePoint
1 1 2
1 2 5
1 3 8
1 4 9
1 5 10
2 1 2
2 2 5
.
.
and so forth
data A;
input timepoint $2. refDate mmddyy10.;
format refDate mmddyy10.;
cards;
1 03/01/2010
2 06/01/2011
3 01/01/2012
4 07/01/2012
5 02/01/2013
;
run;
%let start='01jan2010'd;
data B;
format visitdate mmddyy10.;
do id = 1 to 5;
do visit=1 to 10 ;
retain start &start;
start= rand("Uniform") * 15 + 120 + start;
visitdate=start;
if visit=1 then start=&start;
output;
end;
end;
drop start;
run;
Still not a question.
Indexes and point are allowed?
Yes Indexes and point are welcome.
I know, I was able to do it after many-to-many merge using PROC SQL then use couple of datasteps to get the required results. But so far wasn't successful to get SET statement code work correctly, which is what I'm interested in in this discussion. Thanks all.
Hi,
Why not use one SQL statement and use a subquery to get your total?
proc sql;
create table WANT as
select A.ID,
A.TIMEPOINT,
(select COUNT(THIS.START) from HAVE_B THIS where THIS.ID=A.ID and THIS.DATE <= A.DATE) as NUMVISITSBEFORETIMEPOINT
from HAVE_A A;
quit;
SAS can solve problems in numerous ways. Whatever you do, you have to compare a "current" value to prior values to see if a condition is true for the current value. "SAS 9.3 Statements Reference" and "SAS 9.3 Language Reference Concepts" has excellent explanations of how the SET statement works. It appears, at first glance that you should place all the A table values onto the B table and process your selection logic against a single dataset. Page 352 of the Statements Reference book has examples that might help you. I personally would join the two tables into one and then use arrays. Maybe I would used retained values and maybe I would use LAG values. The CONCEPTS book presents some great ideas as well.
http://support.sas.com/documentation/cdl/en/lestmtsref/63323/HTML/default/viewer.htm#titlepage.htm
http://support.sas.com/documentation/cdl/en/lrcon/65287/HTML/default/viewer.htm#titlepage.htm
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.