Hi,
I'm relatively new to SAS and I'm struggling with determining the first and last values in a group if they are different from each other. I have brought the data to this form and I need to keep only those values whose first ID value is equal to B while the second is equal to A. According to this, in the following example, only two groups should remain. Unfortunately with proc sql the condition with if first. and last. then is not working. Does anyone have any ideas?
Date | ID | Group |
06JUL2022:17:56:14 | B | X/27580 |
04OCT2022:00:08:25 | B | X/27580 |
02JAN2023:17:01:33 | A | X/27580 |
02AUG2022:12:29:01 | B | X/27581 |
04OCT2022:00:05:16 | A | X/27581 |
02JAN2023:16:52:33 | B | X/27581 |
06JUL2022:17:54:43 | A | X/27582 |
27JUL2022:12:59:25 | B | X/27582 |
02AUG2022:12:21:25 | B | X/27582 |
04AUG2022:14:56:26 | A | X/27582 |
04OCT2022:00:05:16 | B | X/27582 |
06JUL2022:18:16:03 | B | X/27583 |
25JUL2022:14:31:08 | B | X/27583 |
09AUG2022:14:15:41 | B | X/27583 |
23AUG2022:11:34:58 | A | X/27583 |
04OCT2022:00:01:15 | B | X/27583 |
02JAN2023:19:10:10 | A | X/27583 |
Hi @rzuf,
Here's an elementary approach without DO loops:
data want(drop=id);
set have;
by group date;
if first.group then Count=(id='B');
if last.group & count & id='A';
retain count;
run;
This assumes that the input dataset ("HAVE") is sorted by Group Date. Variable Count is set to 1 if the first observation of a group has ID='B' and to 0 otherwise. Thanks to the RETAIN statement, this information is kept until the last observation of the group, which will be written to the output dataset ("WANT") if Count=1 and ID='A', dropping the now redundant variable ID.
The First. and Last. automatic variables only occur in data step code where there is a BY statement.
So SQL would not have them because it does not have a By statement. Also, SQL does not process records according to any existing data order. Very few operations that rely on sequential processing or identification will work well if at all in SQL.
So, what is the desired output from that data?
Double DO loop:
data want;
count = 0;
do until (last.group);
set have;
by group;
count + 1;
if count = 1 and id = "B" then b = 1;
if count = 2 and id = "A" then a = 1;
end;
do until (last.group);
set have;
by group;
if a and b then output;
end;
drop count a b;
run;
Untested; for tested code, supply usable example data in a working data step with datalines.
Hi @rzuf,
Here's an elementary approach without DO loops:
data want(drop=id);
set have;
by group date;
if first.group then Count=(id='B');
if last.group & count & id='A';
retain count;
run;
This assumes that the input dataset ("HAVE") is sorted by Group Date. Variable Count is set to 1 if the first observation of a group has ID='B' and to 0 otherwise. Thanks to the RETAIN statement, this information is kept until the last observation of the group, which will be written to the output dataset ("WANT") if Count=1 and ID='A', dropping the now redundant variable ID.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.