Hi...
Is the folllwing two codes doing the SAME for the below data?
if true can someone explain the do until code???????why two SET stmnts used etc etc???
1)IF we have two or more consecutive INs then we need to take the first IN
2)if we have two or more consecutive OUTS then we need to take the LAST OUT
data want;
do until(last.flag);
set have;
by id flag notsorted;
end;
do until(last.flag);
set have;
by id flag notsorted;
if flag='IN' and first.flag then output;
if flag='OUT' and last.flag then output;
end;
run;
set have;
by id flag notsorted;
if flag='IN' and first.flag then output;
if flag='OUT' and last.flag then output;
run;
HAVE
ID recorded_time flag
101 01AUG2013:10:00 IN
101 01AUG2013:14:15 IN
101 01AUG2013:17:15 OUT
102 02AUG2013:19:23 OUT
102 02AUG2013:21:04 OUT
102 11AUG2013:01:01 IN
102 11AUG2013:01:30 OUT
103 09AUG2013:19:25 IN
103 14AUG2013:14:35 OUT
103 14AUG2013:23:01 IN
103 14AUG2013:23:12 IN
103 17AUG2013:11:20 OUT
104 02AUG2013:04:02 OUT
104 08AUG2013:15:00 IN
104 20AUG2013:14:11 IN
104 20AUG2013:17:12 OUT
105 07AUG2013:02:50 IN
105 07AUG2013:02:55 IN
105 11AUG2013:14:00 OUT
105 16AUG2013:14:10 OUT
106 02AUG2013:08:15 OUT
107 20AUG2013:13:05 IN
108 12AUG2013:11:00 IN
108 12AUG2013:18:37 OUT
108 17AUG2013:03:55 OUT
108 22AUG2013:11:25 OUT
109 06AUG2013:18:55 IN
109 07AUG2013:11:20 OUT
109 13AUG2013:00:09 IN
109 14AUG2013:10:35 OUT
WANT
ID recorded_time flag
101 01AUG2013:10:00 IN
101 01AUG2013:17:15 OUT
102 02AUG2013:21:04 OUT
102 11AUG2013:01:01 IN
102 11AUG2013:01:30 OUT
103 09AUG2013:19:25 IN
103 14AUG2013:14:35 OUT
103 14AUG2013:23:01 IN
103 17AUG2013:11:20 OUT
104 02AUG2013:04:02 OUT
104 08AUG2013:15:00 IN
104 20AUG2013:17:12 OUT
105 07AUG2013:02:50 IN
105 16AUG2013:14:10 OUT
106 02AUG2013:08:15 OUT
107 20AUG2013:13:05 IN
108 12AUG2013:11:00 IN
108 22AUG2013:11:25 OUT
109 06AUG2013:18:55 IN
109 07AUG2013:11:20 OUT
109 13AUG2013:00:09 IN
109 14AUG2013:10:35 OUT
Hi Robert,
At first glance, both codes do produce the same results. The only distinction is that in the first approach doesn't increment _N_ the same way as it uses do blocks. However, the first do until block is redundant with the second. You are essentially reading however many rows there are in your by group, doing nothing with it and then reading them all again to do the desired conditionnal processing in the second do block.
It should be simplified as
data want;
do until(last.flag);
set have;
by id flag notsorted;
if flag='IN' and first.flag then output;
if flag='OUT' and last.flag then output;
end;
As an experiment, try the following:
data want;
do until(last.flag);
i=_N_;
set have;
by id flag notsorted;
if flag='IN' and first.flag then output;
if flag='OUT' and last.flag then output;
end;
run;
data want2;
set have;
i=_N_;
by id flag notsorted;
if flag='IN' and first.flag then output;
if flag='OUT' and last.flag then output;
run;
This will depict the slight subtlety between the 2. Please note also that because the first code is enclosed within DO block, it means that the PDV only gets reset to missing each time _N_ iterates. Thus, if you calculated any new column from your data conditionnally, the values would be kept as though you had a retain statement by group. However, this is a non-issue with your example.
Vince
Hi,
Thanks for the response. If you take 109 for example
109 06AUG2013:18:55 IN
109 07AUG2013:11:20 OUT
109 13AUG2013:00:09 IN
109 14AUG2013:10:35 OUT
data want2;
set have;
i=_N_;
by id flag notsorted;
if flag='IN' and first.flag then output;
if flag='OUT' and last.flag then output;
run;
because of the non sorted options within the same ID we have many first.'s and last.'s instead of having
one first. and one last.??????
Thanks
The not sorted allows you not to receive an error message because your data is not sorted according to your BY variables. I've assumed from your example that it was intended to get each pair of in/out and that all you wanted to remove from your data were the successive INs or OUTs.
The not sorted options allows you to retain the chronological ordering of your data and creates new by groups each time there is a change in a by group variable even if the same group existed before. So yes, it allows you to have many firsts and lasts for each by group for so long as they are separated by a different by group.
If you wanted to get strictly the earliest IN and lastest OUT for each ID rather than each pair of alternate, you would've done
proc sort data=have;
by id flag recorded_time;
run;
data want;
set have;
by id flag;
if first.flag and flag='IN' then output;
if last.flag and flag='OUT' then output;
run;
The re-sorting by flag before the chronological series allows you to retain strictly one 'IN' and one 'OUT' value. Since your data is sorted according to your by variables, the notsorted option would have no effect.
Not sure that answers your question - hope it helps.
Vince
Hi Vince,
I get what you are trying to explain...
My only concern and thought is
wont there be duplicates because of using the SET statement twice???
data want;
do until(last.flag);
set have;
by id flag notsorted;
end;
do until(last.flag);
set have;
by id flag notsorted;
if flag='IN' and first.flag then output;
if flag='OUT' and last.flag then output;
end;
run;
/*ALSO CAN I USE THE ELSE IF instead of IF condition in the second IF condition???
No. Your output is controlled by the OUTPUT statements in the two conditionnal statements in the second DO loop. Thus, as I mentioned in my first post, the first DO block is useless code. It only slows your program down.
Since both set statement use a different copy of the same dataset, there is a single PDV with the variables in HAVE. The PDV gets overwritten for each set statement in the first do block nothing gets output. The second do block begins, reads from "another copy" of HAVE and overwrites the PDV until one of the two IF conditions are met in which case there is an output (another copy conceptually that is not exactly how it works in practice - I believe that each set statement uses a different pointer, think like an internal _N_ to the set statement so they both read through the entire dataset and the rows read in the first block have no impact on the rows read in the second block).
I'm not sure my explanation is great above but anyway it gives some insight on how the set statements apply to the PDV and why the first DO block is useless to the desired output. DO UNTIL blocks are always entered because the until condition only applies at the end of the do block. Thus, even if you are at last.flag from the first block, you still enter the second block and then last.flag is set to 0 and first.flag is set to 1 from the new row read from set have.
Yes, you could use an ELSE IF because the conditions are mutually exclusive. In fact, it would save you from testing the second condition for each first.flag thus slightly improving the program efficiency.
Be aware though, first.flag and last.flag are not mutually exclusive, the empty intersection is that of flag='IN' and flag='OUT'.
For example
if first.flag then output;
if last.flag then output;
behaves differently from
if first.flag then output;
else if last.flag then output;
In the former, anytime there is a single element in a by flag group will be output twice as it is both the first and the last flag of the group so you will absolutely always have 2 rows produced for each by flag group.
In the later, the else removes double printing whenever a group has a single element.
Hi,
Thanks for the detailed explanation....
In my case even if there is a single element "I am saying
if flag='IN' and first.flag then output;
if flag='OUT' and last.flag then output;
I am specifying what the flag is so i think it would not pull any duplicates?
Thanks
Indeed I said that in the previous post. Two ifs with mutually exclusive conditions cannot both be true at the same time so you will not have duplicate output from not using an ELSE statement. You simply slightly reduce program efficiency
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.