Hello all,
I've been racking my brain on this data manipulation and it is intuitively easy, but I'm having trouble implementing it. I have something like the following table:
ID | Entry | Exit |
1 | 0 | 1 |
1 | 1 | 2 |
1 | 3 | 2 |
2 | 0 | 5 |
2 | 5 | 6 |
2 | 6 | 7 |
2 | 7 | 8 |
3 | 0 | 3 |
3 | 3 | 4 |
3 | 5 | 4 |
This is incorrect as Entry should always be LESS THAN Exit and the Exit value for the previous row should be the Entry value for the subsequent row. The following table is correct:
ID | Entry | Exit |
1 | 0 | 1 |
1 | 1 | 2 |
1 | 2 | 3 |
2 | 0 | 5 |
2 | 5 | 6 |
2 | 6 | 7 |
2 | 7 | 8 |
3 | 0 | 3 |
3 | 3 | 4 |
3 | 4 | 5 |
I've tried sorting the data by ascending ID and descending entry and exit. Afterwards, I used the lag function. However, I was not getting the bottom table. Any thoughts?
Thank you!
John
Very interesting question .
data have; input ID Entry Exit ; cards; 1 0 1 1 1 3 1 3 2 2 0 5 2 5 6 2 6 7 2 7 8 3 0 3 3 3 5 3 5 4 ; run; data temp(drop=entry exit); set have; value=entry;output; value=exit;output; run; proc sort data=temp nodupkey;by id value;run; data x; set temp; by id; if not first.id and not last.id then output; output; run; data x; set x; if mod(_n_,2)=1 then do;name='entry'; n+1;end; else name='exit'; run; proc transpose data=x out=want(drop=_: n); by id n; id name; var value; run;
Xia Keshan
data have;
input ID Entry Exit;
datalines;
1 0 1
1 1 2
1 3 2
2 0 5
2 5 6
2 6 7
2 7 8
3 0 3
3 3 4
3 5 4
;
data want (drop=entry exit rename=(new_entry=entry new_exit=exit));
set have;
if entry>exit then new_entry=coalesce(exit,entry);
else new_entry=entry;
if entry>exit then new_exit=coalesce(entry,exit);
else new_exit=exit;
run;
Thank you for the quick reply. I made a mistake in my "incorrect" table. It should be:
ID | Entry | Exit |
1 | 0 | 1 |
1 | 1 | 3 |
1 | 3 | 2 |
2 | 0 | 5 |
2 | 5 | 6 |
2 | 6 | 7 |
2 | 7 | 8 |
3 | 0 | 3 |
3 | 3 | 5 |
3 | 5 | 4 |
data want(drop=entry exit rename=(new_entry=entry cnt=exit));
set have;
by id;
if entry>exit then new_entry=coalesce(exit,entry);
else new_entry=entry;
if first.id then cnt=exit;
else cnt+1;
run;
Thanks for your help thus far. However, my data is a little more general than what I presented. In other words, the values are not all integers that increment by 1. They could have fractions after them. Is there a way to swap two values within a column, in the same way coalesce "swaps" values?
Hi,
What about something like:
proc sql;
create table WANT as
select ID,
MIN(ENTRY,EXIT) as ENTRY,
MAX(ENTRY,EXIT) as EXIT
from HAVE;
quit;
For ID 1 what is "correct" .
If we simply sort ENTRY,EXIT then lag(exit) does not equal entry for obs 2 and 3.
How do you get from here?
ID | Entry | Exit |
1 | 0 | 1 |
1 | 1 | 3 |
1 | 3 | 2 |
to what I guess it still "correct" from your first post?
ID | Entry | Exit |
1 | 0 | 1 |
1 | 1 | 2 |
1 | 2 | 3 |
Do you sort columns first or check lag(exit) eq entry?
Yes for ID = 1 the bottom table is correct. To be more specific, here is an actual output of my table:
ID | Entry | Exit |
498 | 0.00000 | 0.12320 |
498 | 0.12320 | 0.25462 |
498 | 0.25462 | 1.78234 |
498 | 1.78234 | 3.20602 |
498 | 3.20602 | 3.15127 |
500 | 0.00000 | 2.00684 |
500 | 2.00684 | 2.16838 |
500 | 2.16838 | 2.19849 |
500 | 2.19849 | 3.04175 |
500 | 3.04175 | 2.54346 |
You'll notice that for both, ID 498 and 500, the second to last exit values should be flipped with the last exit value. Also for the last row of each ID, the entry and exit should be flipped.
Very interesting question .
data have; input ID Entry Exit ; cards; 1 0 1 1 1 3 1 3 2 2 0 5 2 5 6 2 6 7 2 7 8 3 0 3 3 3 5 3 5 4 ; run; data temp(drop=entry exit); set have; value=entry;output; value=exit;output; run; proc sort data=temp nodupkey;by id value;run; data x; set temp; by id; if not first.id and not last.id then output; output; run; data x; set x; if mod(_n_,2)=1 then do;name='entry'; n+1;end; else name='exit'; run; proc transpose data=x out=want(drop=_: n); by id n; id name; var value; run;
Xia Keshan
Thank you very much!
Do you want to keep the original Entry value, because it looks like you dont need it, entry starts at zero and is then the previous exit value.
So you could just ignore it:
data have;;
input id entry exit;
cards;
498 0.00000 0.12320
498 0.12320 0.25462
498 0.25462 1.78234
498 1.78234 3.20602
498 3.20602 3.15127
500 0.00000 2.00684
500 2.00684 2.16838
500 2.16838 2.19849
500 2.19849 3.04175
500 3.04175 2.54346
;
run;
proc sort data=have;
by id exit;
run;
data want;
set have(drop=entry);
by id;
retain entry;
if first.id then
entry = 0;
output;
entry = exit;
run;
I would second @MichaelPearce, and using lag() to make it even less verbose:
data want;
set have (drop=entry);
by id;
Entry=ifn(first.id,0,lag(exit));
run;
Haikuo
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.