Hi,
In this toy court data (have) I would like to remove all observations where the suspect (suspect
= 1) has been charged (charged
= 1), further, I also want to remove their respective witnesses (witnes_for_suspect), i.e. remove all suspects and their witnesses from the records and be left with observations where the charges haven't been made (charged
= 0).
The id
column is unique and applies to everyone in the data. The column suspect
shows who is a suspect (suspect = 1) and who is a witness (suspect = 0). The witnes_for_suspect
column is not unique, it borrows information from the id
column, and is useful to show to which suspect (their id
) a given witness belongs. e.g. the suspect (id = 1) has two witnesses (ids = 2 and 3) and has been charged (charged = 1) hence should be removed.
data have;
input id witnes_for_suspect suspect charged;
cards;
1 . 1 1
2 1 0 0
3 1 0 0
4 . 1 1
5 4 0 0
6 4 0 0
7 4 0 0
8 . 1 0
9 8 0 0
10 8 0 0
11 8 0 0
12 8 0 0
;
run;
The output data (want) should be:
data want;
input id witnes_to_suspect suspect charged;
cards;
8 . 1 0
9 8 0 0
10 8 0 0
11 8 0 0
12 8 0 0
;
run;
Thanks for any help!
One way
data want(drop=rc);
if _N_ = 1 then do;
declare hash h();
h.defineKey('id');
h.defineDone();
end;
set have end=lr;
if suspect=1 & charged=1 then rc=h.add();
if min(h.check(), h.check(key:witnes_for_suspect)) gt 0;
run;
One way
data want(drop=rc);
if _N_ = 1 then do;
declare hash h();
h.defineKey('id');
h.defineDone();
end;
set have end=lr;
if suspect=1 & charged=1 then rc=h.add();
if min(h.check(), h.check(key:witnes_for_suspect)) gt 0;
run;
@PeterClemmensen Thank you for the attempt and maybe I misunderstand your code or I didn't clarify what i need.
However, it seems the code provided does not yield the expected result.
The suspects with id = 1 and id = 4 have been charged (charged = 1), and therefore they, together with their witnesses (ids (2,3) and ids(5,6,7 respectively) should not be in the dataset want. The dataset want should only have the observations created in the second code I posted.
@Solsidan, my code creates exactly the result you want 🙂
The below code
data have;
input id witnes_for_suspect suspect charged;
cards;
1 . 1 1
2 1 0 0
3 1 0 0
4 . 1 1
5 4 0 0
6 4 0 0
7 4 0 0
8 . 1 0
9 8 0 0
10 8 0 0
11 8 0 0
12 8 0 0
;
run;
data want(drop=rc);
if _N_ = 1 then do;
declare hash h();
h.defineKey('id');
h.defineDone();
end;
set have end=lr;
if suspect=1 & charged=1 then rc=h.add();
if min(h.check(), h.check(key:witnes_for_suspect)) gt 0;
run;
Creates the data set below. If this is not correct, please explain how the data below differs from your posted 'want' data set 🙂
Thank you @PeterClemmensen - you were absolutely right! Sorry
I figured out where the bug was:
In creating the toy dataset, I changed the variable name (from "witnes_to_suspect
" to "witnes_for_suspect
") on the "Insert SAS code" window so that it reads better, and that alas was the bug. So when I ran your code with my data which has the variable name "witnes_to_suspect
" it gave wrong results. Very dumb of me 🙂
Thank you so much for your time and code!
data have;
input id witnes_to_suspect suspect charged;
cards;
1 . 1 1
2 1 0 0
3 1 0 0
4 . 1 1
5 4 0 0
6 4 0 0
7 4 0 0
8 . 1 0
9 8 0 0
10 8 0 0
11 8 0 0
12 8 0 0
;
run;
Ah ok. Makes sense. No problem, glad you found your answer 🙂
@PeterClemmensenSorry to drag you back to this, but perhaps you can help me figure out what am doing wrong in applying your code. I am afraid I have now tangled myself as I tried to apply your kindly provided code to several datasets (with tens of thousands of observations) that share similar structure to the toy dataset - but it does not seem to work as it did on the toy dataset. Of course it works perfectly fine on the toy dataset, no question, but I think it must be that I really don't understand what your code is doing for me to tweak it to work on other datasets.
When I apply it to a new sample data, the code only removes the suspects (cases) who have been charged (transp) but not their witnesses (lopnr_fall). id of course is lopnr.
I attach the new dataset and adapted code for you to try. The expected result is only observations with lopnr/lopnr_fall = 52.
Thank you!
data confused (drop=rc);
if _N_ = 1 then do;
declare hash h();
h.defineKey('lopnr');
h.defineDone();
end;
set micro2 end=lr;
if case=1 & transp=1 then rc=h.add();
if min(h.check(), h.check(key:lopnr_fall)) gt 0;
run;
@Solsidan no problem 🙂 In the initial solution, I assume that the data is sorted. This is not the case in your actual data. I should have handled that from the beginning.
Anyway, you can do this instead. It is simpler 🙂
data confused;
if _N_ = 1 then do;
declare hash h(dataset:'Micro2(where=(case=1 & transp=1))');
h.defineKey('lopnr');
h.defineDone();
end;
set micro2;
if min(h.check(), h.check(key:lopnr_fall)) gt 0;
run;
Which gives
@PeterClemmensenMany thanks, this works!
Of all the things I tried to and planeed to troubleshoot with, sorting was/would not have been one of them. Amazing how such a humble step has great implications.
Could you kindly walk me though your code, so I and perhaps others (like me) in future will understand it.
Thanks again for all your time!
Yes, no problem 🙂
In step 1, I create a hash object and fill it with the lopnr values for which case=1 and transp=1.
In step 2, I simply read the Micro2 data set one obs at the time.
In Step three, i use the Hash Object Check Method twice. Once, to search for the current lopnr value in the hash object and once to search for the current lopnr_fall value. If one of those searches are succesful, the method returns a positive non-zero value. Thus, I use the Min Function to check if they both fail the search. And if they do both fail the search, I output the observation.
data confused;
/* 1 */
if _N_ = 1 then do;
declare hash h(dataset:'Micro2(where=(case=1 & transp=1))');
h.defineKey('lopnr');
h.defineDone();
end;
/* 2 */
set micro2;
/* 3 */
if min(h.check(), h.check(key:lopnr_fall)) gt 0;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.