BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Solsidan
Fluorite | Level 6

Hi,

In this toy court data (have) I would like to remove all observations where the suspect (suspect = 1) has been charged (charged = 1), further, I also want to remove their respective witnesses (witnes_for_suspect), i.e. remove all suspects and their witnesses from the records and be left with observations where the charges haven't been made (charged = 0).

 

The id column is unique and applies to everyone in the data. The column suspect shows who is a suspect (suspect = 1) and who is a witness (suspect = 0). The witnes_for_suspect column is not unique, it borrows information from the id column, and is useful to show to which suspect (their id ) a given witness belongs. e.g. the suspect (id = 1) has two witnesses (ids = 2 and 3) and has been charged (charged = 1) hence should be removed.

 

 

 

 

 

data have;
input id witnes_for_suspect suspect charged;
cards;
1     .  1  1
2     1  0  0
3     1  0  0
4     .  1  1
5     4  0  0
6     4  0  0
7     4  0  0
8     .  1  0
9     8  0  0
10     8  0  0
11     8  0  0
12     8  0  0
;
run;

 

 

 

The output data (want) should be:

 

data want;
input id witnes_to_suspect suspect charged;
cards;
8     .  1  0
9     8  0  0
10     8  0  0
11     8  0  0
12     8  0  0
;
run;

 

Thanks for any help!

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

One way

 

data want(drop=rc);
   if _N_ = 1 then do;
      declare hash h();
      h.defineKey('id');
      h.defineDone();
   end;

   set have end=lr;

   if suspect=1 & charged=1 then rc=h.add();
   if min(h.check(), h.check(key:witnes_for_suspect)) gt 0;
run;

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

One way

 

data want(drop=rc);
   if _N_ = 1 then do;
      declare hash h();
      h.defineKey('id');
      h.defineDone();
   end;

   set have end=lr;

   if suspect=1 & charged=1 then rc=h.add();
   if min(h.check(), h.check(key:witnes_for_suspect)) gt 0;
run;
Solsidan
Fluorite | Level 6

@PeterClemmensen  Thank you for the attempt and maybe I misunderstand your code or I didn't clarify what i need.

 

However, it seems the code provided does not yield the expected result.

 

The suspects with id = 1 and id = 4 have been charged (charged = 1), and therefore they, together with their witnesses (ids (2,3) and ids(5,6,7 respectively) should not be in the dataset want. The dataset want should only have the observations created in the second code I posted.

PeterClemmensen
Tourmaline | Level 20

@Solsidan, my code creates exactly the result you want 🙂

 

The below code

 

data have;
input id witnes_for_suspect suspect charged;
cards;
1     .  1  1
2     1  0  0
3     1  0  0
4     .  1  1
5     4  0  0
6     4  0  0
7     4  0  0
8     .  1  0
9     8  0  0
10     8  0  0
11     8  0  0
12     8  0  0
;
run;

data want(drop=rc);
   if _N_ = 1 then do;
      declare hash h();
      h.defineKey('id');
      h.defineDone();
   end;

   set have end=lr;

   if suspect=1 & charged=1 then rc=h.add();
   if min(h.check(), h.check(key:witnes_for_suspect)) gt 0;
run;

 

Creates the data set below. If this is not correct, please explain how the data below differs from your posted 'want' data set 🙂

 

 

Capture.PNG

Solsidan
Fluorite | Level 6

Thank you @PeterClemmensen - you were absolutely right! Sorry Man Sad

 

I figured out where the bug was:

 

In creating the toy dataset, I changed the variable name (from "witnes_to_suspect" to "witnes_for_suspect") on the "Insert SAS code" window so that it reads better, and that alas was the bug. So when I ran your code with my data which has the variable name "witnes_to_suspect" it gave wrong results. Very dumb of me 🙂

 

Thank you so much for your time and code!

 

 

data have;
input id witnes_to_suspect suspect charged;
cards;
1     .  1  1
2     1  0  0
3     1  0  0
4     .  1  1
5     4  0  0
6     4  0  0
7     4  0  0
8     .  1  0
9     8  0  0
10     8  0  0
11     8  0  0
12     8  0  0
;
run;
PeterClemmensen
Tourmaline | Level 20

Ah ok. Makes sense. No problem, glad you found your answer 🙂

Solsidan
Fluorite | Level 6

@PeterClemmensenSorry to drag you back to this, but perhaps you can help me figure out what am doing wrong in applying your code. I am afraid I have now tangled myself as I tried to apply your kindly provided code to several datasets (with tens of thousands of observations) that share similar structure to the toy dataset - but it does not seem to work as it did on the toy dataset. Of course it works perfectly fine on the toy dataset, no question, but I think it must be that I really don't understand what your code is doing for me to tweak it to work on other datasets.

 

When I apply it to a new sample data, the code only removes the suspects (cases) who have been charged (transp) but not their witnesses (lopnr_fall). id of course is lopnr.

 

I attach the new dataset and adapted code for you to try. The expected result is only observations with lopnr/lopnr_fall = 52.

 

Thank you!

 

 

data confused (drop=rc);
   if _N_ = 1 then do;
      declare hash h();
      h.defineKey('lopnr');
      h.defineDone();
   end;

   set micro2 end=lr;

   if case=1 & transp=1 then rc=h.add();
   if min(h.check(), h.check(key:lopnr_fall)) gt 0;
run;
PeterClemmensen
Tourmaline | Level 20

@Solsidan no problem 🙂 In the initial solution, I assume that the data is sorted. This is not the case in your actual data. I should have handled that from the beginning. 

 

Anyway, you can do this instead. It is simpler 🙂

 

data confused;
   if _N_ = 1 then do;
      declare hash h(dataset:'Micro2(where=(case=1 & transp=1))');
      h.defineKey('lopnr');
      h.defineDone();
   end;

   set micro2;

   if min(h.check(), h.check(key:lopnr_fall)) gt 0;
run;

Which gives

 

 

Capture.PNG

Solsidan
Fluorite | Level 6

@PeterClemmensenMany thanks, this works!

 

Of all the things I tried to and planeed to troubleshoot with, sorting was/would not have been one of them. Amazing how such a humble step has great implications.

 

Could you kindly walk me though your code, so I and perhaps others (like me) in future will understand it.

 

Thanks again for all your time!

PeterClemmensen
Tourmaline | Level 20

Yes, no problem 🙂

 

In step 1, I create a hash object and fill it with the lopnr values for which case=1 and transp=1.

 

In step 2, I simply read the Micro2 data set one obs at the time.

 

In Step three, i use the Hash Object Check Method twice. Once, to search for the current lopnr value in the hash object and once to search for the current lopnr_fall value. If one of those searches are succesful, the method returns a positive non-zero value. Thus, I use the Min Function to check if they both fail the search. And if they do both fail the search, I output the observation. 

 

data confused;
   /* 1 */
   if _N_ = 1 then do;
      declare hash h(dataset:'Micro2(where=(case=1 & transp=1))');
      h.defineKey('lopnr');
      h.defineDone();
   end;
   /* 2 */
   set micro2;
   /* 3 */
   if min(h.check(), h.check(key:lopnr_fall)) gt 0;
run;
Solsidan
Fluorite | Level 6
Clearly explained, thank you!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1602 views
  • 4 likes
  • 2 in conversation