Solved: Issue with multiple rows of same participants

sandrube · Posted 04-20-2020 04:07 PM

I'm looking for a way to keep only the participants with var1=1 and var2=1. If a participant has var1=1 and var2=1 on at least one row, I want to keep all rows of that participant.

This is the initial dataset

data new;
infile cards missover;
input id var1 var2;
cards;
11 1 0
11 1 0
12 1 1
12 1 1
12 0 1
13 1 1
14 0 1
15 1 1
16 1 1
17 0 1
17 0 1
17 1 1
17 0 0
run;

The final dataset should look like this:

data new1;
infile cards missover;
input id var1 var2;
cards;
12 1 1
12 1 1
12 0 1
13 1 1
15 1 1
16 1 1
17 0 1
17 0 1
17 1 1
17 0 0
run;

Thank you!

mkeintz · Posted 04-20-2020 04:24 PM

This is a good example of applying a self-merge of a subset of a dataset with the entire dataset, as in:

data new;
infile cards missover;
input id var1 var2;
cards;
11 1 0
11 1 0
12 1 1
12 1 1
12 0 1
13 1 1
14 0 1
15 1 1
16 1 1
17 0 1
17 0 1
17 1 1
17 0 0
run;

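data new1;
  /* First data set: only the rows with var1=1 and var2=1.              */
  /* INKEEP is 1 for every row of an ID that has at least one such row. */
  merge new (where=(var1=1 and var2=1)  in=inkeep)
        new;   /* second data set: all rows; its values win on output */
  by id;       /* NEW must be sorted by ID */
  if inkeep;   /* keep all rows of the qualifying IDs */
run;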

Why does this work:

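MERGE with a BY statement tells SAS to match, by ID, the records of the first data set (only those satisfying the WHERE= condition) with the records of the second data set (all rows).
The IN= option sets the temporary variable INKEEP to 1 for every observation of an ID for which the first data set contributed at least one observation satisfying "var1=1 and var2=1", so "if inkeep;" keeps all rows of that ID.
Now you might be worried about a "collision" of data values. For example, the first obs matching var1=1 and var2=1 might not be the first obs overall for a given ID. But MERGE works such that data values from the 2nd data set supersede the data values from the 1st data set for variables with the same name. Because the full data set always has at least as many rows per ID as the subset, every output row gets its values from the full data set.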
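To see both points in action, here is a minimal sketch (assuming the sample data from the question; the output data set FLAGGED and the variable KEEP_FLAG are hypothetical names added only for illustration). It runs the same self-merge without the subsetting IF and copies INKEEP into a regular variable, because IN= variables are not written to the output data set.

```sas
/* Re-create the sample data from the question so this block runs on its own */
data new;
  input id var1 var2;
  datalines;
11 1 0
11 1 0
12 1 1
12 1 1
12 0 1
13 1 1
14 0 1
15 1 1
16 1 1
17 0 1
17 0 1
17 1 1
17 0 0
;
run;

/* Same self-merge as above, but keep every row and save the IN= flag */
data flagged;
  merge new (where=(var1=1 and var2=1) in=inkeep)
        new;
  by id;
  keep_flag = inkeep;   /* hypothetical variable: IN= variables are dropped automatically */
run;

proc print data=flagged noobs;
run;
```

In the printed output, KEEP_FLAG should be 0 for IDs 11 and 14 and 1 for every row of IDs 12, 13, 15, 16 and 17, while VAR1 and VAR2 show the values from the second (full) copy of NEW, e.g. 17 0 1 on the first row of ID 17. The log may also show "NOTE: MERGE statement has more than one data set with repeats of BY values." because ID 12 has two qualifying rows; here that is harmless, since every output value comes from the second data set.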

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for "Add a HASH object method which would append a hash object to an existing SAS data set".

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for "Allow PROC SORT to output multiple datasets".

--------------------------

sandrube · Posted 04-20-2020 04:28 PM

Thank you very much for the solution!!!
