Solved
New Contributor
Posts: 3

# Comparing groups using a counter variable

The data is basically a list of testing results from thousands of users. Each user can take the test multiple times. The first part of my analysis is to generate the number of people that failed on their first attempt (pretty straightforward...I created a counter variable to enumerate each try and used a data step to limit observations to users who tried at least twice...from there I identified any first time failers where count=1 and pass=0). The other part of the analysis is to generate the number of people (from the original group of people that failed on their first attempt) that eventually passed after failing the first time.

I am having trouble with the code when it comes to sub-setting just the observations that failed on their first try in order to find out how many eventually passed.

My idea is to remove all observations that passed on the first try and all subsequent attempts after the first try (assuming they passed on the first try). Once I have removed those observations, the idea is to enumerate the total number of passes (in the subset of people that failed on their first try) through more if statements and de-duplicating any multiple passes (from multiple attempts from a single user) in a sort statement to find the unique number of passes.

Once I have the unique number of passes (from the group that failed on their first attempt), I think I can subtract that from the number of people that failed on their first attempt in order to find the number of people that never passed.

Example data set ("have"):

Email    Tries    Score   Pass     Count

1         4         78        0            1

1         4         100      1            2

1         4         56        0            3

1         4         100      1            4

2         2          67       0            1

2         2          89       1            2

3         2          78       0            1

3         2          89       1            2

4         4          80       1            1

4         4          78       0            2

4         4          81       1            3

4         4          79       0            4

Code that I attempted:

*creating a counter variable that enumerates each attempt;

proc sort data=have;

by email;

run;

data have1;

set have;

count +1;

by email;

if first.email then count=1;

keep tries pass email count score;

run;

data want;

set have1;

where tries ge 2; *selecting all students who tried at least twice;

if count=1 and pass=0 then firstfail=1; *creating a new variable for anyone that failed on the first try;

else firstfail=0;

if count=1 and pass=1 then delete; *removing observations that passed on the first try; *I know this part is wrong because I need to remove all subsequent observations following the observation that had a pass on the first try;

if 2=<count<=40 and pass=1 then passlater=1; *Again, I know this is bad code because this potentially still includes observations that passed on the first try...I would like to understand how to remove all of the associated counts (i.e. count 2,3,4,....) after the first observation (count=1) that I deleted;

else passlater=0;

run;

proc freq data=want; *this proc will provide the denominator for everyone that failed on their first attempt;

tables firstfail/list;

run;

data want1;

set want;

where passlater=1;

run;

proc sort data=passlater nodupkey;

by studentemail;

run;

proc freq data=passlater;

tables passlater/list;

run;

Thanks for the assistance!

Accepted Solutions
Solution
‎02-07-2014 09:52 PM
Posts: 5,531

## Re: Comparing groups using a counter variable

I don't understant your full requirement but this bit of code might give you a start:

data have;

input Email    Tries    Score   Pass;

datalines;

1         4          78       0

1         4         100       0

1         4          56       0

1         4         100       1

2         2          67       0

2         2          89       0

3         2          78       0

3         2          89       1

4         4          80       1

4         4          78       0

4         4          81       1

4         4          79       0

;

proc sort data=have; by email; run;

data havec;

set have;

by email;

if first.email then count=0;

count+1;

run;

/* Identify Email which fail on count=1 then pass on count>1,

keep count of first pass=1 after fail(s) */

proc sql;

create table failFirstThenPass as

select email, min(count) as passCount

from havec

where email in (select email from havec where count=1 and pass=0)

and pass=1

group by email;

select * from failFirstThenPass;

quit;

PG

PG

All Replies
PROC Star
Posts: 8,164

## Re: Comparing groups using a counter variable

Given your data set "have" what should the resulting file look like (i.e., show the desired result)?

New Contributor
Posts: 3

## Re: Comparing groups using a counter variable

Hi Art,

If the deletions are possible, I would like to see email addresses with a passing first attempt (i.e. email=4 and count=1) removed; however, I would like count=2, count=3, and count=4 removed as well because I am only concerned about email addresses that failed on the first attempt and passed later on (i.e. email=1). To clarify, I would like to only see observations that failed on the first attempt AND passed later on in my final data set. Presumably, this number will be less than or equal to the total number that failed on their first try (and tried at least twice) so I will be able to calculate those that never passed by subtracting the resultant number of observations (those that failed on their first attempt and passed later on) from the total number that failed on their first try after trying at least twice.

Below is the resultant (final) dataset I am hoping to obtain (all "email=4" rows have been removed since email=4 passed on the first attempt ). I would like to see a single indicator for Passlater (1=yes) calculated based on any pass after count 1. So for example, Email=1 has two passes after the fail on count 1 (a pass on count 2 and a pass on count 4) but I would like that to count as a "single" pass since they eventually passed on a later attempt (regardless of which count the pass occurred on).

Email    Tries    Score   Pass     Count     Failfirst     Passlater

1         4         78        0            1            1                1

1         4         100      1            2

1         4         56        0            3

1         4         100      1            4

2         2          67       0            1            1                1

2         2          89       1            2

3         2          78       0            1            1                1

3         2          89       1            2

4         4          80       1            1

4         4          78       0            2

4         4          81       1            3

4         4          79       0            4

Thanks!

PROC Star
Posts: 8,164

## Re: Comparing groups using a counter variable

If that is all you need then how about?:

data want (drop=keepme);

set have;

by email;

retain keepme;

if first.email then do;

if pass eq 1 and count eq 1 then keepme=0;

else keepme=1;

end;

if keepme then output;

run;

Solution
‎02-07-2014 09:52 PM
Posts: 5,531

## Re: Comparing groups using a counter variable

I don't understant your full requirement but this bit of code might give you a start:

data have;

input Email    Tries    Score   Pass;

datalines;

1         4          78       0

1         4         100       0

1         4          56       0

1         4         100       1

2         2          67       0

2         2          89       0

3         2          78       0

3         2          89       1

4         4          80       1

4         4          78       0

4         4          81       1

4         4          79       0

;

proc sort data=have; by email; run;

data havec;

set have;

by email;

if first.email then count=0;

count+1;

run;

/* Identify Email which fail on count=1 then pass on count>1,

keep count of first pass=1 after fail(s) */

proc sql;

create table failFirstThenPass as

select email, min(count) as passCount

from havec

where email in (select email from havec where count=1 and pass=0)

and pass=1

group by email;

select * from failFirstThenPass;

quit;

PG

PG
New Contributor
Posts: 3

## Re: Comparing groups using a counter variable

This worked! SQL is magic! Thanks PG + Art.

🔒 This topic is solved and locked.