Hi. I have a data set that contains some variables that I need to identify if it's the first and only occurrence, or first of multiple, and create new fields in a new data set that can be reviewed further. The data sets have thousands of lines.
Here's an example of what I am starting with:
id
line
exclude_ind
reimb_amt
12345
01
00
20
12345
02
00
0
54321
01
01
10
54321
02
01
10
54321
03
00
5
34567
01
05
0
34567
02
05
15
34567
03
09
20
34567
04
03
10
From the above data set, I want a field named id_count, which identifies unique id's. Each unique id gets a value = 01.
I want to create a field called exclude_ind_count, which is made of a value = 01 for each unique value in the exclude_ind field, within the unique id field.
So, the resulting new data set looks like this:
id
id_count
line
exclude_ind
exclude_ind_count
reimb_amt
12345
01
01
00
01
20
12345
00
02
00
00
0
54321
01
01
01
01
10
54321
00
02
01
00
10
54321
00
03
00
01
5
34567
01
01
05
01
0
34567
00
02
05
00
15
34567
00
03
09
01
20
34567
00
04
03
01
10
So, each id_count field should, when summed, = 1 for each unique id. Basically I want a count of how many unique id's are in my data set.
And, as shown, within each unique id, each unique instance of exclude_ind has a resulting exclude_ind_count = 01. So, when summed, I can get an idea of how many id's have either 1 exclude_ind, or multiple exclude_ind.
... View more