DATA Step, Macro, Functions and more

Comparing rows

Reply
Contributor
Posts: 38

Comparing rows

I have a dataset that has multiple records (ID2) for each person (ID1).

Note: (1) each Id1 has multiple ID2, (2) also the number of ID2 varies within each ID1. For E.g ID1 1x has 6 ID2, ID1 ef3 has 4 ID2.

I want to compare the observations for variables v1-v3 for each id1. E.g. I want to check if value for v1 in row1 is same as value in row2.  Again I want to compare only within the same person or ID1 All variables are categorical. Transposing the dataset is not an option for me due to the size of the dataset. Any help would be greatly appreciated.

Thanks in advance.

Header 1Header 2Header 3Header 4Header 5
id1id2v1v2v3

1x

23dr1r2
1xefr1
1xsdg5r2
1xdrtr2r1
1xs5r1r2
1xgertr2r3
ef3sy45r1
ef36ejer3
ef347str3r1
ef3dy45r3r1
d2srgw5r1r2
d2srr2
d2srtr3r2
d25r5r3
d245j
d2eyhr4
d256t4r5
d2hr6r2
d2dh45r5
Valued Guide
Posts: 2,175

Comparing rows

sounds like a job for a DoW loop.

What info is wanted from these compares?

  1. all blank
  2. single non-blank value
  3. all non-blanks are the same
  4. differences present

or

  3.  # distinct values

hth

peter

Contributor
Posts: 38

Re: Comparing rows

Hi Peter,

I have a variables that I did not add in my original post. For example, assume V1-3 as race1, race2, and race3. I have a overall variable called Race. So (1) if I have different values in race1-3 then I coded as Race=Multiple, (2) if I have only one value in race1 and race2-3 are missing then I code as Race=Single.

So for a person to be of multiple race he can have values in race1-3 in a single id2 or one value in first id2 and anther value in the next id2.

E.g Scenario 1:

id1       id2         v1          v2          v3    race

q           d1         r1          r2                 Multiple

q           d2         r2

Scenario 2:

id1       id2        v1           v2           v3     race

q          d1        r1

q          d2        r2                                  Multiple

I already checked the values between v1-3 for each observation (scenario 1). Now I am trying to compare the value between rows for each person (scenario 2).

Hope this helps

Valued Guide
Posts: 765

Re: Comparing rows

Hi ... if the question is "are all non-missing values the same for variables V1-V3 within values of ID1",

I think this answers that question.

This will produce a "1" when they are the same and a "0" if different for variables MULT1-MULT3

(variable "o" maintains the ID1 order of the original data set).

proc sql;

create table want (drop=o) as

select id1, monotonic() as o, (min(v1) eq max(v1)) as mult1, (min(v2) eq max(v2)) as mult2, (min(v3) eq max(v3)) as mult3

from have

group id1

order o

;

quit;

In your posted data, no variable values meet the above criteria except for variable V3 where there there  is only one non-missing value within each ID1.

id1    mult1    mult2    mult3

1x       0        0        1

ef3      0        0        1

d2       0        0        1

If you want all the observations in the final data set with added variables MULT1-MULT3, change SELECT ID1 to SELECT * .

Super User
Super User
Posts: 6,500

Re: Comparing rows

How about putting them into groups?  Requires sorting so if your data is really large might not be reasonable.

data groups ;

   set have;

   by id1 v1-v3 ;

   group + first.v3;

run;

Super User
Posts: 9,681

Re: Comparing rows

You can use array to solve it.

Ksharp

Ask a Question
Discussion stats
  • 5 replies
  • 199 views
  • 0 likes
  • 5 in conversation