Comparing rows

rsva · Posted 11-02-2011 01:01 PM

I have a dataset that has multiple records (ID2) for each person (ID1).

Note: (1) each Id1 has multiple ID2, (2) also the number of ID2 varies within each ID1. For E.g ID1 1x has 6 ID2, ID1 ef3 has 4 ID2.

I want to compare the observations for variables v1-v3 for each id1. E.g. I want to check if value for v1 in row1 is same as value in row2. Again I want to compare only within the same person or ID1 All variables are categorical. Transposing the dataset is not an option for me due to the size of the dataset. Any help would be greatly appreciated.

Thanks in advance.

Header 1	Header 2	Header 3	Header 4	Header 5
id1	id2	v1	v2	v3
1x	23d	r1	r2
1x	ef		r1
1x	sdg5	r2
1x	drt	r2	r1
1x	s5	r1	r2
1x	gert	r2		r3
ef3	sy45	r1
ef3	6eje		r3
ef3	47st	r3		r1
ef3	dy45	r3	r1
d2	srgw5	r1	r2
d2	sr	r2
d2	srt	r3		r2
d2	5r5		r3
d2	45j
d2	eyh	r4
d2	56t4	r5
d2	hr6	r2
d2	dh45	r5

Peter_C · Posted 11-02-2011 01:35 PM

sounds like a job for a DoW loop.

What info is wanted from these compares?

all blank
single non-blank value
all non-blanks are the same
differences present

or

3. # distinct values

hth

peter

rsva · Posted 11-02-2011 02:05 PM

Hi Peter,

I have a variables that I did not add in my original post. For example, assume V1-3 as race1, race2, and race3. I have a overall variable called Race. So (1) if I have different values in race1-3 then I coded as Race=Multiple, (2) if I have only one value in race1 and race2-3 are missing then I code as Race=Single.

So for a person to be of multiple race he can have values in race1-3 in a single id2 or one value in first id2 and anther value in the next id2.

E.g Scenario 1:

id1 id2 v1 v2 v3 race

q d1 r1 r2 Multiple

q d2 r2

Scenario 2:

id1 id2 v1 v2 v3 race

q d1 r1

q d2 r2 Multiple

I already checked the values between v1-3 for each observation (scenario 1). Now I am trying to compare the value between rows for each person (scenario 2).

Hope this helps

MikeZdeb · Posted 11-02-2011 02:19 PM

Hi ... if the question is "are all non-missing values the same for variables V1-V3 within values of ID1",

I think this answers that question.

This will produce a "1" when they are the same and a "0" if different for variables MULT1-MULT3

(variable "o" maintains the ID1 order of the original data set).

proc sql;

create table want (drop=o) as

select id1, monotonic() as o, (min(v1) eq max(v1)) as mult1, (min(v2) eq max(v2)) as mult2, (min(v3) eq max(v3)) as mult3

from have

group id1

order o

;

quit;

In your posted data, no variable values meet the above criteria except for variable V3 where there there is only one non-missing value within each ID1.

id1 mult1 mult2 mult3

1x 0 0 1

ef3 0 0 1

d2 0 0 1

If you want all the observations in the final data set with added variables MULT1-MULT3, change SELECT ID1 to SELECT * .

Tom · Posted 11-02-2011 01:42 PM

How about putting them into groups? Requires sorting so if your data is really large might not be reasonable.

data groups ;

set have;

by id1 v1-v3 ;

group + first.v3;

run;

Ksharp · Posted 11-03-2011 04:28 AM

You can use array to solve it.

Ksharp

Comparing rows

Comparing rows

Re: Comparing rows

Re: Comparing rows

Re: Comparing rows

Re: Comparing rows

Comparing rows

Comparing rows

Re: Comparing rows

Re: Comparing rows

Re: Comparing rows

Re: Comparing rows

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away