11-02-2011 01:01 PM
I have a dataset that has multiple records (ID2) for each person (ID1).
Note: (1) each Id1 has multiple ID2, (2) also the number of ID2 varies within each ID1. For E.g ID1 1x has 6 ID2, ID1 ef3 has 4 ID2.
I want to compare the observations for variables v1-v3 for each id1. E.g. I want to check if value for v1 in row1 is same as value in row2. Again I want to compare only within the same person or ID1 All variables are categorical. Transposing the dataset is not an option for me due to the size of the dataset. Any help would be greatly appreciated.
Thanks in advance.
|Header 1||Header 2||Header 3||Header 4||Header 5|
11-02-2011 01:35 PM
sounds like a job for a DoW loop.
What info is wanted from these compares?
3. # distinct values
11-02-2011 02:05 PM
I have a variables that I did not add in my original post. For example, assume V1-3 as race1, race2, and race3. I have a overall variable called Race. So (1) if I have different values in race1-3 then I coded as Race=Multiple, (2) if I have only one value in race1 and race2-3 are missing then I code as Race=Single.
So for a person to be of multiple race he can have values in race1-3 in a single id2 or one value in first id2 and anther value in the next id2.
E.g Scenario 1:
id1 id2 v1 v2 v3 race
q d1 r1 r2 Multiple
q d2 r2
id1 id2 v1 v2 v3 race
q d1 r1
q d2 r2 Multiple
I already checked the values between v1-3 for each observation (scenario 1). Now I am trying to compare the value between rows for each person (scenario 2).
Hope this helps
11-02-2011 02:19 PM
Hi ... if the question is "are all non-missing values the same for variables V1-V3 within values of ID1",
I think this answers that question.
This will produce a "1" when they are the same and a "0" if different for variables MULT1-MULT3
(variable "o" maintains the ID1 order of the original data set).
create table want (drop=o) as
select id1, monotonic() as o, (min(v1) eq max(v1)) as mult1, (min(v2) eq max(v2)) as mult2, (min(v3) eq max(v3)) as mult3
In your posted data, no variable values meet the above criteria except for variable V3 where there there is only one non-missing value within each ID1.
id1 mult1 mult2 mult3
1x 0 0 1
ef3 0 0 1
d2 0 0 1
If you want all the observations in the final data set with added variables MULT1-MULT3, change SELECT ID1 to SELECT * .
11-02-2011 01:42 PM
How about putting them into groups? Requires sorting so if your data is really large might not be reasonable.
data groups ;
by id1 v1-v3 ;
group + first.v3;