Solved: Re: Reorganizing Data - Matching, Kappa Statistic

amng · Posted 11-09-2020 11:36 AM

Hi, I have a dataset with about 7000 observations. Some are identified as twins, signaled by a 0/1 variable "TWIN", and a family variable ID "FAMILY" indicates whether they are part of the same family (e.g. twins would have the same Family ID). I would like to run a Kappa Statistic showing the concordance between their reports on a childhood variable "CHILD", but am unsure how to get the data set in the appropriate mode for that without manually looking and recognizing everyone with the same family variable as twins. Once I reorganize the data, I can run a Kappa Statistic without any issues.

Any help would be appreciated.

Reeza · Posted 11-09-2020 04:07 PM

1. Filter only family that has twins as 1

2. Ensure that there are two records minimum for the twins identified (some have only 1???)

3. Take the lower ID as the first ID and maximum as second ID - will not work if you need to scale to triplets or other multiple children.

Will also not work if you have kids in a family with twins and the non-twins have the twin flag = 1.

For the shown use cases it will work, if it doesn't work for you please post your log and the examples of cases where it doesn't work.

proc sql;
create table want as
select familyID, min(childID) as child_twin_ID1, max(childID) as child_twin_ID2
from have
where twins = 1
group by familyID
having count(familyID) >=2;
quit;

@amng wrote:

Data attached as an example. On the lefthand side it an example of the data I have, and the righthand side is what I would like to have. Thanks so much!

View solution in original post

Reeza · Posted 11-09-2020 11:43 AM

You need to show your data set if you need help reformatting it.

amng · Posted 11-09-2020 11:48 AM

For example:

FAMILY ID TWIN CHILD

1 1 4

2 0 5

3 1 3

3 1 4

4 0 2

4 0 5

Reeza · Posted 11-09-2020 01:37 PM

And the expected output (exactly please) if this is the input.

amng · Posted 11-09-2020 01:54 PM

And the output dataset should just include twins, with reports side by side for each set of twins (individuals that indicate twin=1 and have the same family ID). For example:

CHILD1 CHILD2 FAMILY ID

5 6 1

2 2 2

3 4 3

4 5 4

Reeza · Posted 11-09-2020 01:58 PM

I don't see how the data you provided maps to the data shown? Where does the Child1/2 come from, is that the Child record from the initial table?If so, where did the 6 come from?

amng · Posted 11-09-2020 02:02 PM

Sorry, this is not the actual dataset just an example of how it looks currently vs. how it needs to be organized to run my proposed analysis.

amng · Posted 11-09-2020 03:38 PM

Data attached as an example. On the lefthand side it an example of the data I have, and the righthand side is what I would like to have. Thanks so much!

Reeza · Posted 11-09-2020 04:07 PM

1. Filter only family that has twins as 1

2. Ensure that there are two records minimum for the twins identified (some have only 1???)

3. Take the lower ID as the first ID and maximum as second ID - will not work if you need to scale to triplets or other multiple children.

Will also not work if you have kids in a family with twins and the non-twins have the twin flag = 1.

For the shown use cases it will work, if it doesn't work for you please post your log and the examples of cases where it doesn't work.

proc sql;
create table want as
select familyID, min(childID) as child_twin_ID1, max(childID) as child_twin_ID2
from have
where twins = 1
group by familyID
having count(familyID) >=2;
quit;

@amng wrote:

Data attached as an example. On the lefthand side it an example of the data I have, and the righthand side is what I would like to have. Thanks so much!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away