Hi, I have a dataset with about 7,000 observations. Some are twins, flagged by a 0/1 variable "TWIN", and a family ID variable "FAMILY" identifies members of the same family (e.g. twins share the same family ID). I would like to compute a kappa statistic measuring the concordance between twins' reports on a childhood variable "CHILD", but I am unsure how to restructure the dataset for that without manually finding everyone who shares a family ID and is flagged as a twin. Once the data are reorganized, I can run the kappa statistic without any issues.
Any help would be appreciated.
1. Keep only the records where the twin flag is 1.
2. Ensure that each family has at least two records flagged as twins (some have only one?).
3. Take the lowest ID as the first ID and the highest as the second ID. This will not work if you need to scale to triplets or other multiples.
It will also not work if a family with twins has other children and those non-twins also have the twin flag set to 1.
For the use cases shown it will work. If it doesn't work for you, please post your log and examples of the cases where it fails.
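proc sql;
  /* One row per family that has at least two records flagged as twins. */
  create table want as
  select familyID,
         min(childID) as child_twin_ID1,  /* lowest child ID in the family */
         max(childID) as child_twin_ID2   /* highest child ID in the family */
  from have
  where twins = 1
  group by familyID
  having count(familyID) >= 2;
quit;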
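Step 2 and the triplet caveat can be checked before pairing. The following is only a sketch: it reuses the dataset and variable names from the query above (have, familyID, twins), and the output table name twin_check is made up for illustration. It lists every family whose number of twin-flagged records is not exactly two, so those cases can be reviewed by hand.

```sas
/* Sketch: find families whose count of twin-flagged records is not exactly 2. */
/* have, familyID and twins follow the query above; twin_check is hypothetical. */
proc sql;
  create table twin_check as
  select familyID,
         count(*) as n_twin_records
  from have
  where twins = 1
  group by familyID
  having count(*) ne 2
  order by familyID;
quit;
```

An empty twin_check table would suggest that the min/max pairing is safe for this data. Any rows in it are families with a single flagged record, or with three or more.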
@amng wrote:
Data attached as an example. On the left-hand side is an example of the data I have, and on the right-hand side is what I would like to have. Thanks so much!
For example:
FAMILY ID   TWIN   CHILD
1           1      4
2           0      5
3           1      3
3           1      4
4           0      2
4           0      5
And the output dataset should just include twins, with reports side by side for each set of twins (individuals that indicate twin=1 and have the same family ID). For example:
CHILD1   CHILD2   FAMILY ID
5        6        1
2        2        2
3        4        3
4        5        4
Sorry, this is not the actual dataset, just an example of how it looks currently vs. how it needs to be organized to run my proposed analysis.
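One possible way to get from the long layout to the side-by-side layout is sketched below. It assumes the variable names given in the question (FAMILY, TWIN, CHILD); the dataset names have, twins and want are placeholders, and the input rows are the example rows from this post. Unlike a min/max approach, it keeps the two reports in record order, so CHILD1 is not forced to be the smaller value.

```sas
/* Example rows from the post; variable names are those given in the question. */
data have;
  input FAMILY TWIN CHILD;
  datalines;
1 1 4
2 0 5
3 1 3
3 1 4
4 0 2
4 0 5
;
run;

/* Keep twin-flagged records and sort so each family's records are adjacent. */
proc sort data=have(where=(TWIN = 1)) out=twins;
  by FAMILY;
run;

/* Within each family, the first record's report goes to CHILD1 and the second
   to CHILD2. Families without exactly two twin-flagged records are dropped. */
data want(keep=FAMILY CHILD1 CHILD2);
  set twins;
  by FAMILY;
  retain CHILD1;
  if first.FAMILY then do;
    n = 0;
    CHILD1 = .;
  end;
  n + 1;                              /* running count within the family */
  if n = 1 then CHILD1 = CHILD;
  else if n = 2 then CHILD2 = CHILD;
  if last.FAMILY and n = 2 then output;
run;

/* Agreement statistics, including kappa, for the paired reports. */
proc freq data=want;
  tables CHILD1*CHILD2 / agree;
run;
```

For the example rows above, only family 3 qualifies, giving CHILD1=3 and CHILD2=4; family 1 has a single flagged record and is dropped. A one-row table is too small for a meaningful kappa, so the PROC FREQ step is only useful on the full data, and it is worth checking that CHILD1 and CHILD2 take the same set of values there, since the agreement statistics are computed for square tables.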