I have a dataset that has multiple records (ID2) for each person (ID1).
Note: (1) each Id1 has multiple ID2, (2) also the number of ID2 varies within each ID1. For E.g ID1 1x has 6 ID2, ID1 ef3 has 4 ID2.
I want to compare the observations for variables v1-v3 for each id1. E.g. I want to check if value for v1 in row1 is same as value in row2. Again I want to compare only within the same person or ID1 All variables are categorical. Transposing the dataset is not an option for me due to the size of the dataset. Any help would be greatly appreciated.
Thanks in advance.
Header 1 | Header 2 | Header 3 | Header 4 | Header 5 |
---|---|---|---|---|
id1 | id2 | v1 | v2 | v3 |
1x | 23d | r1 | r2 | |
1x | ef | r1 | ||
1x | sdg5 | r2 | ||
1x | drt | r2 | r1 | |
1x | s5 | r1 | r2 | |
1x | gert | r2 | r3 | |
ef3 | sy45 | r1 | ||
ef3 | 6eje | r3 | ||
ef3 | 47st | r3 | r1 | |
ef3 | dy45 | r3 | r1 | |
d2 | srgw5 | r1 | r2 | |
d2 | sr | r2 | ||
d2 | srt | r3 | r2 | |
d2 | 5r5 | r3 | ||
d2 | 45j | |||
d2 | eyh | r4 | ||
d2 | 56t4 | r5 | ||
d2 | hr6 | r2 | ||
d2 | dh45 | r5 |
sounds like a job for a DoW loop.
What info is wanted from these compares?
or
3. # distinct values
hth
peter
Hi Peter,
I have a variables that I did not add in my original post. For example, assume V1-3 as race1, race2, and race3. I have a overall variable called Race. So (1) if I have different values in race1-3 then I coded as Race=Multiple, (2) if I have only one value in race1 and race2-3 are missing then I code as Race=Single.
So for a person to be of multiple race he can have values in race1-3 in a single id2 or one value in first id2 and anther value in the next id2.
E.g Scenario 1:
id1 id2 v1 v2 v3 race
q d1 r1 r2 Multiple
q d2 r2
Scenario 2:
id1 id2 v1 v2 v3 race
q d1 r1
q d2 r2 Multiple
I already checked the values between v1-3 for each observation (scenario 1). Now I am trying to compare the value between rows for each person (scenario 2).
Hope this helps
Hi ... if the question is "are all non-missing values the same for variables V1-V3 within values of ID1",
I think this answers that question.
This will produce a "1" when they are the same and a "0" if different for variables MULT1-MULT3
(variable "o" maintains the ID1 order of the original data set).
proc sql;
create table want (drop=o) as
select id1, monotonic() as o, (min(v1) eq max(v1)) as mult1, (min(v2) eq max(v2)) as mult2, (min(v3) eq max(v3)) as mult3
from have
group id1
order o
;
quit;
In your posted data, no variable values meet the above criteria except for variable V3 where there there is only one non-missing value within each ID1.
id1 mult1 mult2 mult3
1x 0 0 1
ef3 0 0 1
d2 0 0 1
If you want all the observations in the final data set with added variables MULT1-MULT3, change SELECT ID1 to SELECT * .
How about putting them into groups? Requires sorting so if your data is really large might not be reasonable.
data groups ;
set have;
by id1 v1-v3 ;
group + first.v3;
run;
You can use array to solve it.
Ksharp
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.