Hi,
I would like to perform the following task.
When relationship changes across enrollment periods, I would like to drop all enrollment periods where relationship to head of household is not self. My data is in the long format.
Have:
Client ID Enrollment ID Relationship
1 20 Self (head of household)
1 21 Head of household's Child
1 22 Head of household's Spouse
2 30 Head of household's Child
2 31 Self (head of household)
2 32 Head of household's Spouse
Want:
Client ID Enrollment ID Relationship
1 20 Self (head of household)
2 31 Self (head of household)
Your help would be truly appreciated!
Thank you in advance!
-Harry
Hi @novinosrin,
For some reasons, the suggested code did not work for the original dataset.
I used the following code and achieved what I wanted to. Thank you for your help!
data want;
set have;
if RelationToHoH='Self (head of household)' then varseq = 0;
else varseq = 1;
proc sort; by household_id client_Uid varseq; run;
run;
data want1;
set want;
by household_id client_Uid;
length relvalue $79.;
retain relvalue;
if first.client_uid then relvalue=RelationToHoH;
if RelationToHoH eq relvalue then flag =1;
if flag=. then delete;
run;
data want;
set have;
if relationship=:'Self';
run;
Hi Paige Miller,
Thank you for your prompt response. I should have been more clear on my question. Your suggested code only selects "self" and removes all others. However, I want to retain IDs ( no matter what their relationship is) where we have only one enrollment period. Thank you!
Have:
Client ID Enrollment ID Relationship
1 20 Self (head of household)
1 21 Head of household's Child
1 22 Head of household's Spouse
2 30 Head of household's Child
2 31 Self (head of household)
2 32 Head of household's Spouse
3 10 Head of household's Child
Want:
Client ID Enrollment ID Relationship
1 20 Self (head of household)
2 31 Self (head of household)
3 10 Head of household's Child
data have;
infile cards truncover;
input ClientID EnrollmentID Relationship $100.;
cards;
1 20 Self (head of household)
1 21 Head of household's Child
1 22 Head of household's Spouse
2 30 Head of household's Child
2 31 Self (head of household)
2 32 Head of household's Spouse
3 10 Head of household's Child
;
data want;
set have;
by clientid;
if first.clientid and last.clientid then output;
else if relationship=:'Self' then output;
run;
However, I want to retain IDs ( no matter what their relationship is) where we have only one enrollment period.
Hi,
Thank you so much for your response. It works perfectly! I have a minor modification on the data, could you please help me how to achieve the following? In addition to what you have suggested, I want to keep all enrollments if an household has at least one self head of household member. For example, the household ID 75, I want to keep all enrollments for client 5,6, and 9. Thank you in advance!
Have:
Household ID Client ID Enrollment ID Relationship
55 1 20 Self (head of household)
55 1 21 Head of household's Child
55 1 22 Head of household's Spouse
60 2 30 Head of household's Child
60 2 31 Self (head of household)
60 2 32 Head of household's Spouse
. 3 10 Head of household's Child
75 5 101 Self (head of household)
75 5 102 Self (head of household)
75 6 103 Head of household's Child
75 9 105 Head of household's spouse
80 20 110
80 21 111
Want:
Household ID Client ID Enrollment ID Relationship
55 1 20 Self (head of household)
60 2 31 Self (head of household)
. 3 10 Head of household's Child
75 5 101 Self (head of household)
75 5 102 Self (head of household)
75 6 103 Head of household's Child
75 9 105 Head of household's Spouse
80 20 110
80 21 111
HI @HarryB HarryB Assuming I understand your modified requirement. Please test and let us know. Thanks!
data have;
infile cards truncover;
input HouseholdID ClientID EnrollmentID Relationship $100.;
cards;
55 1 20 Self (head of household)
55 1 21 Head of household's Child
55 1 22 Head of household's Spouse
60 2 30 Head of household's Child
60 2 31 Self (head of household)
60 2 32 Head of household's Spouse
. 3 10 Head of household's Child
75 5 101 Self (head of household)
75 5 102 Self (head of household)
75 6 103 Head of household's Child
75 9 105 Head of household's spouse
80 20 110 .
80 21 111 .
;
data want;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if first.clientid then _n=sum(_n,1);
if relationship=:'Self' then _f=1;
end;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if _n=1 and _f=1 and relationship=:'Self' then output;
else if not(_n=1 and _f=1) then output;;
end;
drop _:;
run;
Hi @HarryB I think the following is perhaps closer,
data have;
infile cards truncover;
input HouseholdID ClientID EnrollmentID Relationship $100.;
cards;
55 1 20 Self (head of household)
55 1 21 Head of household's Child
55 1 22 Head of household's Spouse
60 2 30 Head of household's Child
60 2 31 Self (head of household)
60 2 32 Head of household's Spouse
. 3 10 Head of household's Child
75 5 101 Self (head of household)
75 5 102 Self (head of household)
75 6 103 Head of household's Child
75 9 105 Head of household's spouse
80 20 110 .
80 21 111 .
;
data want;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if first.clientid then _n=sum(_n,1);
if relationship=:'Self' then _f=1;
end;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if first.clientid and last.clientid then output;
else if _n=1 and _f=1 and relationship=:'Self' then output;
else if _n>1 and _f=1 then output;
end;
drop _:;
run;
Hi novinosrin ,
For some reasons, it works perfectly on the example I posted, but does not work on my dataset. It does not give me an error but it does not make any changes to the previous dataset. I am surprised! Any ideas/help?
Just to make it clear, the enrollment ID in the dataset is not in order, it is a unique ID per enrollment, where one row represent one enrollment. The relationship: "self" could be in any enrollment (first, middle or last or anywhere). Please let me know if this makes different. Thanks!
Can you please post a comprehensive sample that is a good representative of your real?
" The relationship: "self" could be in any enrollment (first, middle or last or anywhere)."
i think somewhere there's a misunderstanding
I am sorry for the confusion.
Where the relationship changes across enrollment periods of the same client, I want to drop enrollment periods where the relationship is not self. (If an individual reports self head of household at a one-time point, the relationship cannot change since the client has the same household ID, meaning from the same family).
In some cases, one individual enrolls into the program multiple times alone (missing household ID), eg client ID 85; If this client reports the self head of household at one enrollment period but any other relationship in other enrollment periods, I would like to drop other enrollment periods where the relationship is not self.
Thank you!!
Have:
Household ID Client ID Enrollment ID Relationship
55 1 20 Self (head of household)
55 1 21 Head of household's Child
55 1 22 Head of household's Spouse
60 2 30 Head of household's Child
60 2 31 Self (head of household)
60 2 32 Head of household's Spouse
. 3 10 Head of household's Child
75 5 101 Self (head of household)
75 5 102 Self (head of household)
75 6 103 Head of household's Child
75 9 105 Head of household's Spouse
80 20 110
80 21 111
34 77 20 Self (head of household)
34 77 21 Head of household's Child
34 77 22 Self (head of household)
. 85 555 Self (head of household)
. 85 556 Head of household's Child
. 85 226 Self (head of household)
1 2 232 Head of household's Child
Want:
Household ID Client ID Enrollment ID Relationship
55 1 20 Self (head of household)
60 2 31 Self (head of household)
. 3 10 Head of household's Child
75 5 101 Self (head of household)
75 5 102 Self (head of household)
75 6 103 Head of household's Child
75 9 105 Head of household's Spouse
80 20 110
80 21 111
34 77 20 Self (head of household)
34 77 22 Self (head of household)
. 85 555 Self (head of household)
. 85 226 Self (head of household)
1 2 232 Head of household's Child
@HarryB The same code is producing the new WANT for the new HAVE as I ran a test with the 2 new samples. I'm about sleep and I'll explain the logic at length tomorrow morning and you should be able to pick where my thinking/understanding went wrong. We shall proceed from thereafter.
We just have to work interactively, nonetheless I hope somebody else might have given you the much needed solution while I'm sleeping. Good night!
This was the test:
data have;
infile cards truncover;
input HouseholdID ClientID EnrollmentID Relationship $100.;
cards;
55 1 20 Self (head of household)
55 1 21 Head of household's Child
55 1 22 Head of household's Spouse
60 2 30 Head of household's Child
60 2 31 Self (head of household)
60 2 32 Head of household's Spouse
. 3 10 Head of household's Child
75 5 101 Self (head of household)
75 5 102 Self (head of household)
75 6 103 Head of household's Child
75 9 105 Head of household's Spouse
80 20 110 .
80 21 111 .
34 77 20 Self (head of household)
34 77 21 Head of household's Child
34 77 22 Self (head of household)
. 85 555 Self (head of household)
. 85 556 Head of household's Child
. 85 226 Self (head of household)
1 2 232 Head of household's Child
;
data want;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if first.clientid then _n=sum(_n,1);
if relationship=:'Self' then _f=1;
end;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if first.clientid and last.clientid then output;
else if _n=1 and _f=1 and relationship=:'Self' then output;
else if _n>1 and _f=1 then output;
end;
drop _:;
run;
my detailed explanation will follow tomorrow morning if this thread still hasnt been solved by then
Hi @novinosrin,
For some reasons, the suggested code did not work for the original dataset.
I used the following code and achieved what I wanted to. Thank you for your help!
data want;
set have;
if RelationToHoH='Self (head of household)' then varseq = 0;
else varseq = 1;
proc sort; by household_id client_Uid varseq; run;
run;
data want1;
set want;
by household_id client_Uid;
length relvalue $79.;
retain relvalue;
if first.client_uid then relvalue=RelationToHoH;
if RelationToHoH eq relvalue then flag =1;
if flag=. then delete;
run;
@HarryB great. well done. I'm so glad
I am sorry to bother you multiple times. Does it make different when we have complete missing of household ID for some clients? Majority of my clients do not have household ID.
Have:
Household ID Client ID Enrollment ID Relationship
55 1 20 Self (head of household)
55 1 21 Head of household's Child
55 1 22 Head of household's Spouse
60 2 30 Head of household's Child
60 2 31 Self (head of household)
60 2 32 Head of household's Spouse
. 3 10 Head of household's Child
75 5 101 Self (head of household)
75 5 102 Self (head of household)
75 6 103 Head of household's Child
75 9 105 Head of household's spouse
80 20 110
80 21 111
. 55 98 Head of household's spouse
. 55 70 Other
. 55 65 Head of household's Child
. 55 118 Self (head of household)
Want:
Household ID Client ID Enrollment ID Relationship
55 1 20 Self (head of household)
60 2 31 Self (head of household)
. 3 10 Head of household's Child
75 5 101 Self (head of household)
75 5 102 Self (head of household)
75 6 103 Head of household's Child
75 9 105 Head of household's Spouse
80 20 110
80 21 111
. 55 118 Self (head of household)
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.