Solved: Re: Remove rows if certain variable changes across time in a longitudi...

HarryB · Posted 04-24-2019 04:22 PM

Hi,

I would like to perform the following task.

When relationship changes across enrollment periods, I would like to drop all enrollment periods where relationship to head of household is not self. My data is in the long format.

Have:

Client ID Enrollment ID Relationship

1 20 Self (head of household)

1 21 Head of household's Child

1 22 Head of household's Spouse

2 30 Head of household's Child

2 31 Self (head of household)

2 32 Head of household's Spouse

Want:

Client ID Enrollment ID Relationship

1 20 Self (head of household)

2 31 Self (head of household)

Your help would be truly appreciated!

Thank you in advance!

-Harry

HarryB · Posted 05-17-2019 05:12 PM

Hi @novinosrin,

For some reasons, the suggested code did not work for the original dataset.

I used the following code and achieved what I wanted to. Thank you for your help!

data want;

set have;

if RelationToHoH='Self (head of household)' then varseq = 0;

else varseq = 1;

proc sort; by household_id client_Uid varseq; run;

run;

data want1;

set want;

by household_id client_Uid;

length relvalue $79.;

retain relvalue;

if first.client_uid then relvalue=RelationToHoH;

if RelationToHoH eq relvalue then flag =1;

if flag=. then delete;

run;

View solution in original post

PaigeMiller · Posted 04-24-2019 04:26 PM

data want;
    set have;
    if relationship=:'Self';
run;

--
Paige Miller

HarryB · Posted 04-24-2019 04:52 PM

Hi Paige Miller,

Thank you for your prompt response. I should have been more clear on my question. Your suggested code only selects "self" and removes all others. However, I want to retain IDs ( no matter what their relationship is) where we have only one enrollment period. Thank you!

Have:

Client ID Enrollment ID Relationship

1 20 Self (head of household)

1 21 Head of household's Child

1 22 Head of household's Spouse

2 30 Head of household's Child

2 31 Self (head of household)

2 32 Head of household's Spouse

3 10 Head of household's Child

Want:

Client ID Enrollment ID Relationship

1 20 Self (head of household)

2 31 Self (head of household)

3 10 Head of household's Child

novinosrin · Posted 04-24-2019 06:22 PM


data have;
infile cards truncover;
input ClientID    EnrollmentID    Relationship $100.;
cards;
1                       20                   Self (head of household)
1                       21                   Head of household's Child
1                       22                   Head of household's Spouse          
2                       30                   Head of household's Child
2                       31                   Self (head of household)
2                       32                   Head of household's Spouse  
3                       10                   Head of household's Child
;

data want;
set have;
by clientid;
if first.clientid and last.clientid then output;
else if  relationship=:'Self' then output;
run;

However, I want to retain IDs ( no matter what their relationship is) where we have only one enrollment period.

HarryB · Posted 04-25-2019 11:21 AM

Hi,

Thank you so much for your response. It works perfectly! I have a minor modification on the data, could you please help me how to achieve the following? In addition to what you have suggested, I want to keep all enrollments if an household has at least one self head of household member. For example, the household ID 75, I want to keep all enrollments for client 5,6, and 9. Thank you in advance!

Have:

Household ID Client ID Enrollment ID Relationship

55 1 20 Self (head of household)

55 1 21 Head of household's Child

55 1 22 Head of household's Spouse

60 2 30 Head of household's Child

60 2 31 Self (head of household)

60 2 32 Head of household's Spouse

. 3 10 Head of household's Child

75 5 101 Self (head of household)

75 5 102 Self (head of household)

75 6 103 Head of household's Child

75 9 105 Head of household's spouse

80 20 110

80 21 111

Want:

Household ID Client ID Enrollment ID Relationship

55 1 20 Self (head of household)

60 2 31 Self (head of household)

. 3 10 Head of household's Child

75 5 101 Self (head of household)

75 5 102 Self (head of household)

75 6 103 Head of household's Child

75 9 105 Head of household's Spouse

80 20 110

80 21 111

novinosrin · Posted 04-25-2019 12:47 PM

HI @HarryB HarryB Assuming I understand your modified requirement. Please test and let us know. Thanks!


data have;
infile cards truncover;
input HouseholdID ClientID       EnrollmentID         Relationship $100.;
cards;
 55                     1                       20                   Self (head of household)
       55                    1                        21                   Head of household's Child
       55                    1                        22                   Head of household's Spouse          
       60                   2                         30                   Head of household's Child
       60                   2                         31                   Self (head of household)
       60                   2                        32                   Head of household's Spouse  
         .                    3                        10                   Head of household's Child
       75                   5                        101                 Self (head of household)
       75                   5                         102                 Self (head of household)
       75                   6                         103                 Head of household's Child
       75                   9                         105                 Head of household's spouse
       80                   20                        110                .
       80                   21                        111                .
;


data want;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if first.clientid then _n=sum(_n,1);
if  relationship=:'Self' then _f=1;
end;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if _n=1 and  _f=1 and  relationship=:'Self' then output;
else if not(_n=1 and _f=1)  then output;;
end;
drop _:;
run;

novinosrin · Posted 04-25-2019 02:18 PM

Hi @HarryB I think the following is perhaps closer,

data have;
infile cards truncover;
input HouseholdID ClientID       EnrollmentID         Relationship $100.;
cards;
 55                     1                       20                   Self (head of household)
       55                    1                        21                   Head of household's Child
       55                    1                        22                   Head of household's Spouse          
       60                   2                         30                   Head of household's Child
       60                   2                         31                   Self (head of household)
       60                   2                        32                   Head of household's Spouse  
         .                    3                        10                   Head of household's Child
       75                   5                        101                 Self (head of household)
       75                   5                         102                 Self (head of household)
       75                   6                         103                 Head of household's Child
       75                   9                         105                 Head of household's spouse
       80                   20                        110                .
       80                   21                        111                .
;

data want;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if first.clientid then _n=sum(_n,1);
if  relationship=:'Self' then _f=1;
end;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if first.clientid and last.clientid then output;
else if _n=1 and  _f=1 and  relationship=:'Self' then output;
else if _n>1 and _f=1 then output;
end;
drop _:;
run;

HarryB · Posted 04-25-2019 05:58 PM

Hi novinosrin ,
For some reasons, it works perfectly on the example I posted, but does not work on my dataset. It does not give me an error but it does not make any changes to the previous dataset. I am surprised! Any ideas/help?

Just to make it clear, the enrollment ID in the dataset is not in order, it is a unique ID per enrollment, where one row represent one enrollment. The relationship: "self" could be in any enrollment (first, middle or last or anywhere). Please let me know if this makes different. Thanks!

novinosrin · Posted 04-25-2019 06:56 PM

Can you please post a comprehensive sample that is a good representative of your real?

" The relationship: "self" could be in any enrollment (first, middle or last or anywhere)."

i think somewhere there's a misunderstanding

HarryB · Posted 04-25-2019 08:14 PM

I am sorry for the confusion.

Where the relationship changes across enrollment periods of the same client, I want to drop enrollment periods where the relationship is not self. (If an individual reports self head of household at a one-time point, the relationship cannot change since the client has the same household ID, meaning from the same family).

In some cases, one individual enrolls into the program multiple times alone (missing household ID), eg client ID 85; If this client reports the self head of household at one enrollment period but any other relationship in other enrollment periods, I would like to drop other enrollment periods where the relationship is not self.

Thank you!!

Have:

Household ID Client ID Enrollment ID Relationship

55 1 20 Self (head of household)

55 1 21 Head of household's Child

55 1 22 Head of household's Spouse

60 2 30 Head of household's Child

60 2 31 Self (head of household)

60 2 32 Head of household's Spouse

. 3 10 Head of household's Child

75 5 101 Self (head of household)

75 5 102 Self (head of household)

75 6 103 Head of household's Child

75 9 105 Head of household's Spouse

80 20 110

80 21 111

34 77 20 Self (head of household)

34 77 21 Head of household's Child

34 77 22 Self (head of household)

. 85 555 Self (head of household)

. 85 556 Head of household's Child

. 85 226 Self (head of household)

1 2 232 Head of household's Child

Want:

Household ID Client ID Enrollment ID Relationship

55 1 20 Self (head of household)

60 2 31 Self (head of household)

. 3 10 Head of household's Child

75 5 101 Self (head of household)

75 5 102 Self (head of household)

75 6 103 Head of household's Child

75 9 105 Head of household's Spouse

80 20 110

80 21 111

34 77 20 Self (head of household)

34 77 22 Self (head of household)

. 85 555 Self (head of household)

. 85 226 Self (head of household)

1 2 232 Head of household's Child

novinosrin · Posted 04-25-2019 11:07 PM

@HarryB The same code is producing the new WANT for the new HAVE as I ran a test with the 2 new samples. I'm about sleep and I'll explain the logic at length tomorrow morning and you should be able to pick where my thinking/understanding went wrong. We shall proceed from thereafter.

We just have to work interactively, nonetheless I hope somebody else might have given you the much needed solution while I'm sleeping. Good night!

This was the test:


data have;
infile cards truncover;
input HouseholdID ClientID       EnrollmentID         Relationship $100.;
cards;
       55                     1                       20                   Self (head of household)
       55                    1                        21                   Head of household's Child
       55                    1                        22                   Head of household's Spouse          
       60                   2                         30                   Head of household's Child
      60                   2                         31                   Self (head of household)
       60                   2                        32                   Head of household's Spouse  
        .                    3                        10                   Head of household's Child
       75                   5                        101                Self (head of household)
       75                   5                         102                 Self (head of household)
       75                   6                         103                 Head of household's Child
       75                   9                         105                Head of household's Spouse
       80                   20                        110                 .
       80                   21                        111                .
       34                   77                         20                   Self (head of household)
       34                   77                        21                   Head of household's Child
       34                   77                       22                   Self (head of household)  
        .                    85                        555                   Self (head of household)
       .                     85                        556                   Head of household's Child
       .                     85                       226                  Self (head of household)  
       1                    2                        232                  Head of household's Child
;


data want;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if first.clientid then _n=sum(_n,1);
if  relationship=:'Self' then _f=1;
end;
do until(last.HouseholdID);
set have;
by HouseholdID clientid notsorted;
if first.clientid and last.clientid then output;
else if _n=1 and  _f=1 and  relationship=:'Self' then output;
else if _n>1 and _f=1 then output;
end;
drop _:;
run;

my detailed explanation will follow tomorrow morning if this thread still hasnt been solved by then

HarryB · Posted 05-17-2019 05:12 PM

Hi @novinosrin,

For some reasons, the suggested code did not work for the original dataset.

I used the following code and achieved what I wanted to. Thank you for your help!

data want;

set have;

if RelationToHoH='Self (head of household)' then varseq = 0;

else varseq = 1;

proc sort; by household_id client_Uid varseq; run;

run;

data want1;

set want;

by household_id client_Uid;

length relvalue $79.;

retain relvalue;

if first.client_uid then relvalue=RelationToHoH;

if RelationToHoH eq relvalue then flag =1;

if flag=. then delete;

run;

novinosrin · Posted 05-17-2019 10:06 PM

@HarryB great. well done. I'm so glad

HarryB · Posted 04-25-2019 06:44 PM

I am sorry to bother you multiple times. Does it make different when we have complete missing of household ID for some clients? Majority of my clients do not have household ID.

Have:

Household ID Client ID Enrollment ID Relationship

55 1 20 Self (head of household)

55 1 21 Head of household's Child

55 1 22 Head of household's Spouse

60 2 30 Head of household's Child

60 2 31 Self (head of household)

60 2 32 Head of household's Spouse

. 3 10 Head of household's Child

75 5 101 Self (head of household)

75 5 102 Self (head of household)

75 6 103 Head of household's Child

75 9 105 Head of household's spouse

80 20 110

80 21 111

. 55 98 Head of household's spouse

. 55 70 Other

. 55 65 Head of household's Child

. 55 118 Self (head of household)

Want:

Household ID Client ID Enrollment ID Relationship

55 1 20 Self (head of household)

60 2 31 Self (head of household)

. 3 10 Head of household's Child

75 5 101 Self (head of household)

75 5 102 Self (head of household)

75 6 103 Head of household's Child

75 9 105 Head of household's Spouse

80 20 110

80 21 111

. 55 118 Self (head of household)

Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Re: Remove rows if certain variable changes across time in a longitudinal data.

Catch up on SAS Innovate 2026