DATA Step, Macro, Functions and more

Pulling people with multiple rows without collapsing

Reply
Frequent Contributor
Posts: 138

Pulling people with multiple rows without collapsing

Hi,

 

I have a dataset that is at the level of person and values of a categorical variable (categ_1). There are several possible values of categ_1 but no person can have more than 2 values. Each row also has a value for categ_2 and a date variable. 

 

ID   categ_1     categ_2     date

1      abc            jkl             1/1/2009

1      def            mno          1/1/2010

2      abc           mno          1/1/2009

 

What I'm trying to do is isolate those people with multiple values of categ_1, and find out what their values of categ_2 are for each value of categ_1, by date (so the one with the earlier date is their first categ_1 and the one with the later date is their second categ_1). 

 

What I did was this:

 

proc sort data=want;

by ID date;

run;

 

data have;

set want;

by ID date;

if first.ID then first_categ_1=1;

else second_categ_1=1;

if first_categ_1=1 and categ_2='jkl' then new_var='First categ 1=JKL';

if first_categ_1=1 and categ_2='mno' then new_var='First categ 1=MNO';

if second_categ_1=1 and categ_2='mno' then new_var='Second categ 1=MNO';

if second_categ_1=1 and categ_2='mno' then new_var='Second categ 1=MNO';

run;

 

This works fine and gives me the information I want; the issue is that I want to exclude anyone who only has one value for categ_1 from my frequencies. I'm not sure how to keep both rows for people with two rows with different values of categ_1 and get rid of the rows for those who only have one row with one value of categ_1. I don't want to collapse because of new_var which tells me what happened to each person's first categ_1 and second categ_1. 

 

Any help is much appreciated.

Super User
Posts: 11,343

Re: Pulling people with multiple rows without collapsing

Posted in reply to Walternate

It isn't clear, at least to me, if you want to exclude from an analysis but leave them in the data OR to remove the records from the data.

 

You might be able to set a flag that allows either with:

 

Categ_1_flag= ( first.id and last.id);

If there is only one value for the ID variable then this flag will be 1 other wise 0.

You could either drop the record with

If categ_1_flag=0;

or exclude from an analysis with a where statement or clause:

 

proc freq data=have;

   where categ_1_flag=0;

   tables ....

run;

Ask a Question
Discussion stats
  • 1 reply
  • 142 views
  • 0 likes
  • 2 in conversation