DATA Step, Macro, Functions and more

Using the First. Last. with multiple BY variables

Reply
Occasional Contributor
Posts: 6

Using the First. Last. with multiple BY variables

Hello,

 

My data set has observations for families accross a ten  month period. Thus, each family has ten observations, one for each month. Within a family, are family members. I want to utilize the First. Last. structure in SAS to loop over these family members, within a given family while considering the particular month. Can someone help me understand how to do this? I am thinking that I ought to first create a family identification number. Then I will, in my data set, do BY Family_ID and Month. I will then if First.Fam_ID then do, etc. Does this seem reasonable? Thank you for your help.

PROC Star
Posts: 551

Re: Using the First. Last. with multiple BY variables

Yes, your logic seems reasonable as long as you sort your dataset by the appropriate variables Smiley Happy Post a sample of your data, that makes it much easier to help you with some lines of code.

Occasional Contributor
Posts: 6

Re: Using the First. Last. with multiple BY variables

Thank you for your reply! I’ll get a sample posted shortly. When I use First.Month and Last.Month, I get a much smaller sample than if I do a First.Month without a Last.Month. Why is that? This probably just shows that I do not understand much of the mechanics underlying the do loop.

Zach



Super User
Super User
Posts: 6,499

Re: Using the First. Last. with multiple BY variables

There is only one FIRST.MONTH record for a given combination of the by variables (when considering just the variables in the BY list that preceed and include month).  Same for LAST.MONTH record. So if there are 10 family members for FAMILY_ID=1 and MONTH='Jan' there is one that is FIRST.MONTH and one that LAST.MONTH and 8 that are neither. Only way for a record to be both the first and the last is if it is the only record in that group.

 

Occasional Contributor
Posts: 6

Re: Using the First. Last. with multiple BY variables

Okay gotcha! By doing it that way, would SAS not be looping over the remainder of the months though? So I do want to know the family members of a given family for every month. If I do not specifiy something at the end of my code like "If Last.Month then Output" will I not get what I really want? Thank you

Super User
Super User
Posts: 6,499

Re: Using the First. Last. with multiple BY variables

Right if you filter from many records to one you will lose information on the details.

 

Sometimes you want that like the COUNT or SUM for the group.

Or you might want to transform the data structue. For example you might want to make a more horizontal list by converting muliple rows to multiple variables.  You for something like a list perhasp you concatenate the values into a string and keep that instead of the original variable.

Super User
Super User
Posts: 6,499

Re: Using the First. Last. with multiple BY variables

If you want to take action by month then I would expect that you want to sort by Famity, Month and then by individual ID and then use the FIRST.MONTH and LAST.MONTH variables.

 

Let's take a simple example just to see how it might work.  Say we had information on whether anyone hada cold in that particular month. We could then count how many members in the family had a cold for the month.

data have;
  input family_id month member_id cold @@;
cards;
1 1 1 0 1 1 2 1 1 1 3 0
1 2 1 0 1 2 2 0 1 2 3 0
2 1 1 0 2 1 2 0
2 2 1 0 2 2 2 0
;

data want ;
  do until (last.month);
    set have ;
    by family_id month member_id ;
    num_colds = sum(0,has_cold);
  end;
  keep family_id month num_colds;
run;

Basically the FIRST/LAST flags for the last variable in the BY statement will change the most often since it changes both with the value of that variable changes but also when any of the variables before it changes. 

Occasional Contributor
Posts: 6

Re: Using the First. Last. with multiple BY variables

Thanks for your reply! So I would need to create an Individual Member family ID? SO my main goal is to count, for example, the number of kids in a family in a partcular month. So to do like your example, I would sort by family ID, Month, and Individual ID, then use the first.month last.month to loop until last month is reached. do I understand you correctly? Thanks

Super User
Super User
Posts: 6,499

Re: Using the First. Last. with multiple BY variables

Yes, but you do not NEED a member id.  

 

You do not NEED to sort by more BY variables than you are using in your code.  But if you want to take the FIRST.X then in some cases it matters how the records are sorted within the group.. For example if you sorted by FAMILY_ID MONTH HEIGHT and took the reocrd with FIRST.MOnth you would have the shortest member of the family for that month.

 

 

Ask a Question
Discussion stats
  • 8 replies
  • 643 views
  • 0 likes
  • 3 in conversation