05-03-2017 01:30 PM
My data set has observations for families accross a ten month period. Thus, each family has ten observations, one for each month. Within a family, are family members. I want to utilize the First. Last. structure in SAS to loop over these family members, within a given family while considering the particular month. Can someone help me understand how to do this? I am thinking that I ought to first create a family identification number. Then I will, in my data set, do BY Family_ID and Month. I will then if First.Fam_ID then do, etc. Does this seem reasonable? Thank you for your help.
05-03-2017 01:37 PM
Yes, your logic seems reasonable as long as you sort your dataset by the appropriate variables Post a sample of your data, that makes it much easier to help you with some lines of code.
05-03-2017 01:43 PM
05-03-2017 01:48 PM
There is only one FIRST.MONTH record for a given combination of the by variables (when considering just the variables in the BY list that preceed and include month). Same for LAST.MONTH record. So if there are 10 family members for FAMILY_ID=1 and MONTH='Jan' there is one that is FIRST.MONTH and one that LAST.MONTH and 8 that are neither. Only way for a record to be both the first and the last is if it is the only record in that group.
05-03-2017 01:53 PM
Okay gotcha! By doing it that way, would SAS not be looping over the remainder of the months though? So I do want to know the family members of a given family for every month. If I do not specifiy something at the end of my code like "If Last.Month then Output" will I not get what I really want? Thank you
05-03-2017 01:58 PM
Right if you filter from many records to one you will lose information on the details.
Sometimes you want that like the COUNT or SUM for the group.
Or you might want to transform the data structue. For example you might want to make a more horizontal list by converting muliple rows to multiple variables. You for something like a list perhasp you concatenate the values into a string and keep that instead of the original variable.
05-03-2017 01:44 PM
If you want to take action by month then I would expect that you want to sort by Famity, Month and then by individual ID and then use the FIRST.MONTH and LAST.MONTH variables.
Let's take a simple example just to see how it might work. Say we had information on whether anyone hada cold in that particular month. We could then count how many members in the family had a cold for the month.
data have; input family_id month member_id cold @@; cards; 1 1 1 0 1 1 2 1 1 1 3 0 1 2 1 0 1 2 2 0 1 2 3 0 2 1 1 0 2 1 2 0 2 2 1 0 2 2 2 0 ; data want ; do until (last.month); set have ; by family_id month member_id ; num_colds = sum(0,has_cold); end; keep family_id month num_colds; run;
Basically the FIRST/LAST flags for the last variable in the BY statement will change the most often since it changes both with the value of that variable changes but also when any of the variables before it changes.
05-03-2017 01:49 PM
Thanks for your reply! So I would need to create an Individual Member family ID? SO my main goal is to count, for example, the number of kids in a family in a partcular month. So to do like your example, I would sort by family ID, Month, and Individual ID, then use the first.month last.month to loop until last month is reached. do I understand you correctly? Thanks
05-03-2017 01:53 PM
Yes, but you do not NEED a member id.
You do not NEED to sort by more BY variables than you are using in your code. But if you want to take the FIRST.X then in some cases it matters how the records are sorted within the group.. For example if you sorted by FAMILY_ID MONTH HEIGHT and took the reocrd with FIRST.MOnth you would have the shortest member of the family for that month.