Solved: Paste last value in by group to all other observations in the same gro...

apaez062 · Posted 02-20-2019 04:58 PM

Hello you fine wizards,

I have a bit of a repeat question that was never answered in an old post. (https://communities.sas.com/t5/SAS-Enterprise-Guide/How-do-I-quot-fill-down-quot-in-a-table/m-p/3752...)

I want to paste the last value of every group (e.g. Person value) to all other observations within that group (e.g. Want).

Bonus points if you could wizard a solution that would work regardless of the position of the value (e.g. first, last, middle, etc.)

If below person 1=. person 2=. and person 3=600 then I would want both person 1 and 2 to have Person 3's value pasted as their own value (e.g. want). And do the same for every family.

Family Person Want

1 . 600

1 600 600

2 . 700

2 700 700

Thank you so much!

novinosrin · Posted 02-20-2019 05:03 PM


data have;
input Family  Person;  
cards;
1              .           600
1              .           600
1            600        600
2             .            700
2             .            700
2             .            700
2             700       700
;

proc sql;
create table want(drop=rn) as
select *,max(person) as want, monotonic() as rn
from have
group by family
order by rn;
quit;

View solution in original post

novinosrin · Posted 02-20-2019 05:03 PM


data have;
input Family  Person;  
cards;
1              .           600
1              .           600
1            600        600
2             .            700
2             .            700
2             .            700
2             700       700
;

proc sql;
create table want(drop=rn) as
select *,max(person) as want, monotonic() as rn
from have
group by family
order by rn;
quit;

novinosrin · Posted 02-20-2019 05:06 PM

Assuming your dataset is sorted by family, then


data want;
merge have have(rename=(person=want) where=(want>.) keep=family person);
by family;
run;

apaez062 · Posted 02-21-2019 07:47 AM

Hi Novinosrin,

Thank you for the solution.
As I'm fairly new to sas, please bear with me.

Please correct me if I misunderstand anything, I'm just confirming I follow the logic.
1. in the first data step you're creating the dataset [have] with the data I included.

2. In the proc sql you're creating a table [want] with
the maximum value from person (in the case of family 1 this is 600 in the case of family 2 then it's 700) which appears for every record - not just the last person. You're calling this variable 'want'.

3. in the same line you're creating a variable [RN] with the monotonic function to provide us with the row number which begins with person 1 and ends with the last person per every group.
4. you're reading this from dataset 'have' which you created before but you're grouping it by family ID and sorting by rownumber [RN].
5. In the resulting dataset you've all variables except RN, I'm not sure why you'd drop it? Is it only used to sort by row number? I'm assuming the way proc sql reads the dataset the RN variable gets dropped only after it uses it to sort?

6. in the next data step you output the same dataset [want] and bring in the variables from the initial 'have' dataset.

7. you rename the person variable as 'want' but only for observations where want has a value and you keep both family and person variables and this is done within every family group via the 'by' statement.

8. what I don't understand is the last data step. I would think you'd need to merge the want dataset with the have dataset in the merge statement, no? (Again, I'm new with this). I'm not sure why you have the 'have' dataset referenced twice in the same statement.

9. Also, are the last values of every person pasted to all other observations in each group in the MAX statement or the merge statement? I'm a little lost there.

10. You mentioned 'assuming your dataset is sorted by family' do you mean I need to add an additional proc sort step just after the proc sql but before the next data step to sort it? Would this not be taken care of the group by statement in the sql? Or I guess they'd be grouped by family ID but not sorted by family ID?

Thanks again friend I'm just trying to understand how it works.

novinosrin · Posted 02-21-2019 09:28 AM

Hi @apaez062 Thank you for the follow up.Please find my responses in italics to your questions.

Please correct me if I misunderstand anything, I'm just confirming I follow the logic.
1. in the first data step you're creating the dataset [have] with the data I included.

Response 1: Your understanding is correct. Basically that’s the input sample for us to test with.

2. In the proc sql you're creating a table [want] with
the maximum value from person (in the case of family 1 this is 600 in the case of family 2 then it's 700) which appears for every record - not just the last person. You're calling this variable 'want'.

Response 2: Yes that’s correct. Since your sample has all missing besides the last, it’s very straight forward to get the max value for the by group which is greater than missing.

3. in the same line you're creating a variable [RN] with the monotonic function to provide us with the row number which begins with person 1 and ends with the last person per every group.

Response 3: Yes that’s correct. I created RN(Rownumber) for the reason the order of the results from proc sql execution is never guaranteed and so to get the order to match your input sample, I am explicitly going with an order by RN

you're reading this from dataset 'have' which you created before but you're grouping it by family ID and sorting by rownumber [RN].

Response 4: Yes. I think Response 3 explains your 4 as well

In the resulting dataset you've all variables except RN, I'm not sure why you'd drop it? Is it only used to sort by row number? I'm assuming the way proc sql reads the dataset the RN variable gets dropped only after it uses it to sort?

Response 5: Well, you can keep it if you want. I didn’t see a need there to have an extra variable, so as you rightly pointed out it is only used to sort by row number

6. in the next data step you output the same dataset [want] and bring in the variables from the initial 'have' dataset.

Response 6: Yes, Have is our input sample rather what you have/gave us. J

7. you rename the person variable as 'want' but only for observations where want has a value and you keep both family and person variables and this is done within every family group via the 'by' statement.

Response 7: Yes, very good understanding,.

8. What I don't understand is the last data step. I would think you'd need to merge the want dataset with the have dataset in the merge statement, no? (Again, I'm new with this). I'm not sure why you have the 'have' dataset referenced twice in the same statement.

Response 8: Basically what I am doing is a “self join” or you may call it “self merge”. If you choose to go for datastep solution as opposed to proc sql, make sure for match merge using by group processing requires the incoming datasets to be sorted by group variables. This prerequisite is a “must”

9. Also, are the last values of every person pasted to all other observations in each group in the MAX statement or the merge statement? I'm a little lost there.

Response 9. Both are basically joins, we join the full with one that has values

10. You mentioned 'assuming your dataset is sorted by family' do you mean I need to add an additional proc sort step just after the proc sql but before the next data step to sort it? Would this not be taken care of the group by statement in the sql? Or I guess they'd be grouped by family ID but not sorted by family ID?

Response 10: Answered in Response 8. Proc sql doesn’t require an explicit presort however it does an implicit sort but the output is never guaranteed to have the order of your input. Therefore to be safe and not to be sorry, it’s better to go for an explicit sort to make sure we have the order we want.

Hope the above helps

apaez062 · Posted 02-22-2019 08:20 PM

Looks to have worked. Thanks chief!

novinosrin · Posted 02-22-2019 10:03 PM

You are welcome!

Allaluiah · Posted 02-21-2019 10:49 AM

Hello @apaez062 Looks like you are new on the forum. Welcome to SAS communities and I hope you enjoy learning and sharing knowledge with us. It looks like your question has been answered. If yes, can you mark the solution as answered and close the thread?

Paste last value in by group to all other observations in the same group, repeat for each group

Re: Paste last value in by group to all other observations in the same group, repeat for each group

Re: Paste last value in by group to all other observations in the same group, repeat for each group

Re: Paste last value in by group to all other observations in the same group, repeat for each group

Re: Paste last value in by group to all other observations in the same group, repeat for each group

Re: Paste last value in by group to all other observations in the same group, repeat for each group

Re: Paste last value in by group to all other observations in the same group, repeat for each group

Re: Paste last value in by group to all other observations in the same group, repeat for each group

Re: Paste last value in by group to all other observations in the same group, repeat for each group

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away