Hello you fine wizards,
I have a bit of a repeat question that was never answered in an old post. (https://communities.sas.com/t5/SAS-Enterprise-Guide/How-do-I-quot-fill-down-quot-in-a-table/m-p/3752...)
I want to paste the last value of every group (e.g. Person value) to all other observations within that group (e.g. Want).
Bonus points if you could wizard a solution that would work regardless of the position of the value (e.g. first, last, middle, etc.)
If below person 1=. person 2=. and person 3=600 then I would want both person 1 and 2 to have Person 3's value pasted as their own value (e.g. want). And do the same for every family.
Family Person Want
1 . 600
1 . 600
1 600 600
2 . 700
2 . 700
2 . 700
2 700 700
Thank you so much!
data have;
input Family Person;
cards;
1 . 600
1 . 600
1 600 600
2 . 700
2 . 700
2 . 700
2 700 700
;
proc sql;
create table want(drop=rn) as
select *,max(person) as want, monotonic() as rn
from have
group by family
order by rn;
quit;
data have;
input Family Person;
cards;
1 . 600
1 . 600
1 600 600
2 . 700
2 . 700
2 . 700
2 700 700
;
proc sql;
create table want(drop=rn) as
select *,max(person) as want, monotonic() as rn
from have
group by family
order by rn;
quit;
Assuming your dataset is sorted by family, then
data want;
merge have have(rename=(person=want) where=(want>.) keep=family person);
by family;
run;
Hi @apaez062 Thank you for the follow up.Please find my responses in italics to your questions.
Please correct me if I misunderstand anything, I'm just confirming I follow the logic.
1. in the first data step you're creating the dataset [have] with the data I included.
Response 1: Your understanding is correct. Basically that’s the input sample for us to test with.
2. In the proc sql you're creating a table [want] with
the maximum value from person (in the case of family 1 this is 600 in the case of family 2 then it's 700) which appears for every record - not just the last person. You're calling this variable 'want'.
Response 2: Yes that’s correct. Since your sample has all missing besides the last, it’s very straight forward to get the max value for the by group which is greater than missing.
3. in the same line you're creating a variable [RN] with the monotonic function to provide us with the row number which begins with person 1 and ends with the last person per every group.
Response 3: Yes that’s correct. I created RN(Rownumber) for the reason the order of the results from proc sql execution is never guaranteed and so to get the order to match your input sample, I am explicitly going with an order by RN
Response 4: Yes. I think Response 3 explains your 4 as well
Response 5: Well, you can keep it if you want. I didn’t see a need there to have an extra variable, so as you rightly pointed out it is only used to sort by row number
6. in the next data step you output the same dataset [want] and bring in the variables from the initial 'have' dataset.
Response 6: Yes, Have is our input sample rather what you have/gave us. J
7. you rename the person variable as 'want' but only for observations where want has a value and you keep both family and person variables and this is done within every family group via the 'by' statement.
Response 7: Yes, very good understanding,.
8. What I don't understand is the last data step. I would think you'd need to merge the want dataset with the have dataset in the merge statement, no? (Again, I'm new with this). I'm not sure why you have the 'have' dataset referenced twice in the same statement.
Response 8: Basically what I am doing is a “self join” or you may call it “self merge”. If you choose to go for datastep solution as opposed to proc sql, make sure for match merge using by group processing requires the incoming datasets to be sorted by group variables. This prerequisite is a “must”
9. Also, are the last values of every person pasted to all other observations in each group in the MAX statement or the merge statement? I'm a little lost there.
Response 9. Both are basically joins, we join the full with one that has values
10. You mentioned 'assuming your dataset is sorted by family' do you mean I need to add an additional proc sort step just after the proc sql but before the next data step to sort it? Would this not be taken care of the group by statement in the sql? Or I guess they'd be grouped by family ID but not sorted by family ID?
Response 10: Answered in Response 8. Proc sql doesn’t require an explicit presort however it does an implicit sort but the output is never guaranteed to have the order of your input. Therefore to be safe and not to be sorry, it’s better to go for an explicit sort to make sure we have the order we want.
Hope the above helps
Looks to have worked. Thanks chief!
You are welcome!
Hello @apaez062 Looks like you are new on the forum. Welcome to SAS communities and I hope you enjoy learning and sharing knowledge with us. It looks like your question has been answered. If yes, can you mark the solution as answered and close the thread?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.