BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
apaez062
Fluorite | Level 6

Hello you fine wizards,

 

I have a bit of a repeat question that was never answered in an old post. (https://communities.sas.com/t5/SAS-Enterprise-Guide/How-do-I-quot-fill-down-quot-in-a-table/m-p/3752...)

 

I want to paste the last value of every group (e.g. Person value) to all other observations within that group (e.g. Want).

Bonus points if you could wizard a solution that would work regardless of the position of the value (e.g. first, last, middle, etc.)

 

If below person 1=. person 2=. and person 3=600 then I would want both person 1 and 2 to have Person 3's value pasted as their own value (e.g. want).  And do the same for every family.

 

Family  Person  Want

1              .           600

1              .           600

1            600        600

2             .            700

2             .            700

2             .            700

2             700       700

 

Thank you so much!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

data have;
input Family  Person;  
cards;
1              .           600
1              .           600
1            600        600
2             .            700
2             .            700
2             .            700
2             700       700
;

proc sql;
create table want(drop=rn) as
select *,max(person) as want, monotonic() as rn
from have
group by family
order by rn;
quit;

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

data have;
input Family  Person;  
cards;
1              .           600
1              .           600
1            600        600
2             .            700
2             .            700
2             .            700
2             700       700
;

proc sql;
create table want(drop=rn) as
select *,max(person) as want, monotonic() as rn
from have
group by family
order by rn;
quit;
novinosrin
Tourmaline | Level 20

Assuming your dataset is sorted by family, then

 


data want;
merge have have(rename=(person=want) where=(want>.) keep=family person);
by family;
run;
apaez062
Fluorite | Level 6
Hi Novinosrin,

Thank you for the solution.
As I'm fairly new to sas, please bear with me.

Please correct me if I misunderstand anything, I'm just confirming I follow the logic.
1. in the first data step you're creating the dataset [have] with the data I included.

2. In the proc sql you're creating a table [want] with
the maximum value from person (in the case of family 1 this is 600 in the case of family 2 then it's 700) which appears for every record - not just the last person. You're calling this variable 'want'.

3. in the same line you're creating a variable [RN] with the monotonic function to provide us with the row number which begins with person 1 and ends with the last person per every group.
4. you're reading this from dataset 'have' which you created before but you're grouping it by family ID and sorting by rownumber [RN].
5. In the resulting dataset you've all variables except RN, I'm not sure why you'd drop it? Is it only used to sort by row number? I'm assuming the way proc sql reads the dataset the RN variable gets dropped only after it uses it to sort?

6. in the next data step you output the same dataset [want] and bring in the variables from the initial 'have' dataset.

7. you rename the person variable as 'want' but only for observations where want has a value and you keep both family and person variables and this is done within every family group via the 'by' statement.

8. what I don't understand is the last data step. I would think you'd need to merge the want dataset with the have dataset in the merge statement, no? (Again, I'm new with this). I'm not sure why you have the 'have' dataset referenced twice in the same statement.

9. Also, are the last values of every person pasted to all other observations in each group in the MAX statement or the merge statement? I'm a little lost there.

10. You mentioned 'assuming your dataset is sorted by family' do you mean I need to add an additional proc sort step just after the proc sql but before the next data step to sort it? Would this not be taken care of the group by statement in the sql? Or I guess they'd be grouped by family ID but not sorted by family ID?

Thanks again friend I'm just trying to understand how it works.

novinosrin
Tourmaline | Level 20

Hi @apaez062    Thank you for the follow up.Please find my responses in italics to your questions. 

 

 

 

Please correct me if I misunderstand anything, I'm just confirming I follow the logic.
1. in the first data step you're creating the dataset [have] with the data I included.  

Response 1: Your understanding is correct. Basically that’s the input sample for us to test with.

2. In the proc sql you're creating a table [want] with
the maximum value from person (in the case of family 1 this is 600 in the case of family 2 then it's 700) which appears for every record - not just the last person. You're calling this variable 'want'.

Response 2: Yes that’s correct. Since your sample has all missing besides the last, it’s very straight forward to get the max value for the by group which is greater than missing.

3. in the same line you're creating a variable [RN] with the monotonic function to provide us with the row number which begins with person 1 and ends with the last person per every group.

Response 3: Yes that’s correct. I created RN(Rownumber) for the reason the order of the results from  proc sql execution is never guaranteed and so to get the order to match your input sample, I am explicitly going with an order by RN

 

  1. you're reading this from dataset 'have' which you created before but you're grouping it by family ID and sorting by rownumber [RN].

Response 4: Yes. I think Response 3 explains your 4 as well


  1. In the resulting dataset you've all variables except RN, I'm not sure why you'd drop it? Is it only used to sort by row number? I'm assuming the way proc sql reads the dataset the RN variable gets dropped only after it uses it to sort?

Response 5: Well, you can keep it if you want. I didn’t see a need there to have an extra variable, so as you rightly pointed out it is only used to sort by row number

6. in the next data step you output the same dataset [want] and bring in the variables from the initial 'have' dataset.

Response 6: Yes, Have is our input sample rather what you have/gave us. J

7. you rename the person variable as 'want' but only for observations where want has a value and you keep both family and person variables and this is done within every family group via the 'by' statement.

Response 7: Yes, very good understanding,.

8. What I don't understand is the last data step. I would think you'd need to merge the want dataset with the have dataset in the merge statement, no? (Again, I'm new with this). I'm not sure why you have the 'have' dataset referenced twice in the same statement.

Response 8: Basically what I am doing is a “self join” or you may call it “self merge”. If you choose to go for datastep solution as opposed to proc sql, make sure for match merge using by group processing requires the incoming datasets to be sorted by group variables. This prerequisite is a “must”

9. Also, are the last values of every person pasted to all other observations in each group in the MAX statement or the merge statement? I'm a little lost there.

Response 9. Both are basically joins, we join the full with one that has values

10. You mentioned 'assuming your dataset is sorted by family' do you mean I need to add an additional proc sort step just after the proc sql but before the next data step to sort it? Would this not be taken care of the group by statement in the sql? Or I guess they'd be grouped by family ID but not sorted by family ID?

Response 10: Answered in Response 8. Proc sql doesn’t require an explicit presort however it does an implicit sort but the output is never guaranteed to have the order of your input. Therefore to be safe and not to be sorry, it’s better to go for an explicit sort to make sure we have the order we want.

 

 

Hope the above helps

Allaluiah
Quartz | Level 8

Hello @apaez062   Looks like you are new on the forum. Welcome to SAS communities and I hope you enjoy learning and sharing knowledge with us. It looks like your question has been answered. If yes, can you mark the solution as answered and close the thread?

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3543 views
  • 1 like
  • 3 in conversation