BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fcf
Fluorite | Level 6 fcf
Fluorite | Level 6

Hello!

 

I have these three columns: name, cod and id. Based on the same value in "name" on the previous row I want to provide the same cod on the id column, which is also the previous id/cod. When there is no cod associated with the name in any row, I want to keep it as blank.

 

So this is what I have:

Namecodid
Ana Maria Santos123456789123456789
Ana Maria Santos123456789123456789
Ana Maria Santos  
João Pedro Costa  
João Pedro Costa  
Diogo Silva987654321987654321
Diogo Silva987654321987654321
Diogo Silva  

 

And this is what I want:

Namecodid
Ana Maria Santos123456789123456789
Ana Maria Santos123456789123456789
Ana Maria Santos 123456789
João Pedro Costa  
João Pedro Costa  
Diogo Silva987654321987654321
Diogo Silva987654321987654321
Diogo Silva 987654321

 

Thank you in advance.

 

- I've tried merging, but it returns more rows (i don't know why);

- I've tried using first. and last.;

- I've tried using retain but it keeps on also filling the blank ones that I want to keep as blank.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

That is because _id will be numeric, as the data step compiler will assume that when the variable is first encountered in the CALL MISSING.

One needs to take care of all of that:

data have;
input Name $17. (cod id) (:$10.);
infile datalines truncover;
datalines;
Ana Maria Santos 123456789 123456789
Ana Maria Santos 123456789 123456789
Ana Maria Santos                    
João Pedro Costa                    
João Pedro Costa                    
Diogo Silva      987654321 987654321
Diogo Silva      987654321 987654321
Diogo Silva                         
;

data want(drop=_id);
   set have;
   by Name notsorted;
   length _id $10;
   if first.Name then call missing(_id);
   if cod then _id = id;
   else         id = _id;
   retain _id;
run;

I also had to change the informat for Name because of the UTF character in João

View solution in original post

8 REPLIES 8
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
input Name $ 1-16 cod id;
infile datalines missover;
datalines;
Ana Maria Santos 123456789 123456789
Ana Maria Santos 123456789 123456789
Ana Maria Santos                    
João Pedro Costa                    
João Pedro Costa                    
Diogo Silva      987654321 987654321
Diogo Silva      987654321 987654321
Diogo Silva                         
;

data want(drop=_id);
   set have;
   by Name notsorted;
   if first.Name then call missing(_id);
   if cod then _id = id;
   else         id = _id;
   retain _id;
run;

 

Result:

 

Name             cod       id 
Ana Maria Santos 123456789 123456789 
Ana Maria Santos 123456789 123456789 
Ana Maria Santos .         123456789 
João Pedro Costa .         . 
João Pedro Costa .         . 
Diogo Silva      987654321 987654321 
Diogo Silva      987654321 987654321 
Diogo Silva      .         987654321 

 

fcf
Fluorite | Level 6 fcf
Fluorite | Level 6
Thank you 🙂 It worked. I just don't understand why I now have '.' in the output in the ones that were blank
Kurt_Bremser
Super User

That's why we request that example data is posted in a data step with datalines. Only then will the attributes of columns be clear and unambiguous.

I guess(!) that in your data cod and id are character, but that was not really clear from the quasi-screenshot you posted.

ballardw
Super User

@fcf wrote:
Thank you 🙂 It worked. I just don't understand why I now have '.' in the output in the ones that were blank

The example that @PeterClemmensen used numeric values because you did not indicate one way or the other. The default missing character . is displayed when the value is numeric.

fcf
Fluorite | Level 6 fcf
Fluorite | Level 6

ok thank you, next time I will say the type of variables! my variable still remains as character but now it has that point.

Kurt_Bremser
Super User

That is because _id will be numeric, as the data step compiler will assume that when the variable is first encountered in the CALL MISSING.

One needs to take care of all of that:

data have;
input Name $17. (cod id) (:$10.);
infile datalines truncover;
datalines;
Ana Maria Santos 123456789 123456789
Ana Maria Santos 123456789 123456789
Ana Maria Santos                    
João Pedro Costa                    
João Pedro Costa                    
Diogo Silva      987654321 987654321
Diogo Silva      987654321 987654321
Diogo Silva                         
;

data want(drop=_id);
   set have;
   by Name notsorted;
   length _id $10;
   if first.Name then call missing(_id);
   if cod then _id = id;
   else         id = _id;
   retain _id;
run;

I also had to change the informat for Name because of the UTF character in João

fcf
Fluorite | Level 6 fcf
Fluorite | Level 6
Thank you so much, it worked clearly! Have a nice day and thank you, next time I will write the type of variables i have.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 2020 views
  • 2 likes
  • 4 in conversation