Hello All,
I need to fill in missing rows by groups( ticker, year) for the same person. Here how the data looks like and how I would like to have the data. I really appreciate your help. I have huge data set and can not do manually. Any code will be appreciate it. I use SAS 9.4
CURRENT
Ticker | year | Fname | Lname | aud | exec | com | nom |
AB | 2012 | P | G | ||||
AB | 2012 | P | G | X | X | X | |
AB | 2012 | P | G | X | |||
DD | 2012 | G | L | ||||
DD | 2012 | G | L | C | |||
EG | 2013 | W | Z | ||||
EG | 2013 | W | Z | C | C |
DESIRED
ticker | year | fname | lname | aud | exc | com | nom |
AB | 2012 | P | G | X | X | X | X |
AB | 2012 | P | G | X | X | X | X |
AB | 2012 | P | G | X | X | X | X |
DD | 2012 | G | L | C | |||
DD | 2012 | G | L | C | |||
EG | 2013 | W | Z | C | C | ||
EG | 2013 | W | Z | C | C |
data have;
input (Ticker year Fname Lname aud exec com nom) ($);
cards;
AB 2012 P G . . . .
AB 2012 P G X . X X
AB 2012 P G . X . .
DD 2012 G L . . . .
DD 2012 G L . C . .
EG 2013 W Z . . . .
EG 2013 W Z . . C C
;
proc sql;
create table want as
select Ticker,year,fname, lname,max(aud) as max,max(exec) as exec, max(com) as com, max(nom) as nom
from have
group by ticker, year;
quit;
Notes: Take advantage of autoremerge by not including fname,lname in group by
data have;
input (Ticker year Fname Lname aud exec com nom) ($);
cards;
AB 2012 P G . . . .
AB 2012 P G X . X X
AB 2012 P G . X . .
DD 2012 G L . . . .
DD 2012 G L . C . .
EG 2013 W Z . . . .
EG 2013 W Z . . C C
;
proc sql;
create table want as
select Ticker,year,fname, lname,max(aud) as max,max(exec) as exec, max(com) as com, max(nom) as nom
from have
group by ticker, year;
quit;
Notes: Take advantage of autoremerge by not including fname,lname in group by
Thanks a lot. It is great.
Thanks a lot. It works.
This is one of those problem where you can take advantage of the LOCF (last observation carried forward) task supported by the UPDATE statement accompanied by the OBS=0 parameter and a BY statement. Here LOCF means to bring forward the most recent non-missing value in the variables of interest, so by the end of the BY group (last record for a given ticker/year) you have the variables of interest. Then just re-read all the other variables for the same ticker/group and output:
data have;
input (Ticker year Fname Lname aud exec com nom) ($);
cards;
AB 2012 P G . . . .
AB 2012 P G X . X X
AB 2012 P G . X . .
DD 2012 G L . . . .
DD 2012 G L . C . .
EG 2013 W Z . . . .
EG 2013 W Z . . C C
;
data want;
/* LOCF all variables not in the BY statement */
update have (obs=0) have;
by ticker year;
/* At the end of the ticker/year, reread all the variables
original values, except those of LOCF interest*/
if last.year then do until (last.year);
set have (drop=aud exec com nom);
by ticker year;
output;
end;
run;
Note this assumes your data are sorted by ticker/year, and that no variable of interest has more than 1 non-missing value for a given ticker/year.
This works because the 2nd reading of the dataset (i.e. in the loop starting with "if last.year then do until (last.year);") does NOT re-read the variables of interest (aud exec com nom) due to the DROP= parameter. So they are preserved LOCF results produced by the first reading (in the UPDATE by-group).
Now fname and lname in your data are constant, so keeping the last valid value for them wouldn't matter, But if the fname or lname were to have changes, the code above preserves those changes (because fname and lname are not in the DROP= parameter).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.