Hi all,
Am fairly new to SAS and am encountering the following problem. Appreciate if someone could shed some light on it! 🙂
I have a dataset which looks like this:
SH_NAME refers to the shareholders who own the firm while holding refers to the percentage ownership of the firm.
REVENUE, PROFIT, ASSETS & EQUITY figures are that of the firm total.
There are 20 other columns with such firm-level variables. For some variable, there is no info on it (e.g. Equity in the example below).
FIRM_NAME | SH_NAME | HOLDING | REVENUE | PROFIT | ASSETS | EQUITY |
A | A1 | 100 | 10 | 2 | 20 | 10 |
B | B1 | 17 | ||||
B | B2 | 17 | 100 | 40 | 50 | |
B | B3 | 17 | ||||
B | B4 | 17 | ||||
B | B5 | 17 | ||||
B | B6 | 17 |
Is there a way to fill in the missing blanks such that the final result will look like the following table?
FIRM_NAME | SH_NAME | HOLDING | REVENUE | PROFIT | ASSETS | EQUITY |
A | A1 | 100 | 10 | 2 | 20 | 10 |
B | B1 | 17 | 100 | 40 | 50 | |
B | B2 | 17 | 100 | 40 | 50 | |
B | B3 | 17 | 100 | 40 | 50 | |
B | B4 | 17 | 100 | 40 | 50 | |
B | B5 | 17 | 100 | 40 | 50 | |
B | B6 | 17 | 100 | 40 | 50 |
Thanks in advance!
Here is one way
data have;
infile datalines missover dsd dlm=',';
input FIRM_NAME $ SH_NAME $ HOLDING REVENUE PROFIT ASSETS EQUITY;
datalines;
A,A1,100,10,2,20,10
B,B1,17,
B,B2,17,100,40,50,,
B,B3,17,
B,B4,17,
B,B5,17,
B,B6,17,
;
data temp;
update have(obs=0) have;
by FIRM_NAME;
output;
run;
proc sort data=temp;
by FIRM_NAME descending SH_NAME;
run;
data temp2;
update temp(obs=0) temp;
by FIRM_NAME;
output;
run;
proc sort data=temp2 out=want;
by FIRM_NAME SH_NAME;
run;
proc datasets lib=work nolist;
delete temp:;
run;
Here is one way
data have;
infile datalines missover dsd dlm=',';
input FIRM_NAME $ SH_NAME $ HOLDING REVENUE PROFIT ASSETS EQUITY;
datalines;
A,A1,100,10,2,20,10
B,B1,17,
B,B2,17,100,40,50,,
B,B3,17,
B,B4,17,
B,B5,17,
B,B6,17,
;
data temp;
update have(obs=0) have;
by FIRM_NAME;
output;
run;
proc sort data=temp;
by FIRM_NAME descending SH_NAME;
run;
data temp2;
update temp(obs=0) temp;
by FIRM_NAME;
output;
run;
proc sort data=temp2 out=want;
by FIRM_NAME SH_NAME;
run;
proc datasets lib=work nolist;
delete temp:;
run;
Thanks for the suggestion!
However, as i have c.20,000 rows (and around 10,000 blanks that needs to be filled based on the firm ID), writing out each line may take up quite some time, in which case Excel may be a better option to edit the cells?
I just took your sample data as an example here. My code can handle your 20.000 rows easily 🙂
Thanks! This worked fine for me! 🙂
Based on your sample data below could work.
data have;
infile datalines missover dsd dlm=',';
input FIRM_NAME $ SH_NAME $ HOLDING REVENUE PROFIT ASSETS EQUITY;
datalines;
A,A1,100,10,2,20,10
B,B1,17,
B,B2,17,100,40,50,,
B,B3,17,
B,B4,17,
B,B5,17,
B,B6,17,
;
proc sort data=have;
by firm_name sh_name;
run;
%let firm_level_vars=REVENUE PROFIT ASSETS;
%let firm_level_vars2=%sysfunc(compbl(&firm_level_vars));
%let firm_level_vars2=%sysfunc(translate(&firm_level_vars2,%str(,),%str( )));
data want;
merge
have(drop=&firm_level_vars)
have(keep=firm_name &firm_level_vars
where=(n(&firm_level_vars2)>0)
)
;
by firm_name;
run;
proc print data=want;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.