I have a data set and have created new columns from an original column called "Africa". This cell contained several variables together that I wanted to isolate and place the new variables into new columns e.g. Africa1, Africa2, etc.
E.g Africa= "Corporate/M&A (Band 3).Dispute Resolution (Band 2)
Africa1 = "Banking & Finance (Band 3)"
Africa2 = "Corporate/M&A (Band 2)"
Once I have done this I want to create a unique column that merges together the region and sector: E,g Africa_BF (this searches for Africa and Banking and Finance variable) that reads all the columns in Africa (1,2,3..) to put them into the new column. I populate the new column with the band numeric and hence the scan function. When creating the new column my code looks like the below:
if index(upcase(africa_wide1),'BANKING & FINANCE')>0 then Africa_BF=scan(africa_wide1,-1);
if index(upcase(africa_wide2),'BANKING & FINANCE')>0 then Africa_BF=scan(africa_wide2,-1);
if index(upcase(africa_wide3),'BANKING & FINANCE')>0 then Africa_BF=scan(africa_wide3,-1);
if index(upcase(africa_wide4),'BANKING & FINANCE')>0 then Africa_BF=scan(africa_wide4,-1);
if index(upcase(africa_wide5),'BANKING & FINANCE')>0 then Africa_BF=scan(africa_wide5,-1);
if index(upcase(africa_wide6),'BANKING & FINANCE')>0 then Africa_BF=scan(africa_wide6,-1);
if index(upcase(africa_wide7),'BANKING & FINANCE')>0 then Africa_BF=scan(africa_wide7,-1);
if index(upcase(africa_wide1),'CORPORATE/M&A')>0 then Africa_Corp_MA=scan(africa_wide1,-1);
if index(upcase(africa_wide2),'CORPORATE/M&A')>0 then Africa_Corp_MA=scan(africa_wide2,-1);
if index(upcase(africa_wide3),'CORPORATE/M&A')>0 then Africa_Corp_MA=scan(africa_wide3,-1);
if index(upcase(africa_wide4),'CORPORATE/M&A')>0 then Africa_Corp_MA=scan(africa_wide4,-1);
if index(upcase(africa_wide5),'CORPORATE/M&A')>0 then Africa_Corp_MA=scan(africa_wide5,-1);
if index(upcase(africa_wide6),'CORPORATE/M&A')>0 then Africa_Corp_MA=scan(africa_wide6,-1);
if index(upcase(africa_wide7),'CORPORATE/M&A')>0 then Africa_Corp_MA=scan(africa_wide7,-1);
There are roughly 80 regions and 40 sector types that would mean I would have to write code for hundreds of columns. Is there a more efficient way to do this? For each region there would be 40 possible outcomes of Sector/Region.
Many Thanks
Normalizing a fat variable into key/value pairs
see
https://communities.sas.com/t5/Base-SAS-Programming/SAS-Macros/m-p/349115
HAVE
====
Up to 40 obs WORK.HAVE total obs=2
FIRM AFRICA_WIDE
Firm1 Corporate/M&A (Band 3).Dispute Resolution (Band 2).Projects & Energy (Band 3).Projects & Energy: Mining & Minerals (Band 3)
Firm2 Corporate/M&A (Band 3).Dispute Resolution (Band 2).Projects & Energy (Band 3)
WANT (if you move the output statement inside the loop you will get the form you wanted or transpose want)
Up to 40 obs WORK.WANT total obs=7
Obs FIRM NAME PROJECTS BAND
1 Firm1 AFRICA1 Corporate/M&A (Band 3) 3
2 Firm1 AFRICA2 Dispute Resolution (Band 2) 2
3 Firm1 AFRICA3 Projects & Energy (Band 3) 3
4 Firm1 AFRICA4 Projects & Energy: Mining & Minerals (Band 3) 3
5 Firm2 AFRICA1 Corporate/M&A (Band 3) 3
6 Firm2 AFRICA2 Dispute Resolution (Band 2) 2
7 Firm2 AFRICA3 Projects & Energy (Band 3) 3
proc transpose data=want out=wantxpo;
by firm;
var band;
id projects;
run;quit;
CORPORATE_ DISPUTE_ PROJECTS___ ENERGY__
M_A__BAND_ RESOLUTION__ ENERGY__ MINING___
Obs FIRM _NAME_ 3_ BAND_2_ BAND_3_ MINE
1 Firm1 BAND 3 2 3 3
2 Firm2 BAND 3 2 3
WORKING CODE
* dimension (permax) are the max number of projects acroos all firs;
array africas[&permax] $64 africa1-africa&permax;
array africasvals[&permax] $5 africasvals1-africasvals&permax;
FULL SOLUTION
* _ _ _
_ __ ___ __ _| | _____ __| | __ _| |_ __ _
| '_ ` _ \ / _` | |/ / _ \ / _` |/ _` | __/ _` |
| | | | | | (_| | < __/ | (_| | (_| | || (_| |
|_| |_| |_|\__,_|_|\_\___| \__,_|\__,_|\__\__,_|
;
data have;
firm ='Firm1';
africa_wide ='Corporate/M&A (Band 3).Dispute Resolution (Band 2).Projects & Energy (Band 3).Projects & Energy: Mining & Minerals (Band 3)';
africa1 ='Corporate/M&A (Band 3)';
africa2 ='Dispute Resolution (Band 2)';
africa3 ='Projects & Energy (Band 3)';
africa4 ='Projects & Energy: Mining & Minerals (Band 3)';
africa_corp =3;
africa_resolut =2;
africa_nrgy_mine =3;
africa_nrgy_mine =3;
run;quit;
data have;
firm ='Firm1';
africa_wide ='Corporate/M&A (Band 3).Dispute Resolution (Band 2).Projects & Energy (Band 3).Projects & Energy: Mining & Minerals (Band 3)';
output;
firm ='Firm2';
africa_wide ='Corporate/M&A (Band 3).Dispute Resolution (Band 2).Projects & Energy (Band 3)';
output;
run;quit;
* _ _ _
___ ___ | |_ _| |_(_) ___ _ __
/ __|/ _ \| | | | | __| |/ _ \| '_ \
\__ \ (_) | | |_| | |_| | (_) | | | |
|___/\___/|_|\__,_|\__|_|\___/|_| |_|
;
data want;
* you need to know the maximum number of africas, ie africa1-africa9?;
if _n_ = 0 then do;
rc=%sysfunc(dosubl('
proc sql;
select max(percnt) into :permax separated by ""
from (select countc(africa_wide,".") + 1 as percnt from have group by firm)
;quit;
'));
end;
put "&&permax";
length projects $64;
set have;
array africas[&permax] $64 africa1-africa&permax;
array africasvals[&permax] $5 africasvals1-africasvals&permax;
do i = 1 to countc(africa_wide,'.') + 1;
africas[i]=scan(africa_wide,i,'.');
africasvals[i]=compress(africas[i],,'kd');
name=vname(africas[i]);
projects=africas[i];
band=compress(africas[i],,'kd');
output;
end;
keep firm name projects band;;
run;quit;
3686 data want;
3687 length project $64;
3688 * you need to know the maximum number of africas, ie africa1-africa9?;
3689 if _n_ = 0 then do;
3690 rc=%sysfunc(dosubl('
NOTE: A GROUP BY clause has been discarded because neither the SELECT clause nor the optional
HAVING clause of the associated table-expression referenced a summary function.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.03 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 3727.00k
OS Memory 20468.00k
Timestamp 04/11/2017 04:01:10 PM
Step Count 338 Switch Count 0
3691 proc sql;
3692 select max(percnt) into :permax separated by ""
3693 from (select countc(africa_wide,".") + 1 as percnt from have group by firm)
3694 ;quit;
3695 '));
3696 end;
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable PERMAX resolves to 4
3697 put "&&permax";
3698 set have;
3699 array africas[&permax] $64 africa1-africa&permax;
SYMBOLGEN: Macro variable PERMAX resolves to 4
SYMBOLGEN: Macro variable PERMAX resolves to 4
3700 array africasvals[&permax] $5 africasvals1-africasvals&permax;
SYMBOLGEN: Macro variable PERMAX resolves to 4
SYMBOLGEN: Macro variable PERMAX resolves to 4
3701 do i = 1 to countc(africa_wide,'.') + 1;
3702 africas[i]=scan(africa_wide,i,'.');
3703 africasvals[i]=compress(africas[i],,'kd');
3704 name=vname(africas[i]);
3705 projects=africas[i];
3706 band=compress(africas[i],,'kd');
3707 output;
3708 end;
3709 keep firm name projects band;;
3710 run;
4
NOTE: There were 2 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 7 observations and 4 variables.
NOTE: DATA statement used (Total process time):
First SAS Macro is not needed!
Look up array processing, something like:
data want; set have; array vals{*} africa_wide:; do i=1 to dim(vals); if vals{i}...; end; run;
If you post test data in the form of a datastep then more acurate code can be provided.
Hi,
Please find attached test data. This is the desired output. In essence, rather than write out all of the IF statements to populate the new columns I wondered if there was a more efficient alternative? By creating the new columns like Africa_BF it would allow me to eye-ball the data across all of the the different firms in column A and compare. The problem I have is that for each region there would be 80 columns for each one.
Thank you
Post data as text not XL and not attachement.
Please see the text below as one example:
Firm Name | Africa-wide | Africa_wide1 | Africa_wide2 | Africa_wide3 | Africa-Wide - Corporate/MA | Africa-Wide - Dispute Resolution | Africa-Wide - Projects & Energy |
Firm1 | Corporate/M&A (Band 3).Dispute Resolution (Band 2).Projects & Energy (Band 3).Projects & Energy: Mining & Minerals (Band 3) | Corporate/M&A (Band 3) | Dispute Resolution (Band 2) | Projects & Energy (Band 3) | 3 | 2 | 3 |
Here's a suggestion that I think will make life much easier.
Change the structure of the data. Instead of creating a variable named Africa_wide_1, create a variable named (something like) Africa_wide_Corporate_MA. Put the data into the right variable to begin with. Your subsequent processing will be (relatively) easy if there is only one variable that holds 'Corporate M&A/(Band #)'.
Normalizing a fat variable into key/value pairs
see
https://communities.sas.com/t5/Base-SAS-Programming/SAS-Macros/m-p/349115
HAVE
====
Up to 40 obs WORK.HAVE total obs=2
FIRM AFRICA_WIDE
Firm1 Corporate/M&A (Band 3).Dispute Resolution (Band 2).Projects & Energy (Band 3).Projects & Energy: Mining & Minerals (Band 3)
Firm2 Corporate/M&A (Band 3).Dispute Resolution (Band 2).Projects & Energy (Band 3)
WANT (if you move the output statement inside the loop you will get the form you wanted or transpose want)
Up to 40 obs WORK.WANT total obs=7
Obs FIRM NAME PROJECTS BAND
1 Firm1 AFRICA1 Corporate/M&A (Band 3) 3
2 Firm1 AFRICA2 Dispute Resolution (Band 2) 2
3 Firm1 AFRICA3 Projects & Energy (Band 3) 3
4 Firm1 AFRICA4 Projects & Energy: Mining & Minerals (Band 3) 3
5 Firm2 AFRICA1 Corporate/M&A (Band 3) 3
6 Firm2 AFRICA2 Dispute Resolution (Band 2) 2
7 Firm2 AFRICA3 Projects & Energy (Band 3) 3
proc transpose data=want out=wantxpo;
by firm;
var band;
id projects;
run;quit;
CORPORATE_ DISPUTE_ PROJECTS___ ENERGY__
M_A__BAND_ RESOLUTION__ ENERGY__ MINING___
Obs FIRM _NAME_ 3_ BAND_2_ BAND_3_ MINE
1 Firm1 BAND 3 2 3 3
2 Firm2 BAND 3 2 3
WORKING CODE
* dimension (permax) are the max number of projects acroos all firs;
array africas[&permax] $64 africa1-africa&permax;
array africasvals[&permax] $5 africasvals1-africasvals&permax;
FULL SOLUTION
* _ _ _
_ __ ___ __ _| | _____ __| | __ _| |_ __ _
| '_ ` _ \ / _` | |/ / _ \ / _` |/ _` | __/ _` |
| | | | | | (_| | < __/ | (_| | (_| | || (_| |
|_| |_| |_|\__,_|_|\_\___| \__,_|\__,_|\__\__,_|
;
data have;
firm ='Firm1';
africa_wide ='Corporate/M&A (Band 3).Dispute Resolution (Band 2).Projects & Energy (Band 3).Projects & Energy: Mining & Minerals (Band 3)';
africa1 ='Corporate/M&A (Band 3)';
africa2 ='Dispute Resolution (Band 2)';
africa3 ='Projects & Energy (Band 3)';
africa4 ='Projects & Energy: Mining & Minerals (Band 3)';
africa_corp =3;
africa_resolut =2;
africa_nrgy_mine =3;
africa_nrgy_mine =3;
run;quit;
data have;
firm ='Firm1';
africa_wide ='Corporate/M&A (Band 3).Dispute Resolution (Band 2).Projects & Energy (Band 3).Projects & Energy: Mining & Minerals (Band 3)';
output;
firm ='Firm2';
africa_wide ='Corporate/M&A (Band 3).Dispute Resolution (Band 2).Projects & Energy (Band 3)';
output;
run;quit;
* _ _ _
___ ___ | |_ _| |_(_) ___ _ __
/ __|/ _ \| | | | | __| |/ _ \| '_ \
\__ \ (_) | | |_| | |_| | (_) | | | |
|___/\___/|_|\__,_|\__|_|\___/|_| |_|
;
data want;
* you need to know the maximum number of africas, ie africa1-africa9?;
if _n_ = 0 then do;
rc=%sysfunc(dosubl('
proc sql;
select max(percnt) into :permax separated by ""
from (select countc(africa_wide,".") + 1 as percnt from have group by firm)
;quit;
'));
end;
put "&&permax";
length projects $64;
set have;
array africas[&permax] $64 africa1-africa&permax;
array africasvals[&permax] $5 africasvals1-africasvals&permax;
do i = 1 to countc(africa_wide,'.') + 1;
africas[i]=scan(africa_wide,i,'.');
africasvals[i]=compress(africas[i],,'kd');
name=vname(africas[i]);
projects=africas[i];
band=compress(africas[i],,'kd');
output;
end;
keep firm name projects band;;
run;quit;
3686 data want;
3687 length project $64;
3688 * you need to know the maximum number of africas, ie africa1-africa9?;
3689 if _n_ = 0 then do;
3690 rc=%sysfunc(dosubl('
NOTE: A GROUP BY clause has been discarded because neither the SELECT clause nor the optional
HAVING clause of the associated table-expression referenced a summary function.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.03 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 3727.00k
OS Memory 20468.00k
Timestamp 04/11/2017 04:01:10 PM
Step Count 338 Switch Count 0
3691 proc sql;
3692 select max(percnt) into :permax separated by ""
3693 from (select countc(africa_wide,".") + 1 as percnt from have group by firm)
3694 ;quit;
3695 '));
3696 end;
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable PERMAX resolves to 4
3697 put "&&permax";
3698 set have;
3699 array africas[&permax] $64 africa1-africa&permax;
SYMBOLGEN: Macro variable PERMAX resolves to 4
SYMBOLGEN: Macro variable PERMAX resolves to 4
3700 array africasvals[&permax] $5 africasvals1-africasvals&permax;
SYMBOLGEN: Macro variable PERMAX resolves to 4
SYMBOLGEN: Macro variable PERMAX resolves to 4
3701 do i = 1 to countc(africa_wide,'.') + 1;
3702 africas[i]=scan(africa_wide,i,'.');
3703 africasvals[i]=compress(africas[i],,'kd');
3704 name=vname(africas[i]);
3705 projects=africas[i];
3706 band=compress(africas[i],,'kd');
3707 output;
3708 end;
3709 keep firm name projects band;;
3710 run;
4
NOTE: There were 2 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 7 observations and 4 variables.
NOTE: DATA statement used (Total process time):
Thanks very much for your help
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.