Hello fellow SAS users.
I need help consolidating a SAS code that needs to be run for different categorical combinations.
For my purposes, there are three categories: Industry, Year, and Country. The following code is being used to calculate a market share variable, and it works. I last used this code about 1 year ago, at which point it was easiest to just copy/paste this code, which already executes the calculation by industry, with modifications for each Country/Year Combination.
As you can see in the current code, the third DATA step deletes any observations that are not in the first alphabetical country, Austria and for any years other than 2002 (the earliest year). As written, I have to run this code over 500 times for all of the country/year combinations.
I would like assistance that will run this same procedure for each Year/Country/Industry combination silmultaneously.
Any help is appreciated!
ODS HTML CLOSE; /*CLOSE PREVIOUS*/ ODS HTML; /*OPEN NEW*/ LIBNAME MYDATA 'C:\Users\wr778247\Dropbox\SASDATA'; DATA SPEC; SET 'C:\Users\wr778247\Dropbox\SASDATA\SPEC_NEW'; RENAME IND_CODE=INDCODE; RENAME SALES=REV; RENAME AUD=AUDCODE; RUN; DATA SPEC; SET SPEC; IF AUDPARENT='NULL' OR AUDPARENT='PVNT' OR AUDPARENT='PVND' THEN AUDITOR = AUDCODE; ELSE AUDITOR=AUDPARENT; IF COUNTRY NE 'AUSTRIA' THEN DELETE; RUN; *SETUP FOR 2002*; DATA AUSTRIA_TABLES_2002; SET SPEC; IF FYEAR NE 2002 THEN DELETE; IF COUNTRY NE 'AUSTRIA' THEN DELETE; INDCODE_LAG = LAG(INDCODE); INDCODE_LAG2 = LAG2(INDCODE); INDCODE_LAG3 = LAG3(INDCODE); INDCODE_LAG4 = LAG4(INDCODE); INDCODE_LAG5 = LAG5(INDCODE); INDCODE_LAG6 = LAG6(INDCODE); IF INDCODE_LAG = . THEN INDCODE_LAG=0; IF INDCODE_LAG2 = . THEN INDCODE_LAG2=0; IF INDCODE_LAG3 = . THEN INDCODE_LAG3=0; IF INDCODE_LAG4 = . THEN INDCODE_LAG4=0; IF INDCODE_LAG5 = . THEN INDCODE_LAG5=0; IF INDCODE-INDCODE_LAG=1 THEN INDRANK=1; IF INDCODE-INDCODE_LAG=0 THEN INDRANK=2; IF INDCODE-INDCODE_LAG2=0 THEN INDRANK=3; IF INDCODE-INDCODE_LAG3=0 THEN INDRANK=4; IF INDCODE-INDCODE_LAG4=0 THEN INDRANK=5; IF INDCODE-INDCODE_LAG5=0 THEN INDRANK=6; AUD1=LAG4(AUDITOR); AUD2=LAG3(AUDITOR); AUD3=LAG2(AUDITOR); AUD4=LAG(AUDITOR); AUD5=AUDITOR; IF AUD1=AUD2 THEN TOP2=1; ELSE TOP2=0; IF AUD1=AUD3 THEN TOP1_3=1; ELSE TOP1_3=0; IF AUD2=AUD3 THEN TOP2_3=1; ELSE TOP2_3=0; IF AUD1=AUD2=AUD3 THEN TOP3 =1; ELSE TOP3=0; IF AUD1=AUD2=AUD3=AUD4=AUD5 THEN TOP5=1; ELSE TOP5=0; IF AUD1=AUD2=AUD3 OR AUD1=AUD2=AUD4 OR AUD1=AUD2=AUD5 OR AUD1=AUD3=AUD4 OR AUD1=AUD3=AUD5 OR AUD1=AUD4=AUD5 THEN TOP1_ANY2=1; ELSE TOP1_ANY2=0; RUN; PROC SORT DATA=AUSTRIA_TABLES_2002; BY INDCODE; RUN; *BEGIN INDUSTRY TOTALS FOR 2002*; DATA AUSTRIA_IND_TOTALS_2002; SET AUSTRIA_TABLES_2002; BY INDCODE; IF INDCODE>0 THEN COUNT=1; ELSE COUNT=0; IF First.INDCODE=1 THEN TOTAL_ASSETS_USD=0; TOTAL_ASSETS_USD+ASSETS_USD; IF First.INDCODE=1 THEN TOTAL_REV=0; TOTAL_REV+REV; IF First.INDCODE=1 THEN TOTAL_AUDFEE=0; TOTAL_AUDFEE+AUDFEE; IF First.INDCODE=1 THEN IND_COUNT=0; IND_COUNT+COUNT; IF Last.INDCODE=0 THEN DELETE; DROP DSCD NAME CIK SIC SIC_DESC AUDKEY AUDFEE RELFEE NONFEE RESTATE FYE_DATE FYE FYE_MONTH SIGDATE REV NI ASSETS_USD INDUS2 INDUS3 FF_IND INDCODE_LAG INDCODE_LAG2 INDCODE_LAG3 INDCODE_LAG4 INDCODE_LAG5 INDCODE_LAG6 INDRANK AUD1 AUD2 AUD3 AUD4 AUD5 TOP2 TOP1_3 TOP2_3 TOP3 TOP5 TOP1_ANY2; RUN; *BEGIN HERFINDAHL INDEX FOR 2002*; *OBTAIN TOTAL ASSETS_USD PER AUDITOR PER INDUSTRY*; PROC SORT DATA= AUSTRIA_TABLES_2002; BY INDCODE AUDITOR; RUN; DATA AUSTRIA_AUD_TOTALS_2002; SET AUSTRIA_TABLES_2002; BY INDCODE AUDITOR; IF INDCODE>0 THEN COUNT=1; IF First.AUDITOR=1 THEN AUD_ASSETS_USD=0; AUD_ASSETS_USD+ASSETS_USD; IF First.AUDITOR=1 THEN AUD_REV=0; AUD_REV+REV; IF First.AUDITOR=1 THEN AUD_COUNT=0; AUD_COUNT+COUNT; IF Last.AUDITOR=0 THEN DELETE; DROP NAME CIK SIC SIC_DESC AUDKEY AUDFEE RELFEE NONFEE RESTATE FYE_DATE FYE FYE_MONTH SIGDATE REV NI ASSETS_USD INDUS2 INDUS3 FF_IND INDCODE_LAG INDCODE_LAG2 INDCODE_LAG3 INDCODE_LAG4 INDCODE_LAG5 INDCODE_LAG6 INDRANK AUD1 AUD2 AUD3 AUD4 AUD5 TOP2 TOP1_3 TOP2_3 TOP3 TOP5 TOP1_ANY2; RUN; *MERGE AUDIT TOTALS WITH INDUSTRY TOTALS; DATA AUSTRIA_HERF_2002; MERGE AUSTRIA_IND_TOTALS_2002 AUSTRIA_AUD_TOTALS_2002; BY INDCODE; DROP TOTAL_REV TOTAL_AUDFEE TOTAL_ ; MKTSHARE_ASSETS_USD=AUD_ASSETS_USD/TOTAL_ASSETS_USD; MKTSHARE_SALES=AUD_REV/TOTAL_REV; MKTSHARE_COUNT=AUD_COUNT/IND_COUNT; DROP AUDCODE AUDNAME AUDPARENT BIG4 BIG4_NAMED BIG4_LOCAL AFFIL; RUN; PROC PRINT DATA=AUSTRIA_HERF_2002 (OBS=100); RUN; *AT THIS POINT, THE ABOVE FILE HAS EACH AUDITOR'S MARKET SHARE IN EACH INDUSTRY*; *CALCULATE INDUSTRY SPECIALISTS*; *ASSETS_USD AUDITED*; PROC SORT DATA= AUSTRIA_HERF_2002; BY INDCODE DESCENDING MKTSHARE_ASSETS_USD ; RUN; DATA AUSTRIA_ASSETS_USD_SPEC_2002; SET AUSTRIA_HERF_2002; BY INDCODE; IF First.INDCODE=1 THEN FIRST=1; ELSE FIRST=0; SECOND = LAG(FIRST); IF FIRST=0 AND SECOND=0 THEN DELETE; AUDITOR_L = LAG(AUDITOR); IF FIRST=0 THEN TOP1_ASSETS_USD_MKTSHARE = AUDITOR_L; ELSE TOP1_ASSETS_USD_MKTSHARE = AUDITOR; IF FIRST=0 THEN TOP2_ASSETS_USD_MKTSHARE = AUDITOR; ELSE TOP2_ASSETS_USD_MKTSHARE = .; IF FIRST=1 AND MKTSHARE_COUNT NE 1 THEN DELETE; DROP AUDITOR COUNT TOTAL_ASSETS_USD IND_COUNT DSCD AUD_ASSETS_USD AUD_REV AUD_COUNT MKTSHARE_SALES MKTSHARE_COUNT FIRST SECOND AUDITOR_L COUNT TOTAL_ASSETS_USD IND_COUNT AUD_ASSETS_USD AUD_REV AUD_COUNT MKTSHARE_ASSETS_USD MKTSHARE_SALES MKTSHARE_COUNT AUDITOR FIRST SECOND; RUN; PROC PRINT DATA=AUSTRIA_ASSETS_USD_SPEC_2002; RUN; *AT THIS POINT, THE ABOVE FILE CONTAINS THE TOP1 AND TOP2 AUDITORS *BY MARKET SHARE OF ASSETS_USD AUDITOR *IN EACH INDUSTRY FOR THIS YEAR*;
Can you provide some dummy for 3 countries and 3 years so we can get a better handle on what is actually going on? This link:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... has instructions on how to create data step code from a SAS data set that you could paste into the forum or attach as a text file so we could create data to test code against.
I have a sneaking suspicion that Sorting the main data by country and year and then doing things using BY Country Year and some First.country and First.Year code may handle most of this.
Can you provide some dummy for 3 countries and 3 years so we can get a better handle on what is actually going on? This link:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... has instructions on how to create data step code from a SAS data set that you could paste into the forum or attach as a text file so we could create data to test code against.
I have a sneaking suspicion that Sorting the main data by country and year and then doing things using BY Country Year and some First.country and First.Year code may handle most of this.
Certainly; the attached Excel file has data for three firms, each in a different country, and across three years for each.
I think your idea of the solution may indeed be that simple. It's just that I haven't LOOKED at this in almost 2 years, so it is rather difficult to think about doing anything other than hitting the "Run" button.
I appreciate your time.
You actually need to read what people are posting. @ballardw gave you the link about how to post data here. DO NOT use Excel.
I did read what he said, but for the small sample data he requested (9 observations), it seemed much simpler to provide a dummy version of the data. @ballardw if you are not able (or permitted?) to use the Excel file, is there a faster way I can share this small dataset with you?
Alternatively, @ballardw I understand the end result necessary after visiting the page you suggested. That is, you want the SAS code that would "write" this dummy data set. I can do that without the macro and will add it once I have done so.
@Kurt_Bremser Franky, I do not use macros; while I understand their usefulness I do not have the skill the write them, and they are often not necessary for the type of data I deal with on a day-to-day basis. As a result, implementing pre-written macros is often a failed effort on my part.
About your problem:
Identify the items in your code that you would have to change
Put those items into macro variables and use them in your code
Once the code works, wrap it into a macro that accepts the variables as parameters
Store your parameter combinations in a dataset and use call execute() to run yor macro for each combination.
Hey guys, I just figured out a solution that circumvents the "code" itself. But thanks for the help!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.