Hello everyone, I have a monthly process where I get a table on which I need to perform a transformation that goes by adding the variable M1 and M2 of each branch. The problem is that from month to month I can have more or less branches, so my variables can vary, which makes it difficult for me to create a simple query to add variables having previously declared them. I need some procedure that knows it should sum variable starting with M1 with variable starting with M2, by branch name, also being prepared to changing number of branches (variables) from month to month.
Below I have two tables, which I hope will allow you to understand what my problem is. Additionally, I give you two other tables that will be the desired result.
August Input:
TYPE | M1_BRANCH1 | M1_BRANCH4 | M1_BRANCH9 | M2_BRANCH1 | M2_BRANCH4 | M2_BRANCH9 |
CASH | 10 | 20 | 15 | 15 | 20 | 30 |
August desired Output:
TYPE | BRANCH1 | BRANCH4 | M1_BRANCH9 |
CASH | 25 | 40 | 45 |
September input:
TYPE | M1_BRANCH1 | M1_BRANCH4 | M2_BRANCH1 | M2_BRANCH4 |
CASH | 10 | 20 | 15 | 20 |
September desired Output:
TYPE | M1_BRANCH1 | M1_BRANCH4 |
CASH | 25 | 40 |
Thank you so much for your help.
Corrected the desired output for september.
The data structure you have currently is making your task extremely difficult. Having data in the variable name, such as branch and month is the main issue an will likely cause many problems in the long run.
You would be better off to reshape your data for long term use in a form such as
Type Branchname ReportMonth Value(the numbers)
Where your Branchname variable would have values like Branch1 Branch2. The report month would be best for a lot of reasons to be an actual date, such as the first day of the month.
If "M1" "M2" represent a different category then that should also be a different variable hold the value or meaning of M1 M2
And just exactly what are you adding to get those values for September in your example? If we add everything in your example it only totals 75 but you show 80 as the "sum" for M1_Branch4. Example outputs should use example inputs.
Please provide realistic input data what you expect for output for the example. As text Or Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
Sorry, i've already corrected the desired output. M1_BRANCH1 should sum with M2_BRANCH1.
I understand your comment regarding the data structure not being adequate, however I am not the owner of the table, so I cannot change the structure of the original table.
@jmrc wrote:
Sorry, i've already corrected the desired output. M1_BRANCH1 should sum with M2_BRANCH1.
I understand your comment regarding the data structure not being adequate, however I am not the owner of the table, so I cannot change the structure of the original table.
That does not matter. You make a properly structured data set from the garbage someone sends you and them make a report from that. The intermediate steps are up to you.
I have had to work with lots of garbage file formats. As long as the output is as desired and matches the users need they generally don't care what goes on in the background as long as everything is repeatable.
And consider if you have nicely structured collection of data sets when the person providing this to you asks "what about a total over the period from Mar 2019 to May2021" or "I only want to see Branches X., Y and Z" for some report then you are golden and use the data you restructured to use the actual dates and branch values (NOTE: Your example has a terrible problem just waiting to happen with Years and order of "months" [if you are actually thinking of using "September"].)
SAS reporting or analysis procedures can do many things with values. Not so much with variable names containing data.
Each sas table observation (row) has the same number of columns (ID and max number of branches).
I suppose yourinput is probaly excel iported into sas. In such case all tyepes have the same number of barnches, either branch1, branch2 or barnch(n). Ii is possible that the value of a specific type,branch(x) is missing. Remember that the sum result of values, where one of them (or more) is missing - the result is also missing value.
Suppose there are maximum N branches in the input, that is the variables are: M1_branch1, M1_branch2, ..., M1_cranchN and M2_branch1, M2_branch2, ..., M2_cranchN. In such case the simplest code will be:
%let N=4; /* maxno of barnches */
data want;
set have;
array m1 {*} M1_branch1 - M1_branch&N;
array m2 {*} M2_branch1 - M2_branch&N;
do i=1 to &N;
m1(i) = m1(i) + m2(i);
end;
keep type M1_branch1 - M1_branch&N;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.