So basically I have was given this massive dataset with 429300 rows of data. This is my job SAS job and I have no idea where to start.
The dataset consist of 30 months of data, each month consist of 2 sectors plus sum of both sectors (so three values, 1 , 2 and 0 to represent sum of both sectors), each sector consists of 4 sizes + sum of all 4 sizes (so 1,2,3,4 + 0 to represent sum of sizes, 5 values in total), each size consists of 8 states/territories in Australia plus 0 to represent entire country (1,2,3,4,5,6,7,8 + 0, so 9 values).
Each of the states/territories consists of 19 industry divisions and subdivisions.
Each division have different number of subdivisions. There are a total of 86 subdivisions, classified into one of the 19 divisions, so each division will have its number of subdiv + 0 to represent total for that division.
For example, Division A. Agriculture has 5 subdivisions. So for division A, it will have consists of 6 values (the 5 subdivisions + 0 to represent the total of the subdivision, so 6 values), subdivision is the lowest level.
So for each states/Territory, there are total of 106 values (86 subdiv, 19 divisions, plus a 0 to represent total of all divisions in that state).
so for each month, there are 106 div/subdiv * 9 states/territory * 5 sizes * 3 sectors = 14,310 rows of data.
I have attached a sample CSV for one month with values in column var1, var2 and var3 that's just randomly generated in excel so you can get an idea of the dataset structure.
How can I find the value of all the sum values represented by 0.
So I need to add up all the subdiv to get the sum for that division, then add up for division to get sum for that state/territory, then add up all the states to get total for that size, add up all the size to get that sector, add up all the sectors to get total for that month cycle. And do that for 30 month cycles. Pretty much I have to find sum of the var1, var2 and var3 for all combinations of sector, size, state, div/subdiv.
If there is a 0 in any of the columns sector, size, state, div or subdiv. I need to fine the correct values in column var1 to var3.
Any help would be highly appreciated.
Hello, @Nietzsche most of us will not try to write a program based on data in an Excel file. Many of us will not even download an Excel file, as it can be a security threat. If we are going to write SAS code, we need data in a SAS data set; we need you to provide the data as working SAS data step code, which you can type in yourself or you can follow these instructions. We need the data in this form and not any other form. Some people ignore this request and choose some other form to provide data, please do not ignore the request and please do NOT provide data in some other form.
Sums for five levels of hierarchical combination are easy to produce (if I am understanding you properly) via PROC SUMMARY.
sas7dbat file uploaded.
This does not meet the requirements that I asked you to follow. I was very specific about what I needed, and I asked you not to ignore the request and provide data some other way.
Some people will download sas7bdat files; I will not download ANY files.
@Nietzsche I'm not sure I understand the problem, and like others, I'm not downloading or opening attachments.
I find that the best approach is to simplify the question, and I suspect you may want to look at PROC SUMMARY (see example below)
/* Create sample "have" data */ data have ; do sector ="A","B","C" ; do subSector = "1","2" ; do id=1 to int(ranuni(1)*3)+1 ; value1 = int(ranuni(2)*10) ; value2 = int(ranuni(3)*10) ; sumOfValues=sum(value1,value2) ; output ; end ; end ; end ; run ; proc summary data=have ; by sector subSector ; var sumOfValues ; output out=want sum=sumOfValue ; ; run ;
If this isn't what you are looking for then I recommend you
I think this seriously needs to be restated:
How can I find the value of all the sum values represented by 0.
The sum of all values represented by 0 should be 0. So you have not described 1) which variable or variables in your example are to be summed (variable names are useful) or 2) which variable containing 0 may be important or how to use them, apparently 5 variables and many of the values are 0. So talk us through a bit more of exactly which junk is supposed to be "summed". Lists of values does not describe actions. Not to mention that your DIV variable is character.
Example data should be small enough that you can walk through an example and show a result by hand.
Or if you are just asking for "which rows have a 0 in one or more of those variables"
data junk; set cycle202110; if (whichn(0,sector,size,state, subdiv)>0 ) or Div='0'; run;
Hi guys, I am sorry I have to admit I did not read PaigeMiller's instruction properly.
I will try to follow the instruction to convert dataset into data step later today.
in the meantime I will have a look at proc summary to see if I can come up with a solution.
As a point of interest, PROC SUMMARY and PROC MEANS are very similar procedures. In fact, the PROC SUMMARY documentation points you to PROC MEANS and says that: "The SUMMARY procedure provides data summarization tools that compute descriptive statistics for variables across all observations or within groups of observations. The SUMMARY procedure is very similar to the MEANS procedure; for full syntax details, see MEANS Procedure. Except for the differences that are discussed here, all the PROC MEANS information also applies to PROC SUMMARY."
We covered PROC MEANS in the Programming 1 course. One of the main differences between MEANS and SUMMARY is where the output goes after it is created. MEANS generates a report table and SUMMARY generates an output dataset. That is the default behavior for each procedure, which you can always change.
How many variables are in this dataset? What are their names? What do they mean? Are they the things you mentioned in the text of your message, like MONTH, SECTOR, SIZE, STATE, DIVISION, SUBDIVISION
How many analysis variables are there? What do they mean?
You didn't attach the CSV file you claimed. But why attach a file to share text? Just open a text box using the INSERT CODE icon on the forum editor and paste a few sample lines from the CSV file into the text box.
There is no need to share example data that has all of levels of each variable (or even all of the variables) to get an idea of how to code the analysis.
Essentially you should be able to use PROC SUMMARY to do the analysis.
proc summary data=have ; class MONTH SECTOR SIZE STATE DIVISION SUBDIVISION; var VAR1-VAR3; output out=want sum= ; run;
If you only want some of the combinations of the CLASS variables then add a TYPES statement. For example to get the total per SUBDIVISION nested inside DIVISION and also the DIVISION total you would use:
types division division*subdivision;
Here is an example using SASHELP.CLASS to get you started.
proc summary data=sashelp.class chartype; class sex age ; types sex sex*age ; var height weight; output out=want sum=; run; proc print data=want; run;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.