Connect two data mappings OR two joins at once.

RVO · Posted 08-26-2024 04:26 AM

I'm stuck in getting the folowing to work:

I have multiple tables in Viya:

company_data: contains companyID (unique) and company details (name, location, etc.)
company_valueChains: contains companyID and valueChain. Note: a companyID can have multiple valueChains.
company_productsCategories: contains companyID and productsCategory. Note: a companyID can have multiple productsCategories.
A few more company_.... tables, along 2. and 3. logic.
hierarchyTable: contains valueChain, productCategory, and the others from 4., left out for simplicity of question.

Now what I want to achieve in Viya:

A dynamic, hierarchiecal filter. So I can click on a valueChain -> productCategory -> etc in a treemap and use it to filter companies. I have managed to get this to work by creating a massive 'flat' table, where tables 1-4 are merged into one (either in EG or by creating multiple joins in Viya, eg. join table1 and table2, then join table3 on 'join1', then join table4 on join2, etc.. Then in Viya a data mapping is made between the hierarchyTable and the relevant columns in the massive new table. But this tables is already >7k rows from just 29 companies. And there will be thousands in there, I expect Viya to crash (from historical experience with large tables in VA/Viya).

So my preferred solution would be: map the hierarchyTable columns to the resepective table (e.g. valueChain column to the same column in the company_valueChain table), and also map the companyID from the (for example) company_valueChain table to the companyID in the company_data table.

So a double-linked mapping of some sorts. But when I then click on a value chain in the tree map, I get the error: data is not mapped. And nothing happens.

Joining the data in Viya by multiple joins (which creates the intermediate join1 table, join2 table, etc). is a bit cumbersome and for large datasets very slow it seems (and many intermediate tables get created that are not used).

Long story short: how can I achieve what I want in Viya? In any program I would just query the server with a SQL with a double left join and be done with it, but as far as I know that cannot be done in Viya.

Any solution that solves the problem works for me; the title were just two solutions I tried but could not get to work.

ballardw · Posted 08-26-2024 09:46 AM

I'm not sure you have provided a clear enough description of your current process or even desired outcome to make relevant suggestions.

If any of your data sources are relatively static, meaning don't change a great deal from run to run, you might be able replace some of that "joining" with custom formats built from the data.

SASKiwi · Posted 08-26-2024 05:38 PM

My thought would be to consolidate this data into two tables which can then be two VA data sources. One for company data with associated value chain ID and one for value chains. The idea would then be to have two report objects that are linked. Selecting a row in the company object would then trigger a filter on the value chain object.

@ballardw 's idea of using custom formats to join in required additional data on the fly is a good one.

RVO · Posted 08-27-2024 07:04 AM

Thanks for the replies; I don't understand however what you mean with custom formats to join? Could you maybe give a simple example?

For better context of my question:

There is a hierarchy of valueChain -> productsCategory -> products.

In Viya, I want to get an overview of all available valueChains, then double-click on one specific one, and then see which productCategories are there available, while at the same time, filtering a list at the bottom that shows all companies that are operating in that valueChain. Then I want to be able to double-click one specific productCategory, see which products are within that category, and further filter the companies that operate in the valueChain and have the productCategory. Finally, I want to click on a product, and only see companies left that have that product.

I hope I made it more clear this way. The double-clicking feature I have achieved by having a treemap, with the valueChain-productsCategory-products hierarchy as the tile-role. For size I have created a custom variable that is always 1, so all 'blocks' are of equal size.

As stated, I can get the desired behaviour by having a massive table that would look like:

But when a company has many valuechains, categories, and products this table explodes. In my case, with a 'sample' of 29 companies, it explodes to 2006 rows.

In my previous experiences programming websites in PHP/MySQL, I would create multiple tables:

companies: (each company 1 record)

valueChain: (each company can have many records)

<companyID> <valueChain> (optionally valueChainID and valueChainName)

productCategory: (same as above)

<companyID> <productCategory> (same comment as above)

products: (same as above)

<companyID> <product> (same comment as above)

filterHierarchy: (this could also be split in 2 tables, but for simplicity it is 1 now)

Then, I would create a dynamic filter, based on filterHierarchy. So when a person clicks on a valueChain, I would query:

SELECT companies.* FROM companies LEFT JOIN valueChain ON companies.companyID = valueChain.companyID

And I would get all relevant companies.

For the filter I would query:

SELECT productCategory FROM filterHierarchy WHERE valueChain = <whatever valueChain was selected>.

The problem in Viya compared to above:

If it were ONLY the valueChain, I could make 1 join from 2 tables, and voila, done. But I have 4 tables to join (so 3 joins, as explained in original post). Which, to my knowledge, cannot be done at once in Viya. If I do the joines in EG to create a massive 'flat' table, it explodes in size (29 companies = 2006 rows). There will be thousands of companies. In my experience > 500K rows lets Viya grind to a halt...

I hope I made things more clear and that there is an elegant soluction I'm not aware of or are blindly glancing over.

ballardw · Posted 08-27-2024 10:28 AM

@RVO wrote:

Thanks for the replies; I don't understand however what you mean with custom formats to join? Could you maybe give a simple example?

Custom formats or informats can return values based on a single variable value. Consider that you have Site identification code that uniquely identifies a site (location/ person/object). You could have a format that returns specific items about that site based on the code value such as street address, city, state, postal code, manager name, business description, just about any piece of information.

Depending on need you can use the format directly in a procedure, such as proc print or create a new variable.

You should have the SASHELP.CLASS data set. This is small enough to demonstrate some use of formats instead of Join. First create the formats. Formats if stored in permanent libraries can be reused just like the formats SAS provides. They are relatively easy to make from data sets that have value pairs like a key, the variable you would join on in a query, and the value.

Proc format library=work;
value classhours
11,12 = '10AM - 2PM'
13,14,15 = '2PM - 4PM'
16  = '9AM - 3PM'
;
value $HistAssignment
'Alice','Jane','John' = 'Fall of Rome'
'Alfred','Henry','Janet','Philip' = 'Renaissance Painters'
other = 'Origins of Chocolate'
;
run;

To add variables with the values from the custom format:

data example;
   set sashelp.class;
   Classhours = put(age,classhours.);
   HistoryAssignment = put(name,$histassignment.);
run;

This is similar to joining on one data set that has Age and classroom hours values and another dataset with name and the history assignment.

Another value of formats comes in analysis and reporting though. Quite often you join data and then count the results. That can be done directly in the SAS reporting procedures.

proc tabulate data=sashelp.class;
   class age name;
   format age classhours. name $histassignment.;
   table age name,
         n
   ;
run;

The reporting, analysis and most graphs can use the formatted values instead of actually adding a variable to the data set. This is a limitation though that only one format can be associated with a variable at the time of analysis or reporting.

Note that in the $Histassignment a special assignment rule 'other' was used. That means that any value not specifically listed, such as the seven names listed, gets a specific value assigned. In an SQL type query that would require adding a CASE statement on top for the "not specifically returned" by the Join values.

You can also use INVALUE statements to convert text to numbers with some more flexibility.

Consider a case where you have field staff that are supposed to enter an F or M for gender, or a . if not recorded. For some other purposes you need that to be a numeric value of 1, 2 or missing. But field staff being human some of them may enter upper case and others lower case letters.

An example of an informat to read such, provide some feed back about incorrect data entry, and assign expected values:

Proc format library=work;
invalue sexnum (upcase)
'F' = 1
'M' = 2
'.' = . 
other = _error_
;
run;

data example;
   input s :sexnum.;
datalines;
f
F
M
m
x
.
;

The option UPCASE on the Invalue says to upper case the text before comparing to the list of values. So that fixes case without a lot of work. Not much of a headache for single letters but some things this may save a lot of time.

Second the Other, which means any value other than the specified F, M or . is treated as invalid data and you get a diagnostic in the LOG showing the value of the line read with a ruler and the values of resulting variables.

I use this OTHER _error_ to read site location codes for a project where the staff will create new sites without telling the analyst. So when I encounter one of these I can contact the staff to get the metadata (that goes into other formats such as site name, location and such) OR correct the entry in the file. Update the invalue statement to add the new value to the acceptable list and rerun the program to validate.

An equivalent "query" approach would have to join on the value of the site (or sex code in the example) and then provide logic in the form of case or separate step to handle the exceptions.

Also, if you have multiple variables that have the same values then the same format can be used with multiple variables without a lot of work.

Connect two data mappings OR two joins at once.

Re: Connect two data mappings OR two joins at once.

Re: Connect two data mappings OR two joins at once.

Re: Connect two data mappings OR two joins at once.

Re: Connect two data mappings OR two joins at once.

Connect two data mappings OR two joins at once.

Re: Connect two data mappings OR two joins at once.

Re: Connect two data mappings OR two joins at once.

Re: Connect two data mappings OR two joins at once.

Re: Connect two data mappings OR two joins at once.

2025 SAS Hackathon: There is still time!