What is functional data?
We typically work with data represented as discrete measurements at specific points in time or space. In the example below, each product/batch ID has readings for the variables x1 and x2, and we can use supervised learning techniques to predict y from x1 and x2 at the ID level.
Table 1: Regular structured data
What if, instead of discrete values, we are given continuous curves to predict the same target y? In the table below, x2 is replaced with continuous curves, i.e., multiple measurements of x2 are available for each ID. These curves could be a function of time or of any other continuum such as wavelength or voltage. Data of this type are called functional data.
Table 2: Functional data
The key difference between the two scenarios is that regular data provides only snapshots at specific points, whereas functional data captures how a phenomenon changes across a continuum. In this article, we investigate functional data analysis for IoT in SAS Viya.
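As a rough illustration of the two layouts, the sketch below holds a regular table and a "long" functional table as plain Python records. The IDs, values, and the long-format layout are all hypothetical and chosen only for illustration; they are not the contents of Table 1 or Table 2, and not necessarily the layout the SAS actions expect.

```python
# Regular structured data: one scalar per variable per ID (hypothetical values).
regular_rows = [
    {"id": "A", "x1": 0.8, "x2": 12.0, "y": 3.1},
    {"id": "B", "x1": 1.1, "x2": 15.5, "y": 3.9},
]

# Functional data in "long" form: x2 now has many (t, value) readings per ID.
functional_rows = [
    {"id": "A", "t": 0, "x2": 11.2},
    {"id": "A", "t": 1, "x2": 12.4},
    {"id": "A", "t": 2, "x2": 12.9},
    {"id": "B", "t": 0, "x2": 14.8},
    {"id": "B", "t": 1, "x2": 15.9},
    {"id": "B", "t": 2, "x2": 16.3},
]


def curves_by_id(rows):
    """Group long-format readings into one time-ordered curve per ID."""
    curves = {}
    for row in sorted(rows, key=lambda r: (r["id"], r["t"])):
        curves.setdefault(row["id"], []).append(row["x2"])
    return curves


curves = curves_by_id(functional_rows)
print(curves)
```

Each ID still corresponds to one observation of y, but its x2 entry is now an entire curve rather than a single number.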
Where do we encounter functional data?
The simple answer is that functional data is encountered whenever measurements are recorded continuously over a domain rather than at isolated points. Consider an IoT use case in the semiconductor manufacturing industry where wafer analysis plays a critical role in ensuring high-quality chips and optimizing production processes. Functional Data Analysis (FDA) offers a powerful framework for examining continuous, high-dimensional data, making it particularly valuable in wafer inspection, metrology, and process control. During semiconductor fabrication, wafers undergo multiple processes, including lithography, deposition, doping, metrology, etc. In the deposition process, sensors continuously record key metrics such as gas flow, temperature, heater modes, power monitors, and others throughout the operation, capturing their variations as continuous functions over time. FDA can leverage these functional inputs from the deposition step to predict wafer thickness during the subsequent metrology stage, enabling more accurate process optimization and defect detection.
Other areas of IoT where functional data commonly occurs include, but are not limited to:
- IoT-enabled wearables and medical devices collect real-time functional data on heart rate, blood pressure, glucose levels, oxygen level, etc., which enables early diagnosis of conditions and preventive care.
- In smart agriculture and precision farming, IoT sensors gather functional data on environmental elements such as soil, moisture, humidity, etc., which can be used to reduce resource consumption and improve yield.
- In supply chain and logistics optimization, IoT-enabled trackers collect functional data on temperature, humidity, location, and handling conditions for cold chain monitoring in pharmaceutical and food supply chains.
How do we analyze functional data?
The easiest way to work with functional data is to summarize each curve with statistics such as the mean, min, median, and max. We can also summarize the shape of the curve itself through features such as slope and inflection points. This, however, throws away much of the information in the original curves, losing temporal trends and inherent structure.
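The information loss is easy to see in a small sketch. The two traces below are hypothetical: one ramps up and the other ramps down, yet their mean, median, min, and max are identical.

```python
from statistics import mean, median


def summarize(curve):
    """Collapse a curve into a handful of scalar summary features."""
    return {
        "mean": mean(curve),
        "median": median(curve),
        "min": min(curve),
        "max": max(curve),
    }


# Two hypothetical sensor traces with opposite temporal trends.
ramp_up = [1.0, 2.0, 3.0, 4.0, 5.0]
ramp_down = list(reversed(ramp_up))

# The summaries cannot tell the two curves apart.
print(summarize(ramp_up))
print(summarize(ramp_down))
print(summarize(ramp_up) == summarize(ramp_down))
```

Adding a slope feature would separate this particular pair, but every hand-picked summary has similar blind spots for richer shapes.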
Functional principal component analysis (FPCA), available as part of SAS Viya 2024.11, is a technique designed for functional data. FPCA extends Principal Component Analysis (PCA), a widely used dimensionality reduction technique. PCA identifies principal components, which are vectors that maximize variance along their direction, and each component captures a portion of the total variance in the data. FPCA is conceptually similar but operates on functions rather than vectors. Consequently, the principal components in FPCA are themselves functions or curves (eigenfunctions), reflecting the continuous nature of functional data. SAS Viya provides two CAS actions for this: fPca and fPcaScore. Let’s look at how these CAS actions work on functional data.
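To make the PCA connection concrete, here is a minimal NumPy sketch that treats curves sampled on a common grid as vectors and eigen-decomposes their covariance. It is a discretized approximation for illustration only, not the algorithm behind the fPca action; the simulated curves and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten hypothetical curves sampled on a common, equally spaced grid.
t = np.linspace(0.0, 1.0, 50)
a = rng.normal(size=(10, 1))  # per-curve weight on the first shape
b = rng.normal(size=(10, 1))  # per-curve weight on the second shape
curves = 5.0 + 2.0 * a * np.sin(2 * np.pi * t) + 0.5 * b * np.cos(2 * np.pi * t)

# Center the curves around the pointwise mean curve.
mean_curve = curves.mean(axis=0)
centered = curves - mean_curve

# Eigen-decompose the sample covariance between grid points.
cov = centered.T @ centered / (curves.shape[0] - 1)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns ascending order; reorder so the largest-variance component is first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenfunctions = eigenvectors[:, order].T  # row k is component k evaluated on t

# Share of total variance captured by each component.
explained = eigenvalues / eigenvalues.sum()
print(explained[:3].round(3))
```

Because the simulated curves are built from only two shapes, the first two components capture essentially all of the variance. Real sensor curves would typically need smoothing and may need more components.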
Consider a simple example where we have ten curves (x1, x2, ..., x10) measured at equally spaced timestamps, each with its own characteristic shape, as shown in Fig 1 below.
Fig 1: Functional data represented by curves x1-x10
The fPca CAS action is trained on the input curves so that each individual curve above can be reconstructed from the mean curve, the eigenfunctions, and that curve's functional principal component scores. The code snippet below runs the action and generates these outputs.
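result = s.fPca(
    # s is assumed to be an active CAS session with the fPca action available
    table={"name": "fpca_train_data"},
    # principal component scores of the training data, keeping 2 components
    output={"casout": {"name": "SCORE", "replace": True}, "npc": 2},
    eigenVec={"name": "EIGENVEC", "replace": True},
    eigenVal={"name": "EIGENVAL", "replace": True},
    # state needed later to score new curves
    saveState={"name": "trainStore", "replace": True}
)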
In the code above, ‘npc’ specifies the number of principal components, which is set to 2 in this case. The code produces four output tables:
- “output” - contains the principal component scores of the training data
- “eigenVal” - contains the eigenvalue matrix
- “eigenVec” - contains the eigenvector matrix
- “saveState” - contains the state of the eigenvector matrix for scoring
Using the output tables, we can approximate the original curves from Fig 1 by using a combination of the mean function with the corresponding eigenfunctions and their respective principal component scores. Figure 2 below shows the reconstruction of the curves x1 and x10 using this technique.
Fig 2: Reconstruction of curves x1 and x10 using FPCA
The two eigenfunctions encapsulate the variability of all individual curves in two distinct directions, while the principal component scores quantify each curve's specific contribution to these eigenfunctions. Throughout this process, the mean function and eigenfunctions remain consistent across all curves, with differentiation arising from the principal component scores. This transformation reduces a function to a few key numerical values, simplifying its representation. For instance, the curve x1 can now be characterized solely by the scores PC_1 Score1 and PC_1Score2. Following this process, one can reconstruct all ten signals from Figure 1 as shown below.
Fig 3: Reconstruction of curves x1 through x10 using FPCA
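The reconstruction described above amounts to "mean curve plus the sum of score times eigenfunction". The NumPy sketch below illustrates that idea on simulated curves. It is an assumption-laden stand-in rather than the computation fPca performs internally, and it does not read the SCORE or EIGENVEC tables.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 40)

# Ten hypothetical curves: two dominant shapes plus a little noise.
weights = rng.normal(size=(10, 2))
shapes = np.vstack([np.sin(2 * np.pi * t), t ** 2])
curves = 3.0 + weights @ shapes + 0.01 * rng.normal(size=(10, t.size))

mean_curve = curves.mean(axis=0)
centered = curves - mean_curve

# SVD of the centered curves: rows of vt are the discretized eigenfunctions.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
npc = 2
eigenfunctions = vt[:npc]  # shape (npc, len(t))

# Scores: projection of each centered curve onto each eigenfunction.
scores = centered @ eigenfunctions.T  # shape (10, npc)

# Reconstruction: mean curve + sum over k of score_k * eigenfunction_k.
reconstructed = mean_curve + scores @ eigenfunctions

max_error = np.abs(curves - reconstructed).max()
print(scores.shape, max_error)
```

Each 40-point curve is now described by two numbers, and the reconstruction error stays at the level of the added noise because the simulated data really has only two modes of variation.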
An additional CAS action available for functional data analysis is fPcaScore. This action scores new functional data using the eigenfunctions derived from a prior FPCA training run. The table saved through the “saveState” parameter of the fPca CAS action must be supplied in this step.
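Conceptually, scoring means projecting new curves onto eigenfunctions that were already learned, without re-estimating anything. The sketch below shows that idea in NumPy; `fit_fpca` and `score_fpca` are hypothetical helper names, and this is not the fPcaScore implementation or its parameter list.

```python
import numpy as np


def fit_fpca(curves, npc):
    """Return the mean curve and the first npc discretized eigenfunctions."""
    mean_curve = curves.mean(axis=0)
    _, _, vt = np.linalg.svd(curves - mean_curve, full_matrices=False)
    return mean_curve, vt[:npc]


def score_fpca(new_curves, mean_curve, eigenfunctions):
    """Project new curves onto eigenfunctions learned from training data."""
    return (new_curves - mean_curve) @ eigenfunctions.T


rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 30)
shapes = np.vstack([np.sin(np.pi * t), np.cos(np.pi * t)])

# Hypothetical training curves and new, unseen curves from the same process.
train = 1.0 + rng.normal(size=(20, 2)) @ shapes
new = 1.0 + rng.normal(size=(5, 2)) @ shapes

# "Training" state that must be kept around for scoring later.
mean_curve, eigenfunctions = fit_fpca(train, npc=2)

# Scoring reuses the stored state; nothing is re-estimated from the new curves.
new_scores = score_fpca(new, mean_curve, eigenfunctions)
new_recon = mean_curve + new_scores @ eigenfunctions
print(new_scores.shape)
```

The stored mean curve and eigenfunctions play the role that the saved state plays in the SAS workflow: they are everything needed to turn a new curve into scores.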
Let’s go back to the original problem we posed at the beginning of the article that stems from working with data of the form below where instead of discrete values, we have continuous curves to predict the target y.
Table 2: Functional Data
We can now use fPca to represent the structural patterns of the curves in variable x2 by their functional principal component scores (two components here) and reformat the same dataset into the form below. These FPCA scores, combined with variable x1, can then be used to predict y with any of the supervised learning techniques in SAS.
Table 3: Representation of functional data using FPCA scores only
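A minimal end-to-end sketch of this idea follows, using simulated data and ordinary least squares as a stand-in for any supervised learner. The data-generating process, coefficients, and variable names are assumptions made for illustration, not results from the article's dataset.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
t = np.linspace(0.0, 1.0, 25)

# Hypothetical data: scalar x1, one x2 curve per ID, and a target y.
x1 = rng.normal(size=n)
w = rng.normal(size=(n, 2))
x2_curves = w @ np.vstack([np.sin(2 * np.pi * t), t])
y = 1.5 * x1 + 2.0 * w[:, 0] - 1.0 * w[:, 1] + 0.05 * rng.normal(size=n)

# Replace each x2 curve with its first two FPCA scores.
centered = x2_curves - x2_curves.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt[:2].T

# Design matrix: an intercept column, x1, and the two scores.
features = np.column_stack([np.ones(n), x1, scores])

# Ordinary least squares fit and in-sample R^2.
coef, *_ = np.linalg.lstsq(features, y, rcond=None)
pred = features @ coef
r2 = 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(features.shape, round(float(r2), 3))
```

Here the 25-point curve for each ID is reduced to two columns, so the model sees a regular table again. In practice the fit should be evaluated on held-out data rather than in-sample.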
Functional Data Analysis (FDA) is critical in IoT because it provides a structured way to handle complex, continuous data streams efficiently. By leveraging FDA techniques, IoT systems can enhance accuracy, reduce storage costs, improve decision-making, and enable real-time monitoring, all of which matter for the success of smart and connected technologies.