Data Management Component (DMC) for Virtual Factory Systems
Our digital world is built on information. Data is everywhere and everybody uses it in their everyday business. Data management has recently seen a surge in popularity across companies, organisations, analysts, and advisors. Data is a representation of facts; by placing data in context, information is created. The absence of a good data management component often means that information and wisdom are lost, leading organisations in every domain, and especially in the manufacturing domain, to take uninformed decisions. In the long run, this can paralyse an organisation to the extent that it can no longer function properly.
The vf-OS Data Management Component covers the whole spectrum of activities directed towards the distribution, transformation, storage, and analytics of data. As such, the focus of the Data Management Component (DMC) proposed in vf-OS is on providing a set of semi-independent but related services that take a variety of data as input at large scale, with differing characteristics such as transmission speed, and provide a set of non-trivial analytic operators. It is composed of four subcomponents: Data Infrastructure Middleware, Data Storage, Data Harmonisation, and Data Analytics.
Data Infrastructure Middleware
This component specifies and implements a data bus that supports the other subcomponents and the overall vf-OS application for data storage, transformation, and analytic operations. The data infrastructure contains adapters to aggregate data from various enterprise information sources, including machines, hardware sensors (such as high-precision cameras, accelerometers, vibration, and temperature sensors), software sensors from Enterprise Resource Planning (ERP) systems, and external business context data, and to make use of other vf-OS activities. Since sensorial data typically generates large amounts of micro measurements, the supporting data infrastructure provides a high-throughput technology pipeline for the acquisition, pre-processing, and aggregation of collected data.
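The pre-processing and aggregation step can be illustrated with a minimal sketch: reducing a stream of micro measurements to fixed-size window averages before they are pushed onto the data bus. The function name and the windowing strategy are illustrative assumptions, not part of the vf-OS API.

```python
def aggregate_window(readings, window_size):
    """Pre-aggregate micro measurements into fixed-size window averages.

    `readings` is a list of numeric sensor samples; a trailing partial
    window (if any) is averaged as well. Illustrative sketch only.
    """
    windows = []
    for i in range(0, len(readings), window_size):
        chunk = readings[i:i + window_size]
        windows.append(sum(chunk) / len(chunk))
    return windows

# Example: six temperature micro measurements reduced to windows of two
samples = [10, 12, 14, 20, 24, 24]
print(aggregate_window(samples, 2))  # → [11.0, 17.0, 24.0]
```

In a real deployment this aggregation would run close to the adapter, so that only the reduced stream crosses the message bus.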
To implement the message-oriented approach, the AMQP protocol, as implemented by the RabbitMQ message broker, was selected. AMQP is based on four basic concepts:
Message – the basic unit of transmitted data, whose body is not interpreted by the server (only the headers are processed)
Exchange – the delivery service for messages; all messages are sent to an exchange, which then distributes them into the different queues
Queue – where messages are kept until a client requests them
Bindings – carry the routing information; how a binding is interpreted depends on the exchange type.
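The interplay of these four concepts can be sketched with a toy in-memory model of a direct exchange. This is purely illustrative (not a RabbitMQ client, and the class and method names are assumptions); it only shows how bindings decide which queue a published message reaches.

```python
from collections import defaultdict, deque

class DirectExchange:
    """Toy in-memory model of the four AMQP concepts: messages are routed
    through an exchange into queues according to bindings."""

    def __init__(self):
        self.bindings = defaultdict(list)   # routing key -> bound queues

    def bind(self, queue, routing_key):
        """Binding: associate a queue with a routing key."""
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, message):
        # The broker does not interpret the message body; only the
        # routing metadata decides where the message goes.
        for queue in self.bindings[routing_key]:
            queue.append(message)

sensor_queue = deque()                               # Queue
exchange = DirectExchange()                          # Exchange
exchange.bind(sensor_queue, "sensor.temperature")    # Binding

exchange.publish("sensor.temperature", b'{"value": 21.5}')  # Message: routed
exchange.publish("sensor.vibration", b'{"value": 0.02}')    # no binding: dropped

print(sensor_queue.popleft())  # → b'{"value": 21.5}'
```

A real RabbitMQ setup would declare the same entities through an AMQP client library, but the routing semantics are the same.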
Data Storage
This component implements the storage services for vf-OS assets. It supports three major dimensions of “Big Data” when dealing with intensive streaming data: Volume (the scale of data being processed), Velocity (data transmission speed and optimised reaction time), and Variety (support for heterogeneous types of data). It implements a scalable data storage system, capable of handling real-time sensor data and events, based on an underlying infrastructure that transparently absorbs very large amounts of data, as well as other types of non-real-time heterogeneous data. No single storage system can satisfy all of these requirements; this is where polyglot persistence comes into play. Following this paradigm, four different storage services are provided:
Relational data – to store relational data from vApps as well as other relational information of the vf-OS Platform itself
Time Series data – to store and query time series data
Document-Oriented data – to store, retrieve and manage document-oriented information, also known as semi-structured data
RDF data – to store and query subject-predicate-object triples.
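The polyglot routing idea can be sketched as a small dispatcher that sends each record to the store suited to its shape. The in-memory lists below are stand-ins for real engines (e.g. a relational database, a time-series database, a document database, and a triple store; any concrete product choice here would be an assumption), and the `save` function is illustrative, not a vf-OS API.

```python
# In-memory stand-ins for the four storage services
stores = {
    "relational": [],   # rows (tuples)
    "timeseries": [],   # (timestamp, value) pairs
    "document": [],     # free-form dicts (semi-structured data)
    "rdf": [],          # (subject, predicate, object) triples
}

def save(kind, record):
    """Route a record to the storage service suited to its shape."""
    if kind not in stores:
        raise ValueError(f"unknown storage kind: {kind}")
    stores[kind].append(record)

save("timeseries", (1693000000, 21.5))
save("rdf", ("machine:42", "hasSensor", "sensor:7"))
print(len(stores["timeseries"]), len(stores["rdf"]))  # → 1 1
```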
Data Harmonisation
Data Harmonisation aims at (i) extracting more information from incoming data and (ii) preparing the data in a form suitable for other vf-OS components and vf-OS Assets. The key aim is to encode functional transformations of data that support data (pre-)processing for as long as vApps need it.
Issues related to extracting data from heterogeneous sources and then transforming it are solved by so-called Extract-Transform-Load (ETL) systems. The main goal of ETL systems is to make data more useful for further processing and analytics. Data extraction gets data from the source, which may be a flat file or a database. Transformation is the process of “cleaning” the data, or bringing it into a form that matches the target schema, through actions such as normalisation and filtering. The last stage is loading the result into the data storage.
Since vf-OS is oriented towards software developers wanting to develop vApps for the manufacturing industry, one might expect the ETLs to be part of the execution of a vApp. However, vf-OS follows a different approach: instead of programming the ETLs directly in the source code of the vApp, the ETLs are programmed as independent services and thus deployed as standalone services that can be re-used by other vApps, with several instances of the same service launched if necessary.
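The three ETL stages can be sketched as a minimal pipeline over a flat-file source: extract rows from CSV, transform by cleaning values and filtering out impossible readings, and load into a target store. The field names and cleaning rules are illustrative assumptions, not part of any vf-OS schema.

```python
import csv
import io

# A tiny flat-file source with one physically impossible reading
RAW = "sensor_id,temp_c\n s1 , 21.5\n s2 , -300\n s3 , 19.0\n"

def extract(flat_file):
    """Extract: read rows from a flat-file (CSV) source."""
    return list(csv.DictReader(io.StringIO(flat_file)))

def transform(rows):
    """Transform: normalise values and filter out invalid readings."""
    cleaned = []
    for row in rows:
        temp = float(row["temp_c"])
        if temp < -273.15:          # filtering: below absolute zero
            continue
        cleaned.append({"sensor_id": row["sensor_id"].strip(),
                        "temp_c": temp})
    return cleaned

def load(rows, store):
    """Load: append the harmonised rows into the target storage."""
    store.extend(rows)

store = []
load(transform(extract(RAW)), store)
print(store)  # → [{'sensor_id': 's1', 'temp_c': 21.5}, {'sensor_id': 's3', 'temp_c': 19.0}]
```

Packaged behind a small service interface, such a pipeline could be deployed standalone and reused by several vApps, in line with the approach described above.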
Data Analytics
This last component of the DMC covers the creation of building blocks for off-line analytical processing of inputs coming from the manufacturing environment. This includes machine learning algorithms supporting supervised and unsupervised scenarios. The key research innovation is provided by “multi-level” analysis on top of more traditional machine learning algorithms, simultaneously observing the data on multiple aggregation levels.
In a factory there are two different types of analytics tasks to be performed: those that depend on historical data sources, e.g. sensor readings for a given period in the past, and those that must be computed in real time because they trigger alarms leading to time-critical (or cost-critical) actions. As such, two different modules are integrated into the Data Analytics component, each focusing on one of these two areas. Similarly to Data Harmonisation, Data Analytics is composed of a set of libraries, each of which covers complementary functionality. There are libraries for the various analytics approaches to modelling and training, such as Random Forest predictors, built on top of a state-of-the-art analytics tool that offers an API to access the different algorithms.
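A supervised Random Forest scenario on historical data can be sketched as follows. scikit-learn is used here purely as a stand-in for the state-of-the-art analytics tool mentioned above (the actual tool is not named in this text), and the toy training data is invented for illustration.

```python
# Sketch of a supervised scenario: predict machine failure from
# (vibration, temperature) readings. Data and feature choice are
# illustrative assumptions, not vf-OS specifics.
from sklearn.ensemble import RandomForestClassifier

# Toy historical data: (vibration, temp_c) -> failure label (0/1)
X_train = [[0.10, 60], [0.20, 65], [0.90, 90],
           [0.80, 95], [0.15, 62], [0.85, 92]]
y_train = [0, 0, 1, 1, 0, 1]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Predict for two new readings
preds = model.predict([[0.12, 61], [0.88, 93]]).tolist()
print(preds)  # → [0, 1]
```

The real component would expose such models as reusable building blocks, with the "multi-level" analysis layered on top by training on several aggregation levels of the same data.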
For more information, please visit the vf-OS Website.