Data Mining for Industry 4.0

How to prepare Industry 4.0 data to train classifiers

22 June 2022

To train a machine learning model to recognize "topics", I used approximately one year of production data from three Industry 4.0 companies operating in manufacturing, specifically in the textile sector.

The companies store their production data in relational databases. Each stored record holds the timestamp, the value produced by the sensor and the name of the topic.

The values are stored in the relational database exactly as they arrive from the sensors of the industrial machines, so the stored data undergo no encoding step. All status messages that are not needed for training the model were excluded from these data.
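The storage and filtering steps described above can be sketched as follows. This is a minimal illustration, not the companies' actual schema: the table name `readings`, its column names, the in-memory SQLite database and the `STATUS_TOPICS` list are all assumptions made for the example, including the idea that status messages can be recognised by their topic name.

```python
import sqlite3

# Hypothetical schema: one row per sensor reading (timestamp, value, topic).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (timestamp TEXT, value REAL, topic TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2021-12-17 10:37:01.764", 0, "topic 1"),
        ("2021-12-17 10:37:05.120", 1, "machine status"),
        ("2021-12-17 10:38:12.001", 2, "topic 1"),
        ("2022-02-22 08:41:36.819", 3, "topic 2"),
    ],
)

# Assumption: status messages are identified by their topic name.
STATUS_TOPICS = ("machine status",)

# Build one "?" placeholder per excluded topic so the query stays parameterised.
placeholders = ", ".join("?" for _ in STATUS_TOPICS)
rows = conn.execute(
    "SELECT timestamp, value, topic FROM readings "
    f"WHERE topic NOT IN ({placeholders}) ORDER BY timestamp",
    STATUS_TOPICS,
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Only the readings whose topic is not in `STATUS_TOPICS` remain in `rows`, ready for the grouping step.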

Table 1 summarizes the data of each customer, and Table 2 lists the topics recorded for each customer.

| Customer | Number of data points | Number of topics |
|---|---|---|
| Customer 1 | 5788 | 7 |
| Customer 2 | 1291 | 1 |
| Customer 3 | 51045 | 12 |
| Client 4 | 51049 | 12 |

Table 1: Dataset size per customer

Topic Customer 1 Customer 2 Customer 3
topic 1 X X
topic 2 X X
topic 3 X X
topic 4 X X
topic 5 X X
topic 6 X X
topic 7 X
topic 8 X
topic 9 X
topic 10 X
topic 11 X
topic 12 X
topic 13 X
Table 2: Topics recorded for each customer

To make the dataset better suited to training the model, the raw data were grouped by topic into ten-minute intervals based on the timestamp. For each group, the minimum value, the maximum value, the average and the standard deviation were computed.

For each ten-minute topic group, the start timestamp, the end timestamp and the topic name were also saved.
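The grouping described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `window_start` and `summarize` are hypothetical, windows are aligned to clock boundaries (10:30, 10:40, ...) because the text does not say how the window boundaries were chosen, and the standard deviation is the sample standard deviation, which may differ from the formula used for the published figures.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def window_start(ts: datetime, minutes: int = 10) -> datetime:
    """Floor a timestamp to its clock-aligned window (minutes must divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def summarize(readings, minutes: int = 10):
    """Group (timestamp, value, topic) rows by topic and window, then compute stats."""
    groups = defaultdict(list)
    for ts, value, topic in readings:
        groups[(topic, window_start(ts, minutes))].append((ts, value))

    summary = []
    for (topic, _), items in sorted(groups.items()):
        stamps = [ts for ts, _ in items]
        values = [v for _, v in items]
        summary.append({
            "topic": topic,
            "datetime_start": min(stamps),  # first reading in the window
            "datetime_end": max(stamps),    # last reading in the window
            "occurs": len(values),
            "min": min(values),
            "max": max(values),
            "average": mean(values),
            # Sample standard deviation; undefined for a single reading.
            "dev_stand": stdev(values) if len(values) > 1 else None,
        })
    return summary


# Illustrative readings, not taken from the real datasets.
readings = [
    (datetime(2021, 12, 17, 10, 37, 1), 0, "topic 1"),
    (datetime(2021, 12, 17, 10, 38, 0), 2, "topic 1"),
    (datetime(2021, 12, 17, 10, 39, 30), 1, "topic 1"),
    (datetime(2021, 12, 17, 10, 41, 0), 1, "topic 1"),
    (datetime(2022, 2, 22, 8, 41, 36), 3, "topic 2"),
]
summary = summarize(readings)
for row in summary:
    print(row)
```

Each dictionary in `summary` corresponds to one row of the grouped dataset: the topic, the start and end timestamps of the group, the number of readings, and the four statistics.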

Table 3 shows the first rows of a sample of the dataset grouped for training the model.

| Topic | Start datetime | End datetime | Occurrences | Min | Max | Average | Standard deviation |
|---|---|---|---|---|---|---|---|
| topic 1 | 2021-12-17 10:37:01.764000 | 2021-12-17 10:48:58.493000 | 35 | 0 | 2 | 0.114286 | 0.403288 |
| topic 1 | 2021-12-17 10:48:58.493000 | 2021-12-17 10:58:58.987000 | 86 | 0 | 1 | 0.0581395 | 0.234007 |
| topic 1 | 2022-02-22 07:29:31.331000 | 2022-02-22 07:39:43.586000 | 193 | 0 | 1 | 0.217617 | 0.416156 |
| topic 1 | 2022-02-22 07:39:43.586000 | 2022-02-22 07:50:46.690000 | 145 | 0 | 1 | 0.172414 | 0.37774 |
| topic 2 | 2022-02-22 08:41:36.819000 | 2022-02-22 08:51:48.124000 | 259 | 0 | 3 | 1.03089 | 0.842301 |
| topic 2 | 2022-02-22 08:51:48.124000 | 2022-02-22 09:01:50.280000 | 325 | 0 | 4 | 1.11077 | 0.960079 |
| topic 2 | 2022-02-22 09:01:50.280000 | 2022-02-22 11:10:12.621000 | 165 | 0 | 5 | 1.4 | 1.42063 |
| topic 2 | 2022-02-22 11:10:12.621000 | 2022-02-22 11:53:42.168000 | 7 | 0 | 1 | 0.285714 | 0.515079 |
| topic 2 | 2022-02-22 11:53:42.168000 | 2022-02-22 12:03:43.412000 | 229 | 0 | 25 | 2.24891 | 4.77293 |

Table 3: Dataset grouped by timestamp and topic

From the production data of the three clients mentioned above, we obtained four datasets to train and test the model. Two of these datasets come from the data of the third client.

Table 4 shows the number of data points in each dataset. Datasets 3 and 4, both built from the production data of client 3, are distributed equally by topic.

| Dataset | Number of data points | Number of topics |
|---|---|---|
| Dataset 1 | 5788 | 7 |
| Dataset 2 | 1291 | 1 |
| Dataset 3 | 51045 | 12 |
| Dataset 4 | 51049 | 12 |

Table 4: Number of data points per dataset
Contents edited by Giuseppe De Martino
Creative Commons License