intltechventures.blogspot.com: 2019-09-02 Monday - Future Research Project: ML Applied to Enterprise Architecture Diagrams

2019-09-03


After reading the two research papers below, I have some thoughts on a possible future ML research problem I would like to explore.

This posting is a placeholder for me to organize notes and thoughts, and to gather additional background reading material.

Problem Space/Conditions/Assumptions:

For a given organization, its collection(s) of Enterprise Architecture diagrams are created over time, by different authors (internal and external), with a variety of non-homogeneous tools and icon sets - resulting in a mix of file formats (e.g. Google Diagrams, PowerPoint, Lucidchart, Visio, JPEG, PNG, BMP).
Enterprise Architecture diagrams that are stored as image files - and that are not backed by a shared repository and a reusable element-level inventory definition - require extensive manual effort and time to use for impact analysis.

Goals:

Leverage ML algorithms to:

Automate the process of transforming disparate Enterprise Architecture diagrams (in various source file formats) - into a standard machine-readable meta file format (e.g. Sparx EA XMI).
Specify an industry-standard modeling notation for use in the target output meta file format (e.g. UML, ArchiMate).
Automatically analyze diagrams and identify possible categorization relationships between elements/components, grouped by inferred organization/application/system boundaries (open question: organize automatically, or only suggest a categorization?)

e.g. from a collection of dozens or hundreds of diagrams - identify elements that are repeated across diagrams, and that may therefore be candidates for reusable components (or shared services)
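To make the last goal a bit more concrete, here is a minimal sketch of what the analysis could look like once diagrams have already been reduced to a machine-readable form. Everything in it is an assumption for illustration: the `Element` and `Diagram` classes are a hypothetical intermediate representation (not Sparx EA XMI), the file names and labels are made up, and matching on normalized label text is only a stand-in for whatever matching an ML model would eventually provide.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """One box/icon extracted from a diagram (hypothetical intermediate form)."""
    label: str  # text recognized on or near the shape
    kind: str   # assumed notation type, e.g. an ArchiMate element name


@dataclass
class Diagram:
    """A diagram after extraction: its source file plus the elements found in it."""
    source: str
    elements: list[Element] = field(default_factory=list)


def normalize(label: str) -> str:
    """Collapse case, punctuation and spacing so 'Auth-Service' matches 'auth service'."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in label)
    return " ".join(cleaned.split())


def reuse_candidates(diagrams: list[Diagram], min_diagrams: int = 2) -> dict[str, list[str]]:
    """Map each normalized label to the diagrams it appears in, keeping only
    labels seen in at least `min_diagrams` distinct diagrams."""
    seen: dict[str, set[str]] = defaultdict(set)
    for diagram in diagrams:
        for element in diagram.elements:
            seen[normalize(element.label)].add(diagram.source)
    return {
        label: sorted(sources)
        for label, sources in seen.items()
        if len(sources) >= min_diagrams
    }


# Made-up sample data: three diagrams from three different tools.
diagrams = [
    Diagram("billing.vsdx", [Element("Auth-Service", "ApplicationComponent"),
                             Element("Billing DB", "DataObject")]),
    Diagram("portal.png", [Element("auth service", "ApplicationComponent"),
                           Element("Web Portal", "ApplicationComponent")]),
    Diagram("crm.pptx", [Element("AUTH SERVICE", "ApplicationComponent"),
                         Element("Billing DB", "DataObject")]),
]

candidates = reuse_candidates(diagrams)
for label, sources in sorted(candidates.items()):
    print(f"{label}: {sources}")
```

With this sample data, "auth service" is reported for all three diagrams and "billing db" for two, while "web portal" is dropped because it appears only once. The hard part of the research problem (getting from pixels to `Element` records) is deliberately left out here.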

Research Papers (Background Reading): (work-in-progress...)

Paper: New trends on digitisation of complex engineering drawings

Neural Computing and Applications

June 2019, Volume 31, Issue 6, pp 1695–1712

Authors: Carlos Francisco Moreno-García, Eyad Elyan, Chrisina Jayne

https://link.springer.com/article/10.1007/s00521-018-3583-1

https://link.springer.com/content/pdf/10.1007%2Fs00521-018-3583-1.pdf

Abstract: "Engineering drawings are commonly used across different industries such as oil and gas, mechanical engineering and others. Digitising these drawings is becoming increasingly important. This is mainly due to the legacy of drawings and documents that may provide rich source of information for industries. Analysing these drawings often requires applying a set of digital image processing methods to detect and classify symbols and other components. Despite the recent significant advances in image processing, and in particular in deep neural networks, automatic analysis and processing of these engineering drawings is still far from being complete. This paper presents a general framework for complex engineering drawing digitisation. A thorough and critical review of relevant literature, methods and algorithms in machine learning and machine vision is presented. Real-life industrial scenario on how to contextualise the digitised information from specific type of these drawings, namely piping and instrumentation diagrams, is discussed in details. A discussion of how new trends on machine vision such as deep learning could be applied to this domain is presented with conclusions and suggestions for future research directions."

Paper: Learning icons appearance similarity

(Submitted on 1 Feb 2019)

Authors: Manuel Lagunas, Elena Garces, Diego Gutierrez

https://arxiv.org/abs/1902.05378

https://arxiv.org/pdf/1902.05378.pdf

Abstract: "Selecting an optimal set of icons is a crucial step in the pipeline of visual design to structure and navigate through content. However, designing the icons sets is usually a difficult task for which expert knowledge is required. In this work, to ease the process of icon set selection to the users, we propose a similarity metric which captures the properties of style and visual identity. We train a Siamese Neural Network with an online dataset of icons organized in visually coherent collections that are used to adaptively sample training data and optimize the training process. As the dataset contains noise, we further collect human-rated information on the perception of icon's similarity which will be used for evaluating and testing the proposed model. We present several results and applications based on searches, kernel visualizations and optimized set proposals that can be helpful for designers and non-expert users while exploring large collections of icons."
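A learned similarity metric of this kind might be one way to decide whether two icons drawn with different tools represent the same kind of element. The sketch below only illustrates the lookup step, under loud assumptions: the embedding vectors are invented by hand (a trained network would produce them), the icon names are hypothetical, and cosine similarity is a stand-in that is not necessarily the distance used in the paper.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], catalog: dict[str, list[float]]) -> str:
    """Return the name of the catalog icon whose embedding is closest to `query`."""
    return max(catalog, key=lambda name: cosine_similarity(query, catalog[name]))


# Hand-made 3-dimensional "embeddings"; real ones would come from a trained model.
catalog = {
    "database_lucid": [0.8, 0.2, 0.1],
    "queue_visio": [0.1, 0.9, 0.2],
}
query_icon = [0.9, 0.1, 0.0]  # hypothetical embedding of a Visio database icon

best = most_similar(query_icon, catalog)
print(best)
```

Here the query lands on "database_lucid", since its vector points in nearly the same direction as the query while the queue icon's does not.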
