Gan HM, Fernando S, Molina-Solana M, 2021, Scalable object detection pipeline for traffic cameras: Application to Tfl JamCams, Expert Systems with Applications, Vol: 182, Pages: 1-15, ISSN: 0957-4174
With CCTV systems being installed in the transport infrastructure of many cities, there is an abundance of data to be extracted from the footage. This paper explores the application of the YOLOv3 object detection algorithm, trained on the COCO dataset, to Transport for London’s (TfL) JamCam feeds. The result, open-sourced and publicly available, is a series of easy-to-deploy Docker pipelines to create, store and serve (through a REST API) data on objects identified in those feeds. The pipelines can be deployed to any Linux machine with an NVIDIA GPU to support accelerated computation. We studied how different confidence thresholds affect detections of relevant objects (cars, trucks and pedestrians) in London JamCam scenes. By running the system continuously for 3 weeks, we built a dataset of more than 2200 detection datapoints for each camera (~6 datapoints an hour). We further visualised the detections on an animated geospatial map, showcasing their effectiveness in identifying traffic patterns typical of an urban city like London and portraying how the population levels of different objects vary throughout the day.
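The confidence-threshold filtering studied in the paper can be sketched in a few lines (an illustrative fragment, not the authors' released pipeline; the detection format and function names are assumptions, with class labels following the COCO convention used by YOLOv3):

```python
# Illustrative sketch: keep only detections of relevant classes whose
# confidence reaches the chosen threshold. In the real pipeline these
# (label, confidence) pairs would come from a YOLOv3 forward pass.
RELEVANT_CLASSES = {"car", "truck", "person"}

def filter_detections(detections, threshold):
    """Keep relevant-class detections at or above the confidence threshold."""
    return [
        (label, conf)
        for label, conf in detections
        if label in RELEVANT_CLASSES and conf >= threshold
    ]

detections = [("car", 0.92), ("person", 0.40), ("truck", 0.55), ("dog", 0.80)]
print(filter_detections(detections, 0.5))  # [('car', 0.92), ('truck', 0.55)]
```

Raising the threshold trades recall for precision, which is the trade-off the paper evaluates on JamCam scenes.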
Huitzil I, Molina-Solana M, Gómez-Romero J, et al., 2021, Minimalistic fuzzy ontology reasoning: An application to Building Information Modeling, Applied Soft Computing, Vol: 103, Pages: 1-15, ISSN: 1568-4946
This paper presents a minimalistic reasoning algorithm to solve imprecise instance retrieval in fuzzy ontologies, with application to querying Building Information Models (BIMs), a knowledge representation formalism used in the construction industry. Our proposal is based on a novel lossless reduction of fuzzy to crisp reasoning tasks, which can be processed by any Description Logics reasoner. We implemented the minimalistic reasoning algorithm and performed an empirical evaluation of its performance in several tasks: interoperation with classical reasoners (HermiT and TrOWL), initialization time (comparing TrOWL and a SPARQL engine), and use of different data structures (hash tables, databases, and programming interfaces). We show that our software can efficiently solve very expressive queries not available nowadays in regular or semantic BIM tools.
Tajnafoi G, Arcucci R, Mottet L, et al., 2021, Variational Gaussian process for optimal sensor placement, Applications of Mathematics, Vol: 66, Pages: 287-317, ISSN: 0373-6725
Sensor placement is an optimisation problem that has recently gained great relevance. In order to achieve accurate online updates of a predictive model, sensors are used to provide observations. When sensor location is optimally selected, the predictive model can greatly reduce its internal errors. A greedy-selection algorithm is used for locating these optimal spatial locations from a numerical embedded space. A novel architecture for solving this big data problem is proposed, relying on a variational Gaussian process. The generalisation of the model is further improved via the preconditioning of its inputs: Masked Autoregressive Flows are implemented to learn nonlinear, invertible transformations of the conditionally modelled spatial features. Finally, a global optimisation strategy extending the Mutual Information-based optimisation and fine-tuning of the selected optimal location is proposed. The methodology is parallelised to speed up the computational time, making these tools very fast despite the high complexity associated with both spatial modelling and placement tasks. The model is applied to a real three-dimensional test case considering a room within the Clarence Centre building located in Elephant and Castle, London, UK.
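The greedy-selection step described above can be illustrated with a minimal sketch (only the greedy backbone under an assumed covariance input and noise-free observations; the paper's variational Gaussian process and Masked Autoregressive Flow preconditioning are out of scope here):

```python
def greedy_placement(cov, k):
    """Greedily pick k sensor locations: at each step take the location with
    the largest remaining predictive variance, then condition the Gaussian
    process on a noise-free observation there (rank-1 Schur update)."""
    n = len(cov)
    c = [row[:] for row in cov]  # work on a copy of the covariance matrix
    chosen = []
    for _ in range(k):
        best = max((i for i in range(n) if i not in chosen), key=lambda i: c[i][i])
        chosen.append(best)
        v = c[best][best]
        row = c[best][:]                      # snapshot before the in-place update
        col = [c[i][best] for i in range(n)]
        for i in range(n):
            for j in range(n):
                c[i][j] -= col[i] * row[j] / v
    return chosen

cov = [[1.0, 0.9, 0.0],
       [0.9, 1.0, 0.0],
       [0.0, 0.0, 0.5]]
print(greedy_placement(cov, 2))  # [0, 2]: the second pick avoids redundant location 1
```

Conditioning after each pick is what makes the selection avoid placing two sensors at strongly correlated locations.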
Gómez-Romero J, Molina-Solana M, 2021, Towards Data-Driven Simulation Models for Building Energy Management, Pages: 401-407, ISSN: 0302-9743
The computational simulation of physical phenomena is a highly complex and expensive process. Traditional simulation models, based on equations describing the behavior of the system, do not allow generating data in sufficient quantity and speed to predict its evolution and automatically make decisions accordingly. These features are particularly relevant in building energy simulations. In this work, we introduce the idea of deep data-driven simulation models (D3S), a novel approach in terms of the combination of models. A D3S is capable of emulating the behavior of a system in a similar way to simulators based on physical principles, but requiring less effort in its construction (it is learned automatically from historical data) and less time to run (no need to solve complex equations).
Soman R, Molina Solana M, Whyte J, 2020, Linked-Data based Constraint-Checking (LDCC) to support look-ahead planning in construction, Automation in Construction, Vol: 120, ISSN: 0926-5805
In the construction sector, complex constraints are not usually modeled in conventional scheduling and 4D building information modeling software, as they are highly dynamic and span multiple domains. The lack of embedded constraint relationships in such software means that, as Automated Data Collection (ADC) technologies become used, it cannot automatically deduce the effect of deviations to schedule. This paper presents a novel method, using semantic web technologies, to model and validate complex scheduling constraints. It presents a Linked-Data based Constraint-Checking (LDCC) approach, using the Shapes Constraint Language (SHACL). A prototype web application is developed using this approach and evaluated using an OpenBIM dataset. Results demonstrate the potential of LDCC to check for constraint violation in distributed construction data. This novel method (LDCC) and its first prototype is a contribution that can be extended in future research in linked-data, BIM based rule-checking, lean construction and ADC.
Ruiz LGB, Pegalajar MC, Arcucci R, et al., 2020, A time-series clustering methodology for knowledge extraction in energy consumption data, Expert Systems with Applications, Vol: 160, ISSN: 0957-4174
In the Energy Efficiency field, the incorporation of intelligent systems in cities and buildings is motivated by the energy savings and pollution reduction that can be attained. To achieve this goal, energy modelling and a better understanding of how energy is consumed are fundamental factors. As a result, this study proposes a methodology for knowledge acquisition in energy-related data through Time-Series Clustering (TSC) techniques. In our experimentation, we utilize data from the buildings at the University of Granada (Spain) and compare several clustering methods to find the optimal model; in particular, we tested k-Means, k-Medoids, Hierarchical clustering and Gaussian Mixtures, as well as several algorithms to obtain the best grouping, such as PAM, CLARA, and two variants of Lloyd’s method (Small and Large). Thus, our methodology can provide non-trivial knowledge from raw energy data. In contrast to previous studies in this field, not only do we propose a clustering methodology to group time series straightforwardly, but we also present an automatic strategy to search for and analyse energy periodicity in these series recursively, so that we can deepen granularity and extract information at different levels of detail. The results show that k-Medoids with PAM is the best approach in virtually all cases, and that the Squared Euclidean distance outperforms the rest of the metrics.
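The k-Medoids/PAM combination the study found best can be sketched as follows (a toy pure-Python PAM over short series with squared Euclidean distance; the initialisation and swap search are simplified relative to production implementations):

```python
from itertools import product

def sqeuclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def total_cost(series, medoids):
    # clustering cost: each series is assigned to its nearest medoid
    return sum(min(sqeuclidean(s, series[m]) for m in medoids) for s in series)

def pam(series, k):
    """Minimal PAM: starting from a naive initialisation, keep swapping a
    medoid with a non-medoid while the total cost decreases."""
    medoids = list(range(k))
    improved = True
    while improved:
        improved = False
        for m, o in product(list(medoids), range(len(series))):
            if o in medoids:
                continue
            candidate = [o if x == m else x for x in medoids]
            if total_cost(series, candidate) < total_cost(series, medoids):
                medoids = candidate
                improved = True
    return sorted(medoids)

series = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(pam(series, 2))  # [1, 2]: one medoid in each of the two clusters
```

Unlike k-Means, the cluster representatives are always actual series from the data, which is what makes medoids interpretable for energy profiles.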
Mack J, Arcucci R, Molina-Solana M, et al., 2020, Attention-based Convolutional Autoencoders for 3D-Variational Data Assimilation, COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, Vol: 372, ISSN: 0045-7825
Fernando S, AmadorDíazLópez J, Şerban O, et al., 2020, Towards a large-scale twitter observatory for political events, Future Generation Computer Systems, Vol: 110, Pages: 976-983, ISSN: 0167-739X
Explosion in usage of social media has made its analysis a relevant topic of interest, and particularly so in the political science area. Within Data Science, no other techniques are more widely accepted and appealing than visualisation. However, with datasets growing in size, visualisation tools also require a paradigm shift to remain useful in big data contexts. This work presents our proposal for a Large-Scale Twitter Observatory that enables researchers to efficiently retrieve, analyse and visualise data from this social network to gain actionable insights and knowledge related to political events. In addition to describing the supporting technologies, we put forward a working pipeline and validate the setup with different examples.
Ruiz LGB, Pegalajar MC, Molina-Solana M, et al., 2020, A case study on understanding energy consumption through prediction and visualization (VIMOEN), Journal of Building Engineering, Vol: 30, Pages: 1-14, ISSN: 2352-7102
Energy efficiency has emerged as an overarching concern due to the high pollution and cost associated with operating heating, ventilation and air-conditioning systems in buildings, which are an essential part of our day-to-day life. Energy monitoring has also become one of the most important research topics nowadays, as it makes it possible to understand how facilities consume energy. This, along with energy forecasting, is a decisive task for energy efficiency. The goal of this study is twofold. The first aim is to provide a methodology to predict energy usage every hour. To do so, several Machine Learning technologies were analysed: Trees, Support Vector Machines and Neural Networks. Besides, as the University of Granada lacks a tool to properly monitor those data, the second aim is to propose an intelligent system to visualise them and to use those models to predict energy consumption in real time. To this end, we designed VIMOEN (VIsual MOnitoring of ENergy), a web-based application that provides not only visual information about the energy consumption of a set of geographically-distributed buildings but also expected expenditures in the near future. The system has been designed to be easy to use and intuitive for non-expert users. It was validated on data coming from buildings of the UGR, and the experiments show that the Elman Neural Network proved to be the most accurate and stable model, maintaining its accuracy from the fifth hour onwards.
Fernando S, Scott-Brown J, Şerban O, et al., 2020, Open Visualization Environment (OVE): A web framework for scalable rendering of data visualizations, Future Generation Computer Systems, Vol: 112, Pages: 785-799, ISSN: 0167-739X
Scalable resolution display environments, including immersive data observatories, are emerging as equitable and socially engaging platforms for collaborative data exploration and decision making. These environments require specialized middleware to drive them, but, due to various limitations, there is still a gap in frameworks capable of scalable rendering of data visualizations. To overcome these limitations, we introduce a new modular open-source middleware, the Open Visualization Environment (OVE). This framework uses web technologies to provide an ecosystem for visualizing data using web browsers that span hundreds of displays. In this paper, we discuss the key design features and architecture of our framework as well as its limitations. This is followed by an extensive study on performance and scalability, which validates its design and compares it to the popular SAGE2 middleware. We show how our framework solves three key limitations in SAGE2. Thereafter, we present two of our projects that used OVE and show how it can extend SAGE2 to overcome limitations and simplify the user experience for common data visualization use-cases.
Martínez V, Fernando S, Molina-Solana M, et al., 2020, Tuoris: A middleware for visualizing dynamic graphics in scalable resolution display environments, Future Generation Computer Systems, Vol: 106, Pages: 559-571, ISSN: 0167-739X
In the era of big data, large-scale information visualization has become an important challenge. Scalable resolution display environments (SRDEs) have emerged as a technological solution for building high-resolution display systems by tiling lower resolution screens. These systems bring serious advantages, including lower construction cost and better maintainability compared to other alternatives. However, they require not only specialized software but also purpose-built content to suit the inherently complex underlying systems. This creates several challenges when designing visualizations for big data that can be reused across several SRDEs of varying dimensions. This is not yet a common practice but is becoming increasingly popular among those who engage in collaborative visual analytics in data observatories. In this paper, we define three key requirements for systems suitable for such environments, point out limitations of existing frameworks, and introduce Tuoris, a novel open-source middleware for visualizing dynamic graphics in SRDEs. Tuoris manages the complexity of distributing and synchronizing the information among different components of the system, eliminating the need for purpose-built content. This makes it possible for users to seamlessly port existing graphical content developed using standard web technologies, and simplifies the process of developing advanced, dynamic and interactive web applications for large-scale information visualization. Tuoris is designed to work with Scalable Vector Graphics (SVG), reducing bandwidth consumption and achieving high frame rates in visualizations with dynamic animations. It scales independently of the display wall resolution, in contrast with other frameworks that transmit visual information as blocks of images.
Dur TH, Arcucci R, Mottet L, et al., 2020, Weak Constraint Gaussian Processes for optimal sensor placement, JOURNAL OF COMPUTATIONAL SCIENCE, Vol: 42, ISSN: 1877-7503
Sanfilippo KRM, Spiro N, Molina-Solana M, et al., 2020, Do the shuffle: exploring reasons for music listening through shuffled play, PLoS One, Vol: 15, ISSN: 1932-6203
Adults listen to music for an average of 18 hours a week (with some people reaching more than double that). With rapidly changing technology, music collections have become overwhelmingly digital, ushering in changes in listening habits, especially when it comes to listening on personal devices. By using interactive visualizations, descriptive analysis and thematic analysis, this project aims to explore why people download and listen to music and which aspects of the music listening experience are prioritized when people talk about tracks on their device. Using a newly developed data collection method, Shuffled Play, 397 participants answered open-ended and closed research questions through a short online questionnaire after shuffling their music library and playing two pieces as prompts for reflections. The findings of this study highlight that when talking about tracks on their personal devices, people prioritise characterizing them using sound and musical features and associating them with the informational context around them (artist, album, and genre) over their emotional responses to them. The results also highlight that people listen to and download music because they like it, a straightforward but important observation that is sometimes glossed over in previous research. These findings have implications for future work in understanding music, its uses and its functions in people's everyday lives.
Lim EM, Molina Solana M, Pain C, et al., 2019, Hybrid data assimilation: An ensemble-variational approach, Pages: 633-640
Data Assimilation (DA) is a technique used to quantify and manage uncertainty in numerical models by incorporating observations into the model. Variational Data Assimilation (VarDA) accomplishes this by minimising a cost function which weighs the errors in both the numerical results and the observations. However, large-scale domains pose issues with the optimisation and execution of the DA model. In this paper, ensemble methods are explored as a means of sampling the background error at a reduced rank to condition the problem. The impact of ensemble size on the error is evaluated and benchmarked against other preconditioning methods explored in previous work, such as truncated singular value decomposition (TSVD). Localisation is also investigated as a form of reducing the long-range spurious errors in the background error covariance matrix. Both the mean squared error (MSE) and execution time are used as measures of performance. Experimental results for a 3D case of pollutant dispersion within an urban environment are presented, with promise for future work using dynamic ensembles and 4D state vectors.
Oehmichen A, Hua K, Lopez JAD, et al., 2019, Not all lies are equal. A study into the engineering of political misinformation in the 2016 US presidential election, IEEE Access, Vol: 7, Pages: 126305-126314, ISSN: 2169-3536
We investigated whether and how political misinformation is engineered using a dataset of four months' worth of tweets related to the 2016 presidential election in the United States. The data contained tweets that achieved a significant level of exposure and was manually labelled into misinformation and regular information. We found that misinformation was produced by accounts that exhibit different characteristics and behaviour from regular accounts. Moreover, the content of misinformation is more novel, polarised and appears to change through coordination. Our findings suggest that the engineering of political misinformation seems to exploit human traits such as reciprocity and confirmation bias. We argue that investigating how misinformation is created is essential to understand human biases and diffusion and, ultimately, to produce better public policy.
Gomez-Romero J, Fernandez-Basso CJ, Cambronero MV, et al., 2019, A probabilistic algorithm for predictive control with full-complexity models in non-residential buildings, IEEE Access, Vol: 7, Pages: 38748-38765, ISSN: 2169-3536
Despite the increasing capabilities of information technologies for data acquisition and processing, building energy management systems still require manual configuration and supervision to achieve optimal performance. Model predictive control (MPC) aims to leverage equipment control, particularly of heating, ventilation and air conditioning (HVAC), by using a model of the building to capture its dynamic characteristics and to predict its response to alternative control scenarios. Usually, MPC approaches are based on simplified linear models, which support faster computation but also present some limitations regarding interpretability, solution diversification and longer-term optimization. In this work, we propose a novel MPC algorithm that uses a full-complexity grey-box simulation model to optimize HVAC operation in non-residential buildings. Our system generates hundreds of candidate operation plans, typically for the next day, and evaluates them in terms of consumption and comfort by means of a parallel simulator configured according to the expected building conditions (weather, occupancy, etc.). The system has been implemented and tested in an office building in Helsinki, both in a simulated environment and in the real building, yielding energy savings of around 35% during the intermediate winter season and 20% in the whole winter season with respect to the current operation of the heating equipment.
Rueda R, Cuéllar M, Molina-Solana M, et al., 2019, Generalised regression hypothesis induction for energy consumption forecasting, Energies, Vol: 12, Pages: 1069-1069, ISSN: 1996-1073
This work addresses the problem of energy consumption time series forecasting. In our approach, a set of time series containing energy consumption data is used to train a single, parameterised prediction model that can be used to predict future values for all the input time series. As a result, the proposed method is able to learn the common behaviour of all time series in the set (i.e., a fingerprint) and use this knowledge to perform the prediction task, and to explain this common behaviour as an algebraic formula. To that end, we use symbolic regression methods trained with both single- and multi-objective algorithms. Experimental results validate this approach to learn and model shared properties of different time series, which can then be used to obtain a generalised regression model encapsulating the global behaviour of different energy consumption time series.
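As a toy illustration of the "fingerprint" idea (not the paper's symbolic-regression machinery, which also evolves the shape of the formula), one can fit a single shared parametric formula to a whole set of series at once; here the shape y = a*t + b is fixed and only its parameters are learned from the pooled data:

```python
def fit_shared_linear(series_set):
    """Fit one shared formula y = a*t + b to every series at once by pooling
    all (t, y) points and solving ordinary least squares; (a, b) then acts
    as a 'fingerprint' of the behaviour common to all the series."""
    pts = [(t, y) for series in series_set for t, y in enumerate(series)]
    n = len(pts)
    st = sum(t for t, _ in pts)
    sy = sum(y for _, y in pts)
    stt = sum(t * t for t, _ in pts)
    sty = sum(t * y for t, y in pts)
    a = (n * sty - st * sy) / (n * stt - st * st)
    b = (sy - a * st) / n
    return a, b

# two series with the same slope but different offsets share the trend a = 2
print(fit_shared_linear([[0, 2, 4], [1, 3, 5]]))  # (2.0, 0.5)
```

The symbolic-regression approach in the paper goes further by searching the space of algebraic formulas itself, so the shared behaviour is expressed as an explainable expression rather than a fixed template.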
Fernando S, Birch D, Molina-Solana M, et al., 2019, Compositional Microservices for Immersive Social Visual Analytics, 23rd International Conference on the Information Visualisation (IV) - Incorporating Biomedical Visualization, Learning Analytics and Geometric Modelling and Imaging, Publisher: IEEE COMPUTER SOC, Pages: 216-223
Gomez-Romero J, Molina-Solana MJ, Oehmichen A, et al., 2018, Visualizing large knowledge graphs: a performance analysis, Future Generation Computer Systems, Vol: 89, Pages: 224-238, ISSN: 0167-739X
Knowledge graphs are an increasingly important source of data and context information in Data Science. A first step in data analysis is data exploration, in which visualization plays a key role. Currently, Semantic Web technologies are prevalent for modelling and querying knowledge graphs; however, most visualization approaches in this area tend to be overly simplified and targeted to small-sized representations. In this work, we describe and evaluate the performance of a Big Data architecture applied to large-scale knowledge graph visualization. To do so, we have implemented a graph processing pipeline in the Apache Spark framework and carried out several experiments with real-world and synthetic graphs. We show that distributed implementations of the graph building, metric calculation and layout stages can efficiently manage very large graphs, even without applying partitioning or incremental processing strategies.
Molina-Solana M, Kennedy M, Amador Diaz Lopez J, 2018, foo.castr: visualising the future AI workforce, Big Data Analytics, Vol: 3, ISSN: 2058-6345
The organization of companies and their HR departments is becoming hugely affected by recent advancements in computational power and Artificial Intelligence, a trend likely to rise dramatically in the next few years. This work presents foo.castr, a tool we are developing to visualise, communicate and facilitate the understanding of the impact of these advancements on the future of the workforce. It builds upon the idea that particular tasks within job descriptions will be progressively taken over by computers, reshaping human jobs. In its current version, foo.castr presents three different scenarios to help HR departments plan for potential changes and disruptions brought by the adoption of Artificial Intelligence.
Delgado M, Fajardo W, Molina-Solana M, 2018, A Software Tool for Categorizing Violin Student Renditions by Comparison, CAEPIA, Publisher: Springer International Publishing, Pages: 330-340, ISSN: 0302-9743
Gómez-Romero J, Molina-Solana M, 2018, GraphDL: An Ontology for Linked Data Visualization, CAEPIA, Publisher: Springer International Publishing, Pages: 351-360, ISSN: 0302-9743
Dolan D, Jensen H, Martinez Mediano P, et al., 2018, The improvisational state of mind: a multidisciplinary study of an improvisatory approach to classical music repertoire performance, Frontiers in Psychology, Vol: 9, ISSN: 1664-1078
The recent re-introduction of improvisation as a professional practice within classical music, however cautious and still rare, allows direct and detailed contemporary comparison between improvised and “standard” approaches to performances of the same composition, comparisons which hitherto could only be inferred from impressionistic historical accounts. This study takes an interdisciplinary multi-method approach to discovering the contrasting nature and effects of prepared and improvised approaches during live chamber-music concert performances of a movement from Franz Schubert’s “Shepherd on the Rock”, given by a professional trio consisting of voice, flute, and piano, in the presence of an invited audience of 22 adults with varying levels of musical experience and training. The improvised performances were found to differ systematically from prepared performances in their timing, dynamic, and timbral features, as well as in the degree of risk-taking and “mind reading” between performers, including during moments of added extemporised notes. Post-performance critical reflection by the performers characterised distinct mental states underlying the two modes of performance. The amount of overall body movement was reduced in the improvised performances, which showed fewer unco-ordinated movements between performers when compared to the prepared performances. Audience members, who were told only that the two performances would be different, but not how, rated the improvised version as more emotionally compelling and musically convincing than the prepared version. The size of this effect was not affected by whether or not the audience could see the performers, or by levels of musical training. EEG measurements from 19 scalp locations showed higher levels of Lempel-Ziv complexity (associated with awareness and alertness) in the improvised version in both performers and audience. Results are discussed in terms of their potential
Gómez-Romero J, Molina-Solana M, Ros M, et al., 2018, Comfort as a service: a new paradigm for residential environmental quality control, Sustainability, Vol: 10, ISSN: 1937-0709
This paper introduces the concept of Comfort as a Service (CaaS), a new energy supply paradigm for providing comfort to residential customers. CaaS takes into account the available passive and active elements, the external factors that affect energy consumption and associated costs, and occupants' behaviors to generate optimal control strategies for the domestic equipment automatically. As a consequence, it releases building occupants from operating the equipment, which gives rise to a disruption of the traditional model of paying per consumed energy in favor of a model of paying per provided comfort. In the paper, we envision a realization of CaaS based on several technologies such as ambient intelligence, big data, cloud computing and predictive computing. We discuss the opportunities and the barriers of CaaS-centered business and exemplify the potential of CaaS deployments by quantifying the expected energy savings achieved after limiting occupants' control over the air conditioning system in a test scenario.
Molina-Solana MJ, Guo Y, Birch D, 2017, Improving data exploration in graphs with fuzzy logic and large-scale visualisation, Applied Soft Computing, Vol: 53, Pages: 227-235, ISSN: 1872-9681
This work presents three case-studies of how fuzzy logic can be combined with large-scale immersive visualisation to enhance the process of graph sensemaking, enabling interactive fuzzy filtering of large global views of graphs. The aim is to provide users with a mechanism to quickly identify interesting nodes for further analysis. Fuzzy logic provides a flexible framework for asking human-like, curiosity-driven questions over the data, and visualisation allows its communication and understanding. Together, these two technologies empower both novices and experts to reach a faster and deeper understanding of the underlying patterns in big datasets than traditional means on a desktop screen with crisp queries. Among other examples, we provide evidence of how these two technologies successfully enable the identification of relevant transaction patterns in the Bitcoin network.
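The interactive fuzzy filtering can be sketched as a membership function plus an alpha-cut (an illustrative fragment; the membership shape, thresholds and the use of node degree as the attribute are assumptions, since the case-studies query richer node properties):

```python
# Sketch of fuzzy filtering on a graph: score each node with a fuzzy
# "high degree" membership, then keep those above an alpha-cut.
def high_degree_membership(degree, low=10, high=100):
    """Piecewise-linear fuzzy membership: 0 below `low`, 1 above `high`."""
    if degree <= low:
        return 0.0
    if degree >= high:
        return 1.0
    return (degree - low) / (high - low)

def fuzzy_filter(node_degrees, alpha=0.5):
    # alpha-cut: keep nodes whose membership reaches the threshold
    return [n for n, d in node_degrees.items() if high_degree_membership(d) >= alpha]

nodes = {"a": 5, "b": 55, "c": 200}
print(fuzzy_filter(nodes))  # ['b', 'c']
```

The point of the fuzzy formulation is that "high degree" is a graded, human-like notion: sliding the alpha threshold interactively is what drives the immersive filtering, instead of committing to a single crisp cut-off.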
De Castro-Santos A, Fajardo W, Molina-Solana M, 2017, A game based e-learning system to teach artificial intelligence in the computer sciences degree, Pages: 25-31
Our students taking the Artificial Intelligence and Knowledge Engineering courses often encounter a large number of problems that are not directly related to the subject to be learned. To address this, we developed a game-based e-learning system. The chosen game, which has been implemented as an e-learning system, allows students to develop Artificial Intelligence decision-making systems of very diverse levels of complexity. The e-learning system relieves students of work not directly related to the Artificial Intelligence and Knowledge Engineering problems. This way, students can test their developments and self-evaluate their progress. The results obtained after using this e-learning system with the students during the Artificial Intelligence and Knowledge Engineering course show a substantial improvement in students' learning outcomes.
Molina-Solana M, Ros M, Ruiz MD, et al., 2016, Data science for building energy management: A review, Renewable and Sustainable Energy Reviews, Vol: 70, Pages: 598-609, ISSN: 1364-0321
The energy consumption of residential and commercial buildings has risen steadily in recent years, an increase largely due to their HVAC systems. Expected energy loads, transportation, and storage as well as user behavior influence the quantity and quality of the energy consumed daily in buildings. However, technology is now available that can accurately monitor, collect, and store the huge amount of data involved in this process. Furthermore, this technology is capable of analyzing and exploiting such data in meaningful ways. Not surprisingly, the use of data science techniques to increase energy efficiency is currently attracting a great deal of attention and interest. This paper reviews how Data Science has been applied to address the most difficult problems faced by practitioners in the field of Energy Management, especially in the building sector. The work also discusses the challenges and opportunities that will arise with the advent of fully connected devices and new computational technologies.
Ruiz MD, Gómez-Romero J, Molina-Solana M, et al., 2016, Information fusion from multiple databases using meta-association rules, International Journal of Approximate Reasoning, Vol: 80, Pages: 185-198, ISSN: 1873-4731
Nowadays, data volume, distribution, and volatility make it difficult to search for global patterns by applying traditional Data Mining techniques. In the case of data in a distributed environment, sometimes a local analysis of each dataset separately is adequate, but at other times a global decision is needed through the analysis of the entire data. Association rule discovery methods typically require a single uniform dataset, and managing the entire set of distributed data is not possible due to its size. To address the scenarios in which satisfying this requirement is not practical or even feasible, we propose a new method for fusing information, in the form of rules, extracted from multiple datasets. The proposed model produces meta-association rules, i.e. rules in which the antecedent or the consequent may contain rules as well, for finding joint correlations among trends found individually in each dataset. In this paper, we describe the formulation and the implementation of two alternative frameworks that obtain, respectively, crisp meta-rules and fuzzy meta-rules. We compare our proposal with the information obtained when the datasets are not separated, in order to see the main differences between traditional association rules and meta-association rules. We also compare crisp and fuzzy methods for meta-association rule mining, observing that the fuzzy approach offers several advantages: it is more accurate since it incorporates the strength or validity of the previous information, produces a more manageable set of rules for human inspection, and allows the incorporation of contextual information to the mining process expressed in a more human-friendly format.
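A minimal sketch of the meta-rule idea (illustrative only; the crisp and fuzzy frameworks in the paper are considerably richer): each dataset contributes the set of rules mined from it, and a meta-rule r1 => r2 is emitted when the datasets in which r1 was found also contain r2 sufficiently often:

```python
from itertools import permutations

def meta_rules(rule_sets, min_conf=0.8):
    """Toy meta-association mining over per-dataset rule sets: emit
    (r1, r2, confidence) when datasets containing rule r1 also contain
    rule r2 with confidence >= min_conf."""
    all_rules = set().union(*rule_sets)
    out = []
    for r1, r2 in permutations(sorted(all_rules), 2):
        with_r1 = [s for s in rule_sets if r1 in s]
        conf = sum(r2 in s for s in with_r1) / len(with_r1)
        if conf >= min_conf:
            out.append((r1, r2, conf))
    return out

# three datasets; "C=>D" only ever appears alongside "A=>B"
sets = [{"A=>B", "C=>D"}, {"A=>B", "C=>D"}, {"A=>B"}]
print(meta_rules(sets))  # [('C=>D', 'A=>B', 1.0)]
```

Here the "transactions" of the second mining step are the rule sets themselves, which is the key shift from ordinary association rule mining; the fuzzy variant in the paper additionally weights each rule by its quality in the first step.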
Ruiz MD, Gómez-Romero J, Molina-Solana M, et al., 2016, Meta-association rules for mining interesting associations in multiple datasets, Applied Soft Computing, Vol: 49, Pages: 212-223, ISSN: 1568-4946
Association rules have been widely used in many application areas to extract, from raw data, new and useful information expressed in a comprehensive way for decision makers. However, raw data may not always be available; it can be distributed across multiple datasets, and the resulting number of association rules to be inspected is then overwhelming. In the light of these observations, we propose meta-association rules, a new framework for mining association rules over previously discovered rules in multiple databases. Meta-association rules are a new tool that conveys new information from the patterns extracted from multiple datasets and gives a “summarized” representation of the most frequent patterns. We propose and compare two different algorithms based respectively on crisp rules and fuzzy rules, concluding that fuzzy meta-association rules are suitable for incorporating into the meta-mining procedure the quality assessment provided by the rules in the first step of the process, although this consumes more time than the crisp approach. In addition, fuzzy meta-rules give a more manageable set of rules for subsequent analysis, and they allow the use of fuzzy items to express additional knowledge about the original databases. The proposed framework is illustrated with real-life data about crime incidents in the city of Chicago. Issues such as the differences with traditional approaches are discussed using synthetic data.
This work presents a systemic top-down visualization of Bitcoin transaction activity to explore dynamically generated patterns of algorithmic behavior. Bitcoin dominates the cryptocurrency markets and presents researchers with a rich source of real-time transactional data. The pseudonymous yet public nature of the data presents opportunities for the discovery of human and algorithmic behavioral patterns of interest to many parties such as financial regulators, protocol designers, and security analysts. However, retaining visual fidelity to the underlying data to retain a fuller understanding of activity within the network remains challenging, particularly in real time. We expose an effective force-directed graph visualization employed in our large-scale data observation facility to accelerate this data exploration and derive useful insight among domain experts and the general public alike. The high-fidelity visualizations demonstrated in this article allowed for collaborative discovery of unexpected high frequency transaction patterns, including automated laundering operations, and the evolution of multiple distinct algorithmic denial of service attacks on the Bitcoin network.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.