3rd International Conference on Data Mining & Knowledge Management

November 24~25, 2018, Dubai, UAE

Accepted Papers

COMPARISON OF FOUR ALGORITHMS FOR ONLINE CLUSTERING
Xinchun Yang1,2 and Wassim Kabbara1,2
1Department of Computer Engineering, CentraleSupelec, Paris, France. 2Department of Electrical Engineering, Tsinghua University, Beijing, China

ABSTRACT

This paper reviews and analyses four widely used algorithms in the field of online clustering: sequential K-means, the basic sequential algorithmic scheme, online inverse-weighted K-means, and online K-harmonic means. All algorithms are applied to the same set of self-generated data in the 2-dimensional plane, with and without noise. The performance of the algorithms is compared in terms of speed, accuracy, purity, and robustness. Results show that the basic sequential K-means performs better on data without noise, while online K-harmonic means is the best choice when noise interferes with the data.
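
As a rough illustration only (a sketch under an assumed initialisation, not the authors' implementation), the core of the sequential K-means variant is an incremental centroid update applied as each point of the stream arrives:

    import numpy as np

    def sequential_kmeans(stream, k):
        """Online (sequential) K-means: the first k points initialise the centroids,
        and every later point moves its nearest centroid by a 1/n step."""
        stream = iter(stream)
        centroids = np.array([next(stream) for _ in range(k)], dtype=float)
        counts = np.ones(k)
        for x in stream:
            x = np.asarray(x, dtype=float)
            j = np.argmin(np.linalg.norm(centroids - x, axis=1))  # nearest centroid
            counts[j] += 1
            centroids[j] += (x - centroids[j]) / counts[j]        # incremental mean update
        return centroids

    # usage: centroids = sequential_kmeans(points_2d, k=3)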


PERCEPTUAL MAPPING OF ELECTRONIC BANKING IN IRAN: DATA MINING APPROACH
Sina Fakharmanesh1
1Faculty of Management, Shahid Beheshti University, Tehran, Iran

ABSTRACT

Electronic banking has been on the rise in recent years, and this growth continues in developing countries with the increase in internet usage. Early studies in this realm explored the factors that influence bank customers' adoption of this new concept. There is a paucity of research that digs deep into customer perception of internet banking. The aim of this study is to extend this area of investigation by exploring the perceptual map of electronic banking in Iran. To this end, the ten most popular banks in Iran were chosen, and the dimensions of customer perception were analysed using principal component analysis. Results showed that four clusters are detectable among customers: website functionality, user satisfaction, security, and fulfillment. Managerial implications and directions for future research are presented at the end of this article.
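
As a hedged illustration of the analysis step (the ratings matrix below is hypothetical, not the study's data), a two-component principal component analysis yields the kind of perceptual-map coordinates described:

    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical ratings matrix: rows = the ten banks, columns = perception items
    # (e.g. website functionality, satisfaction, security and fulfillment questions).
    ratings = np.random.default_rng(0).uniform(1, 7, size=(10, 12))

    pca = PCA(n_components=2)
    coords = pca.fit_transform(ratings)      # 2-D perceptual-map coordinates per bank
    print(pca.explained_variance_ratio_)     # share of perception variance per axis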


IMPUTING ITEM AUXILIARY INFORMATION IN NMF-BASED COLLABORATIVE FILTERING
Fatemah Alghamedy1, Maryam Al-Ghamdi2 and Jun Zhang, Ph.D.1
1Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA. 2Department of Computer Science, University of Jeddah, Jeddah, Saudi Arabia

ABSTRACT

The cold-start items, especially the New-Items, which have not received any ratings, have negative impacts on NMF (Nonnegative Matrix Factorization)-based approaches, particularly the ones that utilize other information besides the rating matrix. We propose an NMF-based approach in collaborative filtering recommendation systems to handle the New-Items issue. The proposed approach utilizes the item auxiliary information to impute missing ratings before NMF is applied. We study two factors of the imputation: (1) the total number of imputed ratings for each New-Item, and (2) the value and the average of the imputed ratings. To study the influence of these factors, we divide items into three groups and calculate their recommendation errors. Experiments on three different datasets are conducted to examine the proposed approach. The results show that our approach can handle the New-Items' negative impact and reduce the recommendation errors for the whole dataset.
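
A minimal sketch of the impute-then-factorise idea follows; the similarity-based fill rule and all names are illustrative assumptions, not the authors' code:

    import numpy as np
    from sklearn.decomposition import NMF

    def impute_then_factorize(R, item_similarity, new_items, n_imputed=5, n_factors=20):
        """R: users x items ratings (0 = missing); item_similarity: items x items matrix
        built from item auxiliary information; new_items: indices of unrated items."""
        R = R.astype(float).copy()
        rng = np.random.default_rng(0)
        for j in new_items:
            nbrs = [i for i in np.argsort(item_similarity[j])[::-1] if i != j][:n_imputed]
            for u in rng.choice(R.shape[0], n_imputed, replace=False):
                known = R[u, nbrs][R[u, nbrs] > 0]
                if known.size:
                    R[u, j] = known.mean()            # imputed rating for the New-Item
        model = NMF(n_components=n_factors, init="nndsvda", max_iter=500)
        W = model.fit_transform(R)
        return W @ model.components_                   # predicted rating matrix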


ENHANCE NMF-BASED RECOMMENDATION SYSTEMS WITH SOCIAL INFORMATION IMPUTATION
Fatemah Alghamedy1 and Jun Zhang, Ph.D.1
1Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA

ABSTRACT

We propose an NMF (Nonnegative Matrix Factorization)-based approach in collaborative filtering recommendation systems to improve predictions for Cold-Start-Users, who suffer from high error in the results. The proposed method utilizes the trust network information to impute a subset of the missing ratings before NMF is applied. We propose three strategies for selecting the subset of missing ratings to impute in order to examine the influence of the imputation on both item groups, Cold-Start-Items and Heavy-Rated-Items, and to survey whether the trustees' ratings improve the results more than those of other users. We analyze two factors that may affect the results of the imputation: (1) the total number of imputed ratings, and (2) the average value of the imputed ratings. Experiments on four different datasets are conducted to examine the proposed approach. The results show that our approach improves the rating predictions for cold-start users and alleviates the impact of imputed ratings.


DISASTER INITIAL RESPONSES MINING DAMAGES USING FEATURE EXTRACTION AND BAYESIAN OPTIMISED SUPPORT VECTOR CLASSIFIER
Yasuno Takato1, Amakata Masazumi1, Fujii Junichiro1and Shimamoto Yuri1
1Research Institute for Infrastructure Paradigm Shift, Yachiyo Engineering, Co. Ltd., Tokyo, Japan

ABSTRACT

Whenever a natural disaster occurs in a devastated region, it is valuable to quickly evaluate the damage status at high-priority places. Frequently, owing to the restriction of disaster-management resources, spatial information must be predicted so that the infrastructure manager can start the initial response. Addressing the initial response is critical for mitigating social losses. In recent years, Japan has experienced several great earthquakes with magnitudes of around 6 and above, for example the Great East Japan earthquake of March 2011 (M9), Kumamoto in April 2016 (M7), Osaka in June 2018 (M5.5), and Hokkaido in September 2018 (M6.7). Such huge earthquakes occur not only in Japan but around the world, such as the Indonesia earthquake and tsunami of October 2018. The initial response to tomorrow's earthquake is an important problem for discovering natural-disaster knowledge and for predicting the damage level of infrastructure using multi-mode usable data sources. In Japan, approximately 5 million CCTV cameras are installed, and the Ministry of Land, Infrastructure and Transportation uses 23 thousand of them to monitor infrastructure in each region. This paper proposes a feature-extraction-based damage classification model using disaster images with 5 classes of damage after the occurrence of a huge earthquake. We present a support vector damage classifier whose inputs are extracted damage features, such as tsunami, bridge collapse, and road damage, with respect to accident risks concerning users' losses, initial smoke and fire, and non-disaster damage. The total number of images is 1,117, collected from relevant websites that allow us to download records of huge earthquake damage occurring worldwide. Using ten pre-trained architectures, we extract the damage features and construct a support vector classification model with a radial basis function kernel, whose hyper-parameters are optimised to minimise the loss function, achieving an accuracy of 97.50% based on DenseNet-201. This opens further opportunities for disaster data mining and localised detection.
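
As a hedged sketch of the modelling pipeline (image shapes and the search grid are assumptions; the paper tunes the hyper-parameters with Bayesian optimisation, whereas a plain grid search is shown here only to keep the example self-contained):

    import numpy as np
    from tensorflow.keras.applications import DenseNet201
    from tensorflow.keras.applications.densenet import preprocess_input
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    # Pre-trained DenseNet-201 as a fixed feature extractor (global-average-pooled output).
    extractor = DenseNet201(weights="imagenet", include_top=False, pooling="avg")

    def extract_features(images):
        """images: array of shape (n, 224, 224, 3) with pixel values in [0, 255]."""
        return extractor.predict(preprocess_input(images.astype("float32")))

    # RBF-kernel support vector classifier over the damage classes.
    search = GridSearchCV(SVC(kernel="rbf"),
                          {"C": [1, 10, 100], "gamma": ["scale", 1e-3, 1e-4]},
                          cv=5)
    # search.fit(extract_features(train_images), train_labels)   # train_images/labels are assumed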


Tacit Knowledge Management and Challenges in Sudanese Petroleum Sector
Elfatih Wadidi1, Afaf M. Alhassan2
1KM Lead, Sudapet, Sudan. 2Assistant Professor, U of K, Sudan

ABSTRACT

Knowledge is information resident in people's minds, used for making decisions in unknown contexts. Organizationally, KM embodies processes that seek synergies between the data- and information-processing capacity of information technologies and the creative and innovative capacity of human beings. The challenges appear in brain drain and employee satisfaction, in addition to the knowledge base of leaders within the organization. This paper describes the flow of knowledge, its relation to the field and to decision makers, and the challenges knowledge workers face in managing tacit knowledge and in the ability to gain more knowledge and make a profit.


FACTORS AFFECTING CLASSIFICATION ALGORITHMS RECOMMENDATION: A SURVEY

Mariam Moustafa Reda1, Dr. Mohammad Nassef2 and Dr. Akram Salah3
1,2,3Computer Science Department, Faculty of Computers and Information, Cairo University, Giza, Egypt

ABSTRACT

Many classification algorithms are available in the area of data mining for solving the same kind of problem, with little guidance for recommending the most appropriate algorithm, the one that gives the best results for the dataset at hand. As a way of optimizing the chances of recommending the most appropriate classification algorithm for a dataset, this paper focuses on the different factors considered by data miners and researchers in different studies when selecting the classification algorithms that will yield the desired knowledge for the dataset at hand. A number of factors to be considered when selecting an algorithm are discussed to guide data miners in choosing appropriate algorithms. The paper divides the factors into business and technical factors. The proposed technical factors are measurable and can be exploited by recommendation software tools.

Visual Categorization of Objects into Animal and Plant Groups Using Global Shape Descriptors
Zahra Sadeghi
Department of Electrical and Computer Engineering, University of Tehran, Iran
Computer Vision Center, Universitat Autònoma de Barcelona (UAB), Spain


ABSTRACT

How can humans distinguish between general categories of objects? Are the subcategories of living things visually distinctive? In a number of semantic-category deficits, patients are good at making broad categorizations but are unable to remember fine and specific details. It is well accepted that general information about concepts is more robust to damage related to semantic memory. Results from patients with semantic memory disorders demonstrate the loss of ability in subcategory recognition. While bottom-up feature construction has been studied in detail, little attention has been paid to the top-down approach and to the type of features that could account for general categorization. In this paper, I show that the broad categories of animal and plant are visually distinguishable without processing textural information. To this aim, I utilize shape descriptors with an additional phase of feature learning. The results are evaluated with both supervised and unsupervised learning mechanisms. The obtained results confirm that global encoding of the visual appearance of objects accounts for high discrimination between animal and plant object categories.
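
One possible global shape descriptor of the kind the abstract refers to (Hu moments of the object silhouette, ignoring texture) can be sketched as follows; the choice of descriptor and classifier here is an assumption, not necessarily the paper's:

    import cv2
    import numpy as np
    from sklearn.svm import LinearSVC

    def hu_descriptor(gray):
        """Log-scaled Hu moments of the thresholded silhouette of a grayscale image."""
        _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        hu = cv2.HuMoments(cv2.moments(mask)).flatten()
        return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)

    # supervised evaluation (train_gray and labels are assumed to exist):
    # clf = LinearSVC().fit([hu_descriptor(img) for img in train_gray], labels)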


Rising Threat of Mobile Phone as Mobile Health Equipment in African Countries
Ramadile Isaac Moletsane 1 and Keneilwe Zuva2
1 Department of Software Studies, Vaal University of Technology, Vanderbijlpark, South Africa
2University of Botswana, Gaborone, Botswana


ABSTRACT

The widespread introduction of mobile devices has created enabling conditions for the deployment of mobile health activities. Although mobile health is a relatively new concept, it is transforming healthcare all over the world and is progressing at a tremendous rate. Fifteen publications specific to the purpose of this paper were identified from the Elsevier, PubMed and Google Scholar databases. The search was restricted to humans, date of publication (2014 to 2017) and publication language (English). The aim of this narrative review was to analyse the possible hazards and benefits of mobile phones as mobile health equipment for the environment and wellbeing, respectively, and to suggest an intervention. Mobile phones were found to be the most used mHealth equipment in Africa, and the continent is realizing the benefits of mHealth practices. The concern with mobile phones when they reach their end of life is their toxicity to the environment and wellbeing. Africa was found to manage electronic waste in a manner that is not friendly to the environment. Therefore, the study suggests that awareness of the detrimental effects of this waste be prioritized.


Rate Control Method for Near-Lossless Image Compression with JPEG-LS
Shigao Li
School of Mathematics & Computer Science, Wuhan Polytechnic University, Wuhan, Hubei, China

ABSTRACT

JPEG-LS has become the standard for lossless and near-lossless image compression because of its performance and low complexity. However, it cannot accurately control the code rate when applied to near-lossless compression. This paper is therefore devoted to rate control for near-lossless image compression with JPEG-LS. A model of the coding bit-rate at high bit-rates, expressed in terms of the mean absolute difference (MAD) and the quantization parameter of the predictive coder, is first proposed. A rate control method for near-lossless compression is then designed for JPEG-LS based on this model. While a specific image is being coded, the quantization parameter is adjusted piecewise according to the model in order to control the bit-rate. Experiments show that the proposed method brings the final code rate close to a preset rate. Unlike other methods, wide fluctuations of the quantization parameter are avoided thanks to the accurate bit-rate model. As a result, the proposed control method achieves approximately optimal rate-distortion performance.
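
The paper's exact bit-rate model is not reproduced in the abstract; the sketch below only illustrates the general idea of piecewise adjustment of the near-lossless quantization parameter (NEAR) toward a target bit-rate, with a stand-in model:

    import math

    def adjust_near(target_bpp, mad, near, bits_so_far, pixels_coded, total_pixels):
        """Illustrative rate-control step: nudge NEAR up or down depending on whether
        a rough MAD-based bit-rate prediction would overshoot the remaining budget."""
        remaining = total_pixels - pixels_coded
        budget = target_bpp * total_pixels - bits_so_far
        if remaining <= 0 or budget <= 0:
            return near + 1                          # over budget: coarser quantization
        predicted_bpp = max(0.1, math.log2(max(mad, 1.0) / (2 * near + 1) + 1))
        if predicted_bpp * remaining > budget:
            return near + 1                          # predicted overshoot -> increase NEAR
        if predicted_bpp * remaining < 0.9 * budget and near > 0:
            return near - 1                          # comfortable margin -> finer quantization
        return near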


Microscopic Image Compression Using Support Vector
Chahinez Meriem Bentaouza1, 2 and Mohamed Benyettou1
1Department of Computer Science, Faculty of Mathematics and Computer Science, University of Sciences and Technology of Oran, Oran, Algeria
2Department of Mathematics and Computer Science, Faculty of Exact Sciences and Computer Science, University of Mostaganem, Mostaganem, Algeria


ABSTRACT

This paper deals with the construction of a compressed image after learning by support vector machines applied to microscopic images. Compression is used to reduce the size of medical images, whose acquisition volume for each exam is substantial, leading to large storage requirements and long transmission times. The compression ratio is satisfactory, but the resulting image differs from the original because the compressed image retains only the support vectors, so some visual information is lost.
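
One way to read "the compressed image has only support vectors" is a regression of intensity on pixel coordinates in which only the support vectors are stored; the sketch below illustrates that interpretation (an assumption, not the authors' algorithm) and is only practical for small images:

    import numpy as np
    from sklearn.svm import SVR

    def compress_with_support_vectors(image, epsilon=8.0):
        """Fit intensity = f(y, x) with an epsilon-insensitive SVR and keep the support vectors."""
        h, w = image.shape
        coords = np.array([(y, x) for y in range(h) for x in range(w)], dtype=float)
        values = image.reshape(-1).astype(float)
        svr = SVR(kernel="rbf", C=100.0, epsilon=epsilon).fit(coords, values)
        kept = svr.support_                           # indices of retained pixels
        ratio = 1.0 - len(kept) / len(values)         # fraction of pixels discarded
        reconstructed = svr.predict(coords).reshape(h, w)
        return reconstructed, ratio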

Learning Trajectory Patterns by Sequential Pattern Mining From Probabilistic Databases
Josky Aizan2, Cina Motamed2 and Eugene C. Ezin3
1École Doctorale Sciences Exactes et Appliquées
2Laboratoire d'Informatique Signal et Image de la Côte d'Opale, Université du Littoral Côte d'Opale, France
3Institut de Mathématiques et de Sciences Physiques, Université d'Abomey-Calavi, Benin


ABSTRACT

In this paper, we use sequential pattern mining from probabilistic databases to learn trajectory patterns. Trajectories, which are successions of points, are first transformed into successions of zones by grouping points to build the symbolic sequence database. For each zone we estimate a confidence level according to the number of observations that fall in the zone during the trajectory. Managing this confidence allows the volume of zones useful for the learning process to be reduced efficiently. Finally, we apply a sequential pattern mining algorithm to this probabilistic database to bring out typical trajectories.
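
A minimal sketch of the point-to-zone conversion with per-zone confidence is given below; the grid discretisation and all names are assumptions, not the authors' implementation (the resulting probabilistic sequences would then be fed to a sequential pattern mining algorithm):

    from collections import Counter

    def trajectory_to_zones(points, cell=10.0):
        """Map a trajectory (list of (x, y) points) to an ordered sequence of grid zones,
        each with a confidence proportional to how many observations fell in it."""
        zones = [(int(x // cell), int(y // cell)) for x, y in points]
        counts = Counter(zones)
        sequence, seen = [], set()
        for z in zones:
            if z not in seen:                        # keep first-visit order, drop repeats
                seen.add(z)
                sequence.append((z, counts[z] / len(points)))  # (zone, confidence)
        return sequence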


A Machine Fault Detection and Diagnosis System Using Sound-to-Image Conversion Feature Representation
Caleb Vununu1 and Ki-Ryong Kwon1
1Dept. of IT Convergence and Applications Engineering, Pukyong National University, Busan, Korea

ABSTRACT

The present work proposes a sound-based machine fault detection system for the assessment of drilling machines in industrial sites. The main contribution of this work is to represent the sounds as images and then apply to the newly created images some transformations in order to reveal the hidden health patterns originally absent from the sounds. The sounds are first recorded from faultless and defective drills for the analysis. The recorded sounds are converted to 8-bit grayscale images using 1D-to-2D transformations. Secondly, after a contrast enhancement process carried out to correct the poor contrast of the images, a low-pass filter in the spatial domain is applied to the images in order to attenuate their gray-level variation. The filtered images are used as the features for the diagnosis assessment. A final step consists of feeding the images to a nonlinear classifier whose outputs form the final assessment decision. We demonstrate that the proposed feature extraction method seizes and reveals the health patterns carried by the sounds.
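
A hedged sketch of such a sound-to-image pipeline (the image size, the equalisation and the 3x3 mean filter are assumptions, not the authors' exact settings):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def sound_to_image(signal, size=128):
        """Reshape a 1-D sound signal into an 8-bit grayscale image, equalise its
        contrast, then apply a spatial low-pass (mean) filter."""
        sig = np.asarray(signal, dtype=float)[: size * size]
        sig = np.pad(sig, (0, size * size - len(sig)))
        img = np.interp(sig, (sig.min(), sig.max()), (0, 255)).reshape(size, size)
        hist, _ = np.histogram(img.astype(np.uint8), bins=256, range=(0, 255))
        cdf = hist.cumsum()
        cdf = 255 * cdf / cdf[-1]                     # histogram equalisation
        img = cdf[img.astype(np.uint8)]
        return uniform_filter(img, size=3).astype(np.uint8)   # low-pass filtering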

The Reuse Challenge in Evolutionary Computing: The Added Value of Software Product Lines
Abdelghani Alidra1 and Mohamed Tahar Kimour2
120 Août 1955 University, Algeria and 2Badji Mokhtar University, Algeria

ABSTRACT

Evolutionary computing (EC) designates the computing science discipline involved in developing biology-inspired algorithms for solving hard search-based problems. Evolutionary computing suits various engineering problems well and has been successfully applied to many of them. Adopting EC in practice, however, has uncovered several challenging issues, such as the efficient reuse of evolutionary code, the correct tuning of the algorithm, and the dynamic evolution of its behavior to balance divergent requirements. To address these issues, we propose in the present article a new approach to evolutionary algorithm development based on product line engineering. Our approach is centered on the feature model of the evolutionary algorithms software family. This is notable in that it offers a rigorous way to identify and implement the relevant reusable parts. Moreover, it allows the exploitation of existing model-based techniques for automatic code generation and reasoning. It also opens promising perspectives for the intelligent tuning and dynamic reconfiguration of evolutionary algorithms through the exploitation of the most recent advances in the field of dynamic software product lines.


Detecting Chronic Diseases from Sleep-Wake Behaviour and Clinical Features
Sarah Fallmann and Liming Chen
De Montfort University, UK

ABSTRACT

Many chronic diseases show evidence of correlations with sleep-wake behaviour, and there is an increasing interest in making use of such correlations for early warning systems. This research presents an approach towards early chronic disease detection by mining sleep-wake measurements using deep learning. Specifically, a Long Short-Term Memory network is applied to actigraph data enriched with the clinical history of patients. Experiments and analysis are performed targeting detection at an early and an advanced disease stage based on different clinical data features. The results show, for disease detection, averaged accuracies of 0.62, 0.73, 0.81 and 0.77 for hypertension, diabetes, sleep apnea and chronic kidney disease, respectively. Early detection performs with an averaged accuracy of 0.49 for sleep apnea and 0.56 for diabetes. Nevertheless, compared to existing work, our approach shows an improvement in performance and demonstrates that predicting chronic diseases from sleep-wake behaviour is feasible, though further investigation will be needed for early prediction.
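
A minimal sketch of the kind of model described (all dimensions are assumptions, not the paper's): per-epoch actigraphy sequences go through an LSTM and are concatenated with a small clinical-history vector before a binary disease/no-disease output:

    from tensorflow.keras import layers, models

    seq_in = layers.Input(shape=(1440, 1), name="actigraphy")   # e.g. one-minute epochs per day
    clin_in = layers.Input(shape=(8,), name="clinical_history")
    h = layers.LSTM(64)(seq_in)
    h = layers.concatenate([h, clin_in])
    out = layers.Dense(1, activation="sigmoid")(h)              # disease vs. no disease
    model = models.Model([seq_in, clin_in], out)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])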


Artificial Intelligence-Fusing the Future in Medical Cannabis Production
Everton H. Flemmings
AOE Group of Companies Ltd, UK

ABSTRACT

The active compounds in cannabis are called cannabinoids, and there are at least 85 of them, of which the most commonly discussed are tetrahydrocannabinol (THC) and cannabidiol (CBD). While the majority of states in the USA have opted to legalise THC for either medical use (that is, with a prescription for a limited range of conditions) or adult usage, some have chosen instead to allow CBD usage only, as it is not psychoactive, meaning it does not provide the traditional high associated with cannabis. Low-THC/high-CBD laws are commonly viewed as a way to allow access to legal cannabis for the neediest patients, such as children with seizure disorders, without creating a fully fledged industry in the state.


PROVER: AN SMT-BASED FRAMEWORK FOR PROCESS VERIFICATION
Souheib Baarir, Reda Bendraou, Hakan Metin
Laboratoire d'Informatique de Paris 6, Paris, France

ABSTRACT

Business processes are used to represent a company's business and the services it delivers. They are also a means to create added value for the company as well as for its customers. It is therefore critical to seriously consider the design of such processes and to make sure that they are free of any kind of inconsistency. This paper introduces our new framework, called ProVer. Three of its design decisions are motivated: (1) the use of UML Activity Diagrams (AD) as the process modeling language, (2) the formalization in first-order logic (FOL) of the UML AD concepts needed for process verification, as well as of a well-identified set of properties, and (3) the use of SMT (Satisfiability Modulo Theories) as a means to verify properties spanning different process perspectives in an optimal way. The originality of ProVer is the ability for non-experts to express properties to be verified on processes spanning the control, data, time, and resource perspectives using the same tool.
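
As a toy illustration of SMT-based checking (the z3 solver and this tiny encoding are assumptions for the example, not ProVer's actual formalization), a control-flow property such as "every execution reaching End has performed Notify" can be checked by asking the solver for a violating execution:

    from z3 import Bools, Solver, Implies, And, Not, unsat

    start, approve, notify, end = Bools("start approve notify end")
    process = [Implies(end, And(approve, notify)),   # control-flow rules of the model
               Implies(approve, start),
               Implies(notify, start)]

    s = Solver()
    s.add(process)
    s.add(end, Not(notify))                          # try to violate "End implies Notify"
    print("property holds" if s.check() == unsat else s.model())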

ASK LESS - SCALE MARKET RESEARCH WITHOUT ANNOYING YOUR CUSTOMERS
Venkatesh Umaashankar1* and Girish Shanmugam S2
1Ericsson Research, Chennai, India. 2Machine Learning Consultant, E3, Jains Green Acres, Chennai, India

ABSTRACT

Market research is generally performed by surveying a representative sample of customers with questions that cover contexts such as psychographics, demographics, attitudes and product preferences. Survey responses are used to segment the customers into various groups that are useful for targeted marketing and communication. Reducing the number of questions asked of each customer helps businesses scale market research to a large number of customers. In this work, we model this task using Bayesian networks. We demonstrate the effectiveness of our approach using an example market segmentation of broadband customers.
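
A toy sketch of the idea (the library, structure and numbers below are illustrative assumptions): with a learnt network over a latent segment and the survey questions, a customer's segment can be inferred from the answers already given, so the remaining questions need not be asked:

    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    model = BayesianNetwork([("Segment", "Q1"), ("Segment", "Q2")])
    model.add_cpds(
        TabularCPD("Segment", 2, [[0.6], [0.4]]),
        TabularCPD("Q1", 2, [[0.8, 0.3], [0.2, 0.7]],
                   evidence=["Segment"], evidence_card=[2]),
        TabularCPD("Q2", 2, [[0.7, 0.2], [0.3, 0.8]],
                   evidence=["Segment"], evidence_card=[2]),
    )
    assert model.check_model()
    posterior = VariableElimination(model).query(["Segment"], evidence={"Q1": 0})
    print(posterior)    # belief about the segment after a single answered question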

AN APPROACH FOR WEB APPLICATIONS TEST DATA GENERATION BASED ON ANALYZING CLIENT SIDE USER INPUT FIELDS
Samer Hanna1 and Hayat Jaber2
1Department of Software Engineering, Faculty of Information Technology, Philadelphia University, Jordan. 2Department of Computer Science, Faculty of Information Technology, Philadelphia University, Jordan

ABSTRACT

Manually generating test data for Web applications is time consuming; automating this task is therefore important for both practitioners and researchers in this domain. To achieve this goal, the research in this paper relies on an ontology that categorizes Web application inputs according to input types such as number, text, and date. This research presents rules for Test Data Generation for Web Applications (TDGWA) based on the input categories specified by the ontology. Following this paper's approach, Web application testers need less time and effort to accomplish the task of TDGWA. The approach has been used successfully to generate test data for several experimental and real-life Web applications.
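
A minimal sketch of type-driven test data generation (the ontology categories and field names are illustrative assumptions, not the paper's artefacts):

    import random
    import string
    from datetime import date, timedelta

    GENERATORS = {
        "number": lambda: random.randint(-1000, 1000),
        "text":   lambda: "".join(random.choices(string.ascii_letters, k=12)),
        "date":   lambda: (date(2000, 1, 1) + timedelta(days=random.randint(0, 9000))).isoformat(),
        "email":  lambda: "user%d@example.com" % random.randint(1, 9999),
    }

    def generate_test_data(form_fields):
        """form_fields: mapping of input-field name -> ontology input category."""
        return {name: GENERATORS[category]() for name, category in form_fields.items()}

    print(generate_test_data({"age": "number", "username": "text", "dob": "date"}))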

SEGMENTING RETAIL CUSTOMERS WITH ENHANCED RFM DATA USING A HYBRID REGRESSION/CLUSTERING METHOD
Fahed Yoseph, Professor Markku Heikkilä and Mohammed Malaily
Åbo Akademi University, Turku, Finland

ABSTRACT

Targeted marketing strategies attract interest from both industry and academia. A viable approach for gaining insight into the heterogeneity of the customer purchase lifecycle is market segmentation. Conventional market segmentation models often ignore the evolution of customers' behavior over time; therefore, retailers often end up spending their limited resources attempting to serve unprofitable customers. This study looks into the integration of Recency, Frequency, Monetary (RFM) scores with the Customer Lifetime Value model, and applies the resulting data to segment the customers of a medium-sized clothing and fashion accessory retailer in Kuwait. A modified regression algorithm is implemented to find the slope of the customer purchase curve. Then the K-means and Expectation Maximization (EM) clustering algorithms are used to find the sign of the curve. The purpose is to gain knowledge from point-of-sale data and help the retailer make informed decisions. Cluster quality assessment concludes that the EM algorithm outperformed the K-means algorithm in finding relevant segments. Finally, appropriate marketing strategies are suggested in accordance with the results generated by the EM clustering algorithm.
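
A hedged sketch of the clustering comparison (the RFM data below is synthetic and the quality measure is a generic silhouette score, not necessarily the study's assessment):

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture
    from sklearn.metrics import silhouette_score

    rfm = np.random.default_rng(0).gamma(2.0, 2.0, size=(500, 3))   # stand-in Recency/Frequency/Monetary
    X = StandardScaler().fit_transform(rfm)

    km_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
    em_labels = GaussianMixture(n_components=4, random_state=0).fit_predict(X)

    print("k-means silhouette:", silhouette_score(X, km_labels))
    print("EM silhouette:     ", silhouette_score(X, em_labels))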

BIRD SWARM ALGORITHM FOR SOLVING THE LONG-TERM CAR POOLING PROBLEM
Zakaria BENDAOUD1 and Sidahmed BENNACEF2
1GeCode Laboratory, Department of Computer Science, Dr. Moulay Tahar University of Saida, Algeria. 2Department of Computer Science, Moulay Tahar University of Saida, Algeria

ABSTRACT

Carpooling consists in sharing personal vehicles to make a joint trip, in order to share the costs of fuel and tolls (soon in Algeria) or simply to exchange. The purpose of this work is to benefit from Web 2.0 tools in order to adopt the ideal strategy for carpooling. We address the family of long-term carpooling problems, where the goal is to find the best groups among a set of individuals who make the same trip every day in a regular way. To reach this goal, we adapted a bio-inspired meta-heuristic; this technique allowed us to obtain very satisfying results.

VISUALISATION OF MULTI-SERVICE SYSTEM NETWORK WITH D3.JS & KDB+/Q USING WEBSOCKET
Ali Kapadiya
Kdb+ Tick-data and Analytics developer, London, UK

ABSTRACT

This work presents the visualisation of a complex web of services running in a multi-service system, using D3.js as the frontend, kdb+/q as the backend, and WebSocket & JSON for communication.