Recently, neural networks have shown promising results for named entity recognition (NER), which needs a number of labeled data to for model training. When meeting a new domain (target domain) for NER, there is no or a few labeled data, which makes domain NER much more difficult. As NER has been researched for a long time, some similar domain already has well labelled data (source domain). Therefore, in this paper, we focus on domain NER by studying how to utilize the labelled data from such similar source domain for the new target domain. We design a kernel function based instance transfer strategy by getting similar labelled sentences from a source domain. Moreover, we propose an enhanced recurrent neural network (ERNN) by adding an additional layer that combines the source domain labelled data into traditional RNN structure. Comprehensive experiments are conducted on two datasets. The comparison results among HMM, CRF and RNN show that RNN performs bette than others. When there is no labelled data in domain target, compared to directly using the source domain labelled data without selecting transferred instances, our enhanced RNN approach gets improvement from 0.8052 to 0.9328 in terms of F1 measure.
Generative Adversarial Networks (GANs) have experienced a recent surge in popularity, performing competitively in a variety of tasks, especially in computer vision. However, GAN training has shown limited success in natural language processing. This is largely because sequences of text are discrete, and thus gradients cannot propagate from the discriminator to the generator. Recent solutions use reinforcement learning to propagate approximate gradients to the generator, but this is inefficient to train. We propose to utilize an autoencoder to learn a low-dimensional representation of sentences. A GAN is then trained to generate its own vectors in this space, which decode to realistic utterances. We report both random and interpolated samples from the generator. Visualization of sentence vectors indicate our model correctly learns the latent space of the autoencoder. Both human ratings and BLEU scores show that our model generates realistic text against competitive baselines.
The theory introduced, presented and developed in this paper, is concerned with Rough Concept Analysis. This theory is a synthesis of the theory of Rough Sets pioneered by Zdzislaw Pawlak with the theory of Formal Concept Analysis pioneered by Rudolf Wille. The central notion in this paper of a rough formal concept combines in a natural fashion the notion of a rough set with the notion of a formal concept: ‘rough set + formal concept = rough formal concept’. A follow-up paper will provide a synthesis of the two important data modeling techniques: conceptual scaling of Formal Concept Analysis and Entity-Relationship database modeling.
In this paper, we review the state-of-the-art results in evolutionary computation and observe that we do not evolve non trivial software from scratch and with no human intervention. A number of possible explanations are considered, but we conclude that computational complexity of the problem prevents it from being solved as currently attempted. A detailed analysis of necessary and available computational resources is provided to support our findings.
We revisit the question (most famously) initiated by Turing:Can human intelligence be completely modelled by a Turing machine? To give away the ending we show here that the answer is \emph{no}. More specifically we show that at least some thought processes of the brain cannot be Turing computable. In particular some physical processes are not Turing computable, which is not entirely expected. The main difference of our argument with the well known Lucas-Penrose argument is that we do not use G\’odel’s incompleteness theorem, (although our argument seems related to G\’odel’s) and we do not need to assume fundamental consistency of human reasoning powers, (which is controversial) we also side-step some meta-logical issues with their argument, which have also been controversial. The argument is via a thought experiment and at least partly physical, but no serious physical assumptions are made. Furthermore the argument can be reformed as an actual (likely future) experiment.
This paper compares different forecast methods and models to predict average values of solar irradiance with a sampling time of 15 min over a prediction horizon of up to 3 h. The methods considered only require historic solar irradiance values, the current time and geographical location, i.e., no exogenous inputs are required. Nearest neighbor regression (NNR) and autoregressive integrated moving average (ARIMA) models are tested using different hyperparameters (e.g., the number of autoregressive lags, or the size of the training data set) and data from different locations and seasons. Based on a high number of different models, NNR is identified to be the more promising approach. The hyperparameters and their effect on the forecast quality are analyzed to identify properties which are likely to lead to good forecasts. Using these properties, a reduced search space is derived which can be used to identify good forecast models much faster. In a case study, the use of this search space is demonstrated by finding forecast models for different climatic situations.
We briefly introduce herein a new form of distributed, multi-agent artificial intelligence, which we refer to as ‘tentacular.’ Tentacular AI is distinguished by six attributes, which among other things entail a capacity for reasoning and planning based in highly expressive calculi (logics), and which enlists subsidiary agents across distances circumscribed only by the reach of one or more given networks.
We theoretically and empirically explore the explainability benefits of adversarial learning in logistic regression models on structured datasets. In particular we focus on improved explainability due to significantly higher _feature-concentration_ in adversarially-learned models: Compared to natural training, adversarial training tends to much more efficiently shrink the weights of non-predictive and weakly-predictive features, while model performance on natural test data only degrades slightly (and even sometimes improves), compared to that of a naturally trained model. We provide a theoretical insight into this phenomenon via an analysis of the expectation of the logistic model weight updates by an SGD-based adversarial learning algorithm, where examples are drawn from a random binary data-generation process. We empirically demonstrate the feature-pruning effect on a synthetic dataset, some datasets from the UCI repository, and real-world large-scale advertising response-prediction data-sets from MediaMath. In several of the MediaMath datasets there are 10s of millions of data points, and on the order of 100,000 sparse categorical features, and adversarial learning often results in model-size reduction by a factor of 20 or higher, and yet the model performance on natural test data (measured by AUC) is comparable to (and sometimes even better) than that of the naturally trained model. We also show that traditional $\ell_1$ regularization does not even come close to achieving this level of feature-concentration. We measure ‘feature concentration’ using the Integrated Gradients-based feature-attribution method of Sundararajan et. al (2017), and derive a new closed-form expression for 1-layer networks, which substantially speeds up computation of aggregate feature attributions across a large dataset.
Optimization techniques play a significant role in improving description logic reasoners covering the Web Ontology Language (OWL). These techniques are essential to speed up these reasoners. Many of the optimization techniques are based on heuristic choices. Optimal heuristic selection makes these techniques more effective. The FaCT++ OWL reasoner and its Java version JFact implement an optimization technique called ToDo list which is a substitute for a traditional top-down approach in tableau-based reasoners. The ToDo list mechanism allows one to arrange the order of applying different rules by giving each a priority. Compared to a top-down approach, the ToDo list technique has a better control over the application of expansion rules. Learning the proper heuristic order for applying rules in ToDo lis} will have a great impact on reasoning speed. We use a binary SVM technique to build our learning model. The model can help to choose ontology-specific order sets to speed up OWL reasoning. On average, our learning approach tested with 40 selected ontologies achieves a speedup of two orders of magnitude when compared to the worst rule ordering choice.
Deep learning methods are often difficult to apply in the legal domain due to the large amount of labeled data required by deep learning methods. A recent new trend in the deep learning community is the application of multi-task models that enable single deep neural networks to perform more than one task at the same time, for example classification and translation tasks. These powerful novel models are capable of transferring knowledge among different tasks or training sets and therefore could open up the legal domain for many deep learning applications. In this paper, we investigate the transfer learning capabilities of such a multi-task model on a classification task on the publicly available Kaggle toxic comment dataset for classifying illegal comments and we can report promising results.
Deep neural networks are data hungry models and thus they face difficulties when used for training on small size data. Transfer learning is a method that could potentially help in such situations. Although transfer learning achieved great success in image processing, its effect in the text domain is yet to be well established especially due to several intricacies that arise in the context of document analysis and understanding. In this paper, we study the problem of transfer learning for text summarization and discuss why the existing state-of-the-art models for this problem fail to generalize well on other (unseen) datasets. We propose a reinforcement learning framework based on self-critic policy gradient method which solves this problem and achieves good generalization and state-of-the-art results on a variety of datasets. Through an extensive set of experiments, we also show the ability of our proposed framework in fine-tuning the text summarization model only with a few training samples. To the best of our knowledge, this is first work that studies transfer learning in text summarization and provides a generic solution that works well on unseen data.
In the legal domain it is important to differentiate between words in general, and afterwards to link the occurrences of the same entities. The topic to solve these challenges is called Named-Entity Linking (NEL). Current supervised neural networks designed for NEL use publicly available datasets for training and testing. However, this paper focuses especially on the aspect of applying transfer learning approach using networks trained for NEL to legal documents. Experiments show consistent improvement in the legal datasets that were created from the European Union law in the scope of this research. Using transfer learning approach, we reached F1-score of 98.90\% and 98.01\% on the legal small and large test dataset.
We present trellis networks, a new architecture for sequence modeling. On the one hand, a trellis network is a temporal convolutional network with special structure, characterized by weight tying across depth and direct injection of the input into deep layers. On the other hand, we show that truncated recurrent networks are equivalent to trellis networks with special sparsity structure in their weight matrices. Thus trellis networks with general weight matrices generalize truncated recurrent networks. We leverage these connections to design high-performing trellis networks that absorb structural and algorithmic elements from both recurrent and convolutional models. Experiments demonstrate that trellis networks outperform the current state of the art on a variety of challenging benchmarks, including word-level language modeling on Penn Treebank and WikiText-103, character-level language modeling on Penn Treebank, and stress tests designed to evaluate long-term memory retention. The code is available at https://…/trellisnet .
We observe a rapid increase in machine learning models for learning data representations that remove the semantics of protected characteristics, and are therefore able to mitigate unfair prediction outcomes. This is indeed a positive proliferation. All available models however learn latent embeddings, therefore the produced representations do not have the semantic meaning of the input. Our aim here is to learn fair representations that are directly interpretable in the original input domain. We cast this problem as a data-to-data translation; to learn a mapping from data in a source domain to a target domain such that data in the target domain enforces fairness definitions, such as statistical parity or equality of opportunity. Unavailability of fair data in the target domain is the crux of the problem. This paper provides the first approach to learn a highly unconstrained mapping from source to target by maximizing (conditional) dependence of residuals – the difference between data and its translated version – and protected characteristics. The usage of residual statistics ensures that our generated fair data should only be an adjustment of the input data, and this adjustment should reveal the main difference between protected characteristic groups. When applied to CelebA face image dataset with gender as protected characteristic, our model enforces equality of opportunity by adjusting eyes and lips regions. In Adult income dataset, also with gender as protected characteristic, our model achieves equality of opportunity by, among others, obfuscating wife and husband relationship. Visualizing those systematic changes will allow us to scrutinize the interplay of fairness criterion, chosen protected characteristics, and the prediction performance.
In this paper, we focus on solving a distributed convex optimization problem in a network, where each agent has its own convex cost function and the goal is to minimize the sum of the agents’ cost functions while obeying the network connectivity structure. In order to minimize the sum of the cost functions, we consider new distributed gradient-based methods where each node maintains two estimates, namely, an estimate of the optimal decision variable and an estimate of the gradient for the average of the agents’ objective functions. From the viewpoint of an agent, the information about the decision variable is pushed to the neighbors, while the information about the gradients is pulled from the neighbors hence giving the name ‘push-pull gradient methods’. This name is also due to the consideration of the implementation aspect: the push-communication-protocol and the pull-communication-protocol are respectively employed to implement certain steps in the numerical schemes. The methods utilize two different graphs for the information exchange among agents, and as such, unify the algorithms with different types of distributed architecture, including decentralized (peer-to-peer), centralized (master-slave), and semi-centralized (leader-follower) architecture. We show that the proposed algorithms and their many variants converge linearly for strongly convex and smooth objective functions over a network (possibly with unidirectional data links) in both synchronous and asynchronous random-gossip settings. We numerically evaluate our proposed algorithm for both static and time-varying graphs, and find that the algorithms are competitive as compared to other linearly convergent schemes.
We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly. We then examine where those strict assumptions break down and design a practical algorithm – called Discriminator Rejection Sampling (DRS) – that can be used on real data-sets. Finally, we demonstrate the efficacy of DRS on a mixture of Gaussians and on the state of the art SAGAN model. On ImageNet, we train an improved baseline that increases the best published Inception Score from 52.52 to 62.36 and reduces the Frechet Inception Distance from 18.65 to 14.79. We then use DRS to further improve on this baseline, improving the Inception Score to 76.08 and the FID to 13.75.
A deep learning approach to blind denoising of images without complete knowledge of the noise statistics is considered. We propose DN-ResNet, which is a deep convolutional neural network (CNN) consisting of several residual blocks (ResBlocks). With cascade training, DN-ResNet is more accurate and more computationally efficient than the state of art denoising networks. An edge-aware loss function is further utilized in training DN-ResNet, so that the denoising results have better perceptive quality compared to conventional loss function. Next, we introduce the depthwise separable DN-ResNet (DS-DN-ResNet) utilizing the proposed Depthwise Seperable ResBlock (DS-ResBlock) instead of standard ResBlock, which has much less computational cost. DS-DN-ResNet is incrementally evolved by replacing the ResBlocks in DN-ResNet by DS-ResBlocks stage by stage. As a result, high accuracy and good computational efficiency are achieved concurrently. Whereas previous state of art deep learning methods focused on denoising either Gaussian or Poisson corrupted images, we consider denoising images having the more practical Poisson with additive Gaussian noise as well. The results show that DN-ResNets are more efficient, robust, and perform better denoising than current state of art deep learning methods, as well as the popular variants of the BM3D algorithm, in cases of blind and non-blind denoising of images corrupted with Poisson, Gaussian or Poisson-Gaussian noise. Our network also works well for other image enhancement task such as compressed image restoration.
In this paper, we introduce a novel methodology for characterising the performance of deep learning networks (ResNets and DenseNet) with respect to training convergence and generalisation as a function of mini-batch size and learning rate for image classification. This methodology is based on novel measurements derived from the eigenvalues of the approximate Fisher information matrix, which can be efficiently computed even for high capacity deep models. Our proposed measurements can help practitioners to monitor and control the training process (by actively tuning the mini-batch size and learning rate) to allow for good training convergence and generalisation. Furthermore, the proposed measurements also allow us to show that it is possible to optimise the training process with a new dynamic sampling training approach that continuously and automatically change the mini-batch size and learning rate during the training process. Finally, we show that the proposed dynamic sampling training approach has a faster training time and a competitive classification accuracy compared to the current state of the art.
We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyper-parameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures.
Most previous research treats named entity extraction and classification as an end-to-end task. We argue that the two sub-tasks should be addressed separately. Entity extraction lies at the level of syntactic analysis while entity classification lies at the level of semantic analysis. According to Noam Chomsky’s ‘Syntactic Structures,’ pp. 93-94 (Chomsky 1957), syntax is not appealed to semantics and semantics does not affect syntax. We analyze two benchmark datasets for the characteristics of named entities, finding that uncommon words can distinguish named entities from common text; where uncommon words are the words that hardly appear in common text and they are mainly the proper nouns. Experiments validate that lexical and syntactic features achieve state-of-the-art performance on entity extraction and that semantic features do not further improve the extraction performance, in both of our model and the state-of-the-art baselines. With Chomsky’s view, we also explain the failure of joint syntactic and semantic parsings in other works.
Principal component analysis (PCA) is widely used for dimension reduction and embedding of real data in social network analysis, information retrieval, and natural language processing, etc. In this work we propose a fast randomized PCA algorithm for processing large sparse data. The algorithm has similar accuracy to the basic randomized SVD (rPCA) algorithm (Halko et al., 2011), but is largely optimized for sparse data. It also has good flexibility to trade off runtime against accuracy for practical usage. Experiments on real data show that the proposed algorithm is up to 9.1X faster than the basic rPCA algorithm without accuracy loss, and is up to 20X faster than the svds in Matlab with little error. The algorithm computes the first 100 principal components of a large information retrieval data with 12,869,521 persons and 323,899 keywords in less than 400 seconds on a 24-core machine, while all conventional methods fail due to the out-of-memory issue.
Recently, so-called treebased phylogenetic networks have gained considerable interest in the literature, where a treebased network is a network that can be constructed from a phylogenetic tree, called the \emph{base tree}, by adding additional edges. The main aim of this manuscript is to provide some sufficient criteria for treebasedness by reducing phylogenetic networks to related graph structures. While it is generally known that deciding whether a network is treebased is NP-complete, one of these criteria, namely \emph{edgebasedness}, can be verified in polynomial time. Next to these edgebased networks, we introduce further classes of treebased networks and analyze their relationships.
Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require to centralize the multi-datacenter data for performance purpose. In practice, however, it is often infeasible to transfer all data to a centralized data center due to not only bandwidth limitation but also the constraints of privacy regulations. Model averaging is a conventional choice for data parallelized training, but its ineffectiveness is claimed by previous studies as deep neural networks are often non-convex. In this paper, we argue that model averaging can be effective in the decentralized environment by using two strategies, namely, the cyclical learning rate and the increased number of epochs for local model training. With the two strategies, we show that model averaging can provide competitive performance in the decentralized mode compared to the data-centralized one. In a practical environment with multiple data centers, we conduct extensive experiments using state-of-the-art deep network architectures on different types of data. Results demonstrate the effectiveness and robustness of the proposed method.
Cyber-Physical Systems (CPS) are systems composed by a physical component that is controlled or monitored by a cyber-component, a computer-based algorithm. Advances in CPS technologies and science are enabling capability, adaptability, scalability, resiliency, safety, security, and usability that will far exceed the simple embedded systems of today. CPS technologies are transforming the way people interact with engineered systems. New smart CPS are driving innovation in various sectors such as agriculture, energy, transportation, healthcare, and manufacturing. They are leading the 4-th Industrial Revolution (Industry 4.0) that is having benefits thanks to the high flexibility of production. The Industry 4.0 production paradigm is characterized by high intercommunicating properties of its production elements in all the manufacturing processes. This is the reason it is a core concept how the systems should be structurally optimized to have the adequate level of redundancy to be satisfactorily resilient. This goal can benefit from formal methods well known in various scientific domains such as artificial intelligence. So, the current research concerns the proposal of a CPS meta-model and its instantiation. In this way it lists all kind of relationships that may occur between the CPSs themselves and between their (cyber-and physical-) components. Using the CPS meta-model formalization, with an adaptation of the Formal Concept Analysis (FCA) formal approach, this paper presents a way to optimize the modelling of CPS systems emphasizing their redundancy and their resiliency.
Network representation learning (NRL) methods aim to map each vertex into a low dimensional space by preserving the local and global structure of a given network, and in recent years they have received a significant attention thanks to their success in several challenging problems. Although various approaches have been proposed to compute node embeddings, many successful methods benefit from random walks in order to transform a given network into a collection of sequences of nodes and then they target to learn the representation of nodes by predicting the context of each vertex within the sequence. In this paper, we introduce a general framework to enhance the embeddings of nodes acquired by means of the random walk-based approaches. Similar to the notion of topical word embeddings in NLP, the proposed method assigns each vertex to a topic with the favor of various statistical models and community detection methods, and then generates the enhanced community representations. We evaluate our method on two downstream tasks: node classification and link prediction. The experimental results demonstrate that the incorporation of vertex and topic embeddings outperform widely-known baseline NRL methods.
Artificial neural networks that learn to perform Principal Component Analysis (PCA) and related tasks using strictly local learning rules have been previously derived based on the principle of similarity matching: similar pairs of inputs should map to similar pairs of outputs. However, the operation of these networks (and of similar networks) requires a fixed-point iteration to determine the output corresponding to a given input, which means that dynamics must operate on a faster time scale than the variation of the input. Further, during these fast dynamics such networks typically ‘disable’ learning, updating synaptic weights only once the fixed-point iteration has been resolved. Here, we derive a network for PCA-based dimensionality reduction that avoids this fast fixed-point iteration. The key novelty of our approach is a modification of the similarity matching objective to encourage near-diagonality of a synaptic weight matrix. We then approximately invert this matrix using a Taylor series approximation, replacing the previous fast iterations. In the offline setting, our algorithm corresponds to a dynamical system, the stability of which we rigorously analyze. In the online setting (i.e., with stochastic gradients), we map our algorithm to a familiar neural network architecture and give numerical results showing that our method converges at a competitive rate. The computational complexity per iteration of our online algorithm is linear in the total degrees of freedom, which is in some sense optimal.
Ranking algorithms are the information gatekeepers of the Internet era. We develop a stylized model to study the effects of ranking algorithms on opinion dynamics. We consider a search engine that uses an algorithm based on popularity and on personalization. We find that popularity-based rankings generate an advantage of the fewer effect: fewer websites reporting a given signal attract relatively more traffic overall. This highlights a novel, ranking-driven channel that explains the diffusion of misinformation, as websites reporting incorrect information may attract an amplified amount of traffic precisely because they are few. Furthermore, when individuals provide sufficiently positive feedback to the ranking algorithm, popularity-based rankings tend to aggregate information while personalization acts in the opposite direction.
While deep neural networks (DNNs) can perform complex classification tasks, most of their natural inputs do not necessitate the depth of the modern architectures. This leads to wasted computation, as the network overthinks on the simpler inputs. The overthinking problem could be prevented if standard DNNs could produce early predictions. However, prior work suggests that this is challenging in existing architectures, such as ResNet, as their internal layers are not trained for classification and optimizing them for accurate predictions hurts the end performance. In this paper, we explore the overthinking problem, and, as a remedy, we propose a generic modification to off-the-shelf DNNs—the Shallow-Deep Network (SDN). With this modification, a DNN can efficiently produce predictions from either shallow or deep layers, as appropriate for the given input. We employ feature reduction and a layer-wise objective function to train these progressively deeper internal classifiers while preserving the end-performance. We can apply the SDN modification either by training from scratch or by tuning a pre-trained model. Experiments on four architectures (VGG, ResNet, WideResNet, and MobileNet) and three image classifications tasks suggest that, for an average input, an SDN can produce a correct prediction before its middle layer. By avoiding unnecessary computation, the SDN can reduce the required number of operations for an input by 41% over the original network. Finally, we observe that disagreements among the early classifiers reliably indicate inputs where the network is likely to make a mistake. Building on this observation we propose an internal confusion metric and a method to diagnose misclassifications by visualizing these disagreements.
Traditional data quality control methods are based on users experience or previously established business rules, and this limits performance in addition to being a very time consuming process with lower than desirable accuracy. Utilizing deep learning, we can leverage computing resources and advanced techniques to overcome these challenges and provide greater value to users. In this paper, we, the authors, first review relevant works and discuss machine learning techniques, tools, and statistical quality models. Second, we offer a creative data quality framework based on deep learning and statistical model algorithm for identifying data quality. Third, we use data involving salary levels from an open dataset published by the state of Arkansas to demonstrate how to identify outlier data and how to improve data quality via deep learning. Finally, we discuss future work.
A machine learning model may exhibit discrimination when used to make decisions involving people. One potential cause for such outcomes is that the model uses a statistical proxy for a protected demographic attribute. In this paper we formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies. Our definition follows recent work on proxies in classification models, and characterizes a model’s constituent behavior that: 1) correlates closely with a protected random variable, and 2) is causally influential in the overall behavior of the model. We show that proxies in linear regression models can be efficiently identified by solving a second-order cone program, and further extend this result to account for situations where the use of a certain input variable is justified as a `business necessity’. Finally, we present empirical results on two law enforcement datasets that exhibit varying degrees of racial disparity in prediction outcomes, demonstrating that proxies shed useful light on the causes of discriminatory behavior in models.
We propose a novel Bayesian approach to modelling multimodal data generated by multiple independent processes, simultaneously solving the data association and induced supervised learning problems. Underpinning our approach is the use of Gaussian process priors which encode structure both on the functions and the associations themselves. The association of samples and functions are determined by taking both inputs and outputs into account while also obtaining a posterior belief about the relevance of the global components throughout the input space. We present an efficient learning scheme based on doubly stochastic variational inference and discuss how it can be applied to deep Gaussian process priors. We show results for an artificial data set, a noise separation problem, and a multimodal regression problem based on the cart-pole benchmark.
Applying Machine Learning (ML) to business applications for automation usually faces difficulties when integrating diverse ML dependencies and services, mainly because of the lack of a common ML framework. In most cases, the ML models are developed for applications which are targeted for specific business domain use cases, leading to duplicated effort, and making reuse impossible. This paper presents Acumos, an open platform capable of packaging ML models into portable containerized microservices which can be easily shared via the platform’s catalog, and can be integrated into various business applications. We present a case study of packaging sentiment analysis and classification ML models via the Acumos platform, permitting easy sharing with others. We demonstrate that the Acumos platform reduces the technical burden on application developers when applying machine learning models to their business applications. Furthermore, the platform allows the reuse of readily available ML microservices in various business domains.