Legal Document Retrieval using Document Vector Embeddings and Deep Learning

Domain-specific information retrieval has been a prominent and ongoing research topic in the field of natural language processing. Many researchers have incorporated different techniques to overcome technical and domain-specific challenges and provide a mature model for various domains of interest. The main bottleneck in these studies is the heavy reliance on domain experts, which makes the entire process time-consuming and cumbersome. In this study, we have developed three novel models, which are compared against a gold standard generated via the online repositories provided, specifically for the legal domain. The three models incorporate vector space representations of the legal domain, where document vectors are generated by two different mechanisms and by an ensemble of the two. This study describes the research carried out in representing legal case documents in different vector spaces, whilst incorporating semantic word measures and natural language processing techniques. The ensemble model built in this study shows a significantly higher accuracy level, which indicates the need to incorporate domain-specific semantic similarity measures into the information retrieval process. This study also shows the impact of varying the distribution of the word similarity measures against varying document vector dimensions, which can lead to improvements in the process of legal information retrieval.

Contextual Policy Optimisation

Policy gradient methods have been successfully applied to a variety of reinforcement learning tasks. However, while learning in a simulator, these methods do not utilise the opportunity to improve learning by adjusting certain environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but that are controllable in a simulator. This can lead to slow learning, or convergence to highly suboptimal policies. In this paper, we present contextual policy optimisation (CPO). The central idea is to use Bayesian optimisation to actively select the distribution of the environment variable that maximises the improvement generated by each iteration of the policy gradient method. To make this Bayesian optimisation practical, we contribute two easy-to-compute low-dimensional fingerprints of the current policy. We apply CPO to a number of continuous control tasks of varying difficulty and show that CPO can efficiently learn policies that are robust to significant rare events, which are unlikely to be observable under random sampling but are key to learning good policies.

Distributed Big-Data Optimization via Block Communications

We study distributed multi-agent large-scale optimization problems, wherein the cost function is composed of a smooth possibly nonconvex sum-utility plus a DC (Difference-of-Convex) regularizer. We consider the scenario where the dimension of the optimization variables is so large that optimizing and/or transmitting the entire set of variables could cause unaffordable computation and communication overhead. To address this issue, we propose the first distributed algorithm whereby agents optimize and communicate only a portion of their local variables. The scheme hinges on successive convex approximation (SCA) to handle the nonconvexity of the objective function, coupled with a novel block-signal tracking scheme, aiming at locally estimating the average of the agents’ gradients. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Numerical results on a sparse regression problem show the effectiveness of the proposed algorithm and the impact of the block size on its practical convergence speed and communication cost.

Fast K-Means Clustering with Anderson Acceleration

We propose a novel method to accelerate Lloyd’s algorithm for K-Means clustering. Unlike previous acceleration approaches that reduce the computational cost per iteration or improve initialization, our approach is focused on reducing the number of iterations required for convergence. This is achieved by treating the assignment step and the update step of Lloyd’s algorithm as a fixed-point iteration, and applying Anderson acceleration, a well-established technique for accelerating fixed-point solvers. Classical Anderson acceleration utilizes m previous iterates to find an accelerated iterate, and its performance on K-Means clustering can be sensitive to the choice of m and the distribution of samples. We propose a new strategy to dynamically adjust the value of m, which achieves robust and consistent speedups across different problem instances. Our method complements existing acceleration techniques, and can be combined with them to achieve state-of-the-art performance. We perform extensive experiments to evaluate the performance of the proposed method, where it outperforms other algorithms in 106 out of 120 test cases, and the mean decrease ratio of computational time is more than 33%.
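
As a rough illustration of the fixed-point view described in the abstract above, the sketch below treats one Lloyd step (assignment plus update) on 1-D data as a map G and applies Anderson acceleration with a fixed window of m = 1. This is not the authors' method: the dynamic adjustment of m is not reproduced, and the function names (`lloyd_step`, `energy`, `anderson_kmeans`) and the energy-based fallback to the plain Lloyd iterate are assumptions made for this example.

```python
def lloyd_step(centroids, points):
    """One assignment + update step of Lloyd's algorithm (the fixed-point map G)."""
    k = len(centroids)
    sums, counts = [0.0] * k, [0] * k
    for p in points:
        # Assignment step: index of the nearest centroid.
        j = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
        sums[j] += p
        counts[j] += 1
    # Update step: mean of each cluster (an empty cluster keeps its centroid).
    return [sums[j] / counts[j] if counts[j] else centroids[j] for j in range(k)]


def energy(centroids, points):
    """K-Means objective: sum of squared distances to the nearest centroid."""
    return sum(min((p - c) ** 2 for c in centroids) for p in points)


def anderson_kmeans(points, init, max_iter=100, tol=1e-12):
    """Lloyd's algorithm with Anderson acceleration, fixed window m = 1 (assumption)."""
    x = list(init)
    g_prev = f_prev = None
    for it in range(max_iter):
        g = lloyd_step(x, points)
        f = [gi - xi for gi, xi in zip(g, x)]  # residual G(x) - x
        if max(abs(v) for v in f) < tol:
            return g, it  # x is (numerically) a fixed point of G
        candidate = g
        if f_prev is not None:
            df = [a - b for a, b in zip(f, f_prev)]
            denom = sum(d * d for d in df)
            if denom > 0.0:
                # theta minimises || f - theta * (f - f_prev) ||^2.
                theta = sum(a * d for a, d in zip(f, df)) / denom
                accelerated = [gi - theta * (gi - gp) for gi, gp in zip(g, g_prev)]
                # Assumed safeguard: accept only if the objective improves.
                if energy(accelerated, points) < energy(g, points):
                    candidate = accelerated
        g_prev, f_prev = g, f
        x = candidate
    return x, max_iter
```

Calling `anderson_kmeans(points, init)` returns the final centroids together with the number of iterations used; with the fallback in place, the result is never worse in energy than the plain Lloyd iterate at the same step.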

Contextual Graph Markov Model: A Deep and Generative Approach to Graph Processing

We introduce the Contextual Graph Markov Model, an approach combining ideas from generative models and neural networks for the processing of graph data. It is founded on a constructive methodology to build a deep architecture comprising layers of probabilistic models that learn to encode the structured information in an incremental fashion. Context is diffused in an efficient and scalable way across the graph vertices and edges. The resulting graph encoding is used in combination with discriminative models to address structure classification benchmarks.

Anomaly Detection and Localization in Crowded Scenes by Motion-field Shape Description and Similarity-based Statistical Learning

In crowded scenes, detection and localization of abnormal behaviors is challenging because the high density of people makes object segmentation and tracking extremely difficult. We associate the optical flows of multiple frames to capture short-term trajectories and introduce the histogram-based shape descriptor referred to as shape contexts to describe such short-term trajectories. Furthermore, we propose a K-NN similarity-based statistical model to detect anomalies over time and space, which is an unsupervised one-class learning algorithm requiring neither clustering nor any prior assumption. First, we retrieve the K-NN samples from the training set with respect to the testing sample, and then use the similarities between every pair of the K-NN samples to construct a Gaussian model. Finally, the probabilities of the similarities from the testing sample to the K-NN samples under the Gaussian model are calculated in the form of a joint probability. Abnormal events can be detected by judging whether the joint probability is below predefined thresholds in terms of time and space, separately. Such a scheme can adapt to the whole scene, since the probability computed in this way is not affected by motion distortions arising from perspective distortion. We conduct experiments on real-world surveillance videos, and the results demonstrate that the proposed method can reliably detect and locate the abnormal events in the video sequences, outperforming the state-of-the-art approaches.
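
The K-NN scoring steps in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the descriptors are plain coordinate tuples rather than shape contexts, the similarity `exp(-distance)` is a stand-in chosen for the example, and the names `similarity` and `knn_log_joint` are hypothetical. The joint probability is computed as a sum of log densities for numerical stability.

```python
import math
from itertools import combinations


def similarity(a, b):
    """Assumed similarity measure: exp(-Euclidean distance)."""
    return math.exp(-math.dist(a, b))


def knn_log_joint(test, train, k=4):
    """Log joint density of the test-to-neighbour similarities under a
    Gaussian fitted to the pairwise similarities among the K nearest
    training samples."""
    # Step 1: retrieve the K nearest training samples of the test sample.
    neighbours = sorted(train, key=lambda s: math.dist(s, test))[:k]
    # Step 2: fit a Gaussian to the similarities between every neighbour pair.
    pair_sims = [similarity(a, b) for a, b in combinations(neighbours, 2)]
    mu = sum(pair_sims) / len(pair_sims)
    var = sum((s - mu) ** 2 for s in pair_sims) / len(pair_sims) + 1e-6
    # Step 3: joint (log) density of the similarities from the test sample.
    log_joint = 0.0
    for n in neighbours:
        s = similarity(test, n)
        log_joint += -0.5 * math.log(2.0 * math.pi * var) - (s - mu) ** 2 / (2.0 * var)
    return log_joint
```

A sample would then be flagged as abnormal when `knn_log_joint` falls below a chosen threshold; how that threshold is set over time and space is not specified in the abstract and is left open here.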

Metric-Optimized Example Weights

Real-world machine learning applications often have complex test metrics, and may have training and test data that follow different distributions. We propose addressing these issues by using a weighted loss function with a standard convex loss, but with weights on the training examples that are learned to optimize the test metric of interest on the validation set. These metric-optimized example weights can be learned for any test metric, including black box losses and customized metrics for specific applications. We illustrate the performance of our proposal with public benchmark datasets and real-world applications with domain shift and custom loss functions that balance multiple objectives, impose fairness policies, and are non-convex and non-decomposable.

Robust Accelerated Gradient Method

We study the trade-off between rate of convergence and robustness to gradient errors in designing a first-order algorithm. In particular, we focus on gradient descent (GD) and Nesterov’s accelerated gradient (AG) method for strongly convex quadratic objectives when the gradient has random errors in the form of additive white noise. To characterize robustness, we consider the asymptotic normalized variance of the centered iterate sequence which measures the asymptotic accuracy of the iterates. Using tools from robust control theory, we develop a tractable algorithm that allows us to set the parameters of each algorithm to achieve a particular trade-off between these two performance objectives. Our results show that there is a fundamental lower bound on the robustness level of an algorithm for any achievable rate. For the same achievable rate, we show that AG with tuned parameters is always more robust than GD to gradient errors. Similarly, for the same robustness level, we show that AG can be tuned to be always faster than GD. Our results show that AG can achieve acceleration while being more robust to random gradient errors. This behavior is quite different from what was previously reported in the deterministic gradient noise setting.
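
To make the rate versus robustness trade-off in the abstract above concrete, here is a small sketch for plain GD only, on the 1-D quadratic f(x) = curvature * x^2 / 2 with additive Gaussian gradient noise. It does not cover AG or the parameter-tuning algorithm from the paper, and the function names are hypothetical. In this setting the iterate follows x_{k+1} = rho * x_k - step * w_k with rho = 1 - step * curvature, so the stationary variance is (step * noise_std)^2 / (1 - rho^2): a smaller step gives a slower rate (rho closer to 1) but a smaller asymptotic variance.

```python
import random


def gd_stationary_variance(step, curvature, noise_std):
    """Closed-form stationary variance of noisy GD on f(x) = curvature * x**2 / 2.

    Valid when 0 < step * curvature < 2, so that |rho| < 1.
    """
    rho = 1.0 - step * curvature  # linear convergence factor
    return (step * noise_std) ** 2 / (1.0 - rho ** 2)


def simulate_gd(step, curvature, noise_std, n_iters=200_000, burn_in=1_000, seed=0):
    """Empirical second moment of the GD iterates after a burn-in period."""
    rng = random.Random(seed)
    x, total_sq, count = 1.0, 0.0, 0
    for k in range(n_iters):
        noisy_grad = curvature * x + rng.gauss(0.0, noise_std)
        x -= step * noisy_grad
        if k >= burn_in:
            total_sq += x * x  # stationary mean is 0, so E[x^2] is the variance
            count += 1
    return total_sq / count
```

Comparing `simulate_gd(0.1, 1.0, 1.0)` with `gd_stationary_variance(0.1, 1.0, 1.0)` gives a quick sanity check of the formula, and repeating it with a smaller step shows the variance shrinking as the rate slows.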

NetSim — The framework for complex network generator

Networks are everywhere and their many types, including social networks, the Internet, food webs etc., have been studied for the last few decades. However, in real-world networks, it’s hard to find examples that are easily comparable, i.e. that have the same density or even the same number of nodes and edges. We propose a flexible and extensible NetSim framework to understand how properties in different types of networks change with varying numbers of edges and vertices. Our approach enables the simulation of three classical network models (random, small-world and scale-free) with easily adjustable model parameters and network size. To be able to compare different networks, for a single experimental setup we kept the number of edges and vertices fixed across the models. To understand how they change depending on the number of nodes and edges, we ran over 30,000 simulations and analysed different network characteristics that cannot be derived analytically. Two of the main findings from the analysis are that the average shortest path does not change with the density of the scale-free network but changes for small-world and random networks, and that there is an apparent difference in the mean betweenness centrality of the scale-free network compared with random and small-world networks.
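
The idea of comparing models at a fixed number of nodes and edges can be illustrated with the short sketch below. It is not the NetSim framework or its API: the function names are hypothetical, only a uniform random graph and a regular ring lattice (a common starting point for small-world models) are generated, and the average shortest path is computed by breadth-first search over reachable pairs.

```python
import random
from collections import deque
from itertools import combinations


def random_graph(n_nodes, n_edges, seed=0):
    """Uniform random graph with exactly n_edges distinct edges."""
    rng = random.Random(seed)
    edges = rng.sample(list(combinations(range(n_nodes), 2)), n_edges)
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def ring_lattice(n_nodes, half_degree):
    """Ring where each node links to its half_degree nearest neighbours on
    each side; has n_nodes * half_degree edges when n_nodes > 2 * half_degree."""
    adj = {v: set() for v in range(n_nodes)}
    for v in range(n_nodes):
        for j in range(1, half_degree + 1):
            w = (v + j) % n_nodes
            adj[v].add(w)
            adj[w].add(v)
    return adj


def edge_count(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")
```

For example, `ring_lattice(10, 2)` and `random_graph(10, 20)` have the same number of nodes and edges, so their `average_shortest_path` values can be compared directly.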

A Survey of Parallel Sequential Pattern Mining

With the growing popularity of resource sharing, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms generally face problems and challenges including huge memory cost, low processing speed, and inadequate hard disk space. Sequential pattern mining (SPM) is used in a wide variety of real-life applications. However, it is more complex and challenging than frequent itemset mining, and also suffers from the above challenges when handling large-scale data. To solve these problems, mining sequential patterns in a parallel computing environment has emerged as an important issue with many applications. In this paper, we provide an in-depth survey of the current status of parallel sequential pattern mining (PSPM), including a detailed categorization of traditional serial SPM approaches and state-of-the-art parallel SPM. We review the related work on PSPM in detail, including partition-based algorithms for PSPM, Apriori-based PSPM, pattern-growth-based PSPM, and hybrid algorithms for PSPM, and provide an in-depth description (i.e., characteristics, advantages, and disadvantages) of each parallel approach to PSPM. Some advanced topics for PSPM and the related open-source software are further reviewed in detail. Finally, we summarize some challenges and opportunities of PSPM in the big data era.

A Survey of Utility-Oriented Pattern Mining

The main purpose of data mining and analytics is to find novel, potentially useful patterns that can be utilized in real-world applications to derive beneficial knowledge. For identifying and evaluating the usefulness of different kinds of patterns, many techniques/constraints have been proposed, such as support, confidence, sequence order, and utility parameters (e.g., weight, price, profit, quantity, etc.). In recent years, there has been an increasing demand for utility-oriented pattern mining (UPM). UPM is a vital task, with numerous high-impact applications, including cross-marketing, e-commerce, finance, medical, and biomedical applications. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods of UPM. First, we introduce an in-depth understanding of UPM, including concepts, examples, and comparisons with related concepts. A taxonomy of the most common and state-of-the-art approaches for mining different kinds of high-utility patterns is presented, including Apriori-based, tree-based, projection-based, vertical-/horizontal-data-format-based, and other hybrid approaches. A comprehensive review of advanced topics of existing high-utility pattern mining techniques is offered, with a discussion of their pros and cons. Finally, we present several well-known open-source software packages for UPM. We conclude our survey with a discussion on open and practical challenges in this field.

Nonlinear Inductive Matrix Completion based on One-layer Neural Networks

The goal of a recommendation system is to predict the interest of a user in a given item by exploiting the existing set of ratings as well as certain user/item features. A standard approach to modeling this problem is Inductive Matrix Completion where the predicted rating is modeled as an inner product of the user and the item features projected onto a latent space. In order to learn the parameters effectively from a small number of observed ratings, the latent space is constrained to be low-dimensional which implies that the parameter matrix is constrained to be low-rank. However, such bilinear modeling of the ratings can be limiting in practice and non-linear prediction functions can lead to significant improvements. A natural approach to introducing non-linearity in the prediction function is to apply a non-linear activation function on top of the projected user/item features. Imposition of non-linearities further complicates an already challenging problem that has two sources of non-convexity: a) low-rank structure of the parameter matrix, and b) non-linear activation function. We show that one can still solve the non-linear Inductive Matrix Completion problem using gradient descent type methods as long as the solution is initialized well. That is, close to the optima, the optimization function is strongly convex and hence admits standard optimization techniques, at least for certain activation functions, such as sigmoid and tanh. We also highlight the importance of the activation function and show how ReLU can behave significantly differently from, say, a sigmoid function. Finally, we apply our proposed technique to recommendation systems and semi-supervised clustering, and show that our method can lead to much better performance than standard linear Inductive Matrix Completion methods.
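
The prediction function described in the abstract above can be sketched as follows: user and item features are projected onto a latent space, an activation is applied to each projection, and the rating is the inner product of the two results. Passing the identity as the activation recovers the standard bilinear model. The names (`project`, `predict_rating`) and the layout of the weight matrices (one row per input feature, one column per latent dimension) are assumptions for this example; training by gradient descent is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def project(weights, features, activation):
    """Compute activation(W^T x) for a d-by-k weight matrix W and d features x."""
    n_latent = len(weights[0])
    return [
        activation(sum(weights[i][j] * features[i] for i in range(len(features))))
        for j in range(n_latent)
    ]


def predict_rating(user_weights, item_weights, user_features, item_features,
                   activation=sigmoid):
    """Predicted rating: inner product of the activated latent projections."""
    u = project(user_weights, user_features, activation)
    v = project(item_weights, item_features, activation)
    return sum(a * b for a, b in zip(u, v))
```

With `activation=lambda z: z` the function reduces to the linear Inductive Matrix Completion score x^T U V^T y, which makes it easy to compare the two models on the same weights.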

From Knowledge Graph Embedding to Ontology Embedding: Region Based Representations of Relational Structures

Recent years have witnessed the enormous success of low-dimensional vector space representations of knowledge graphs to predict missing facts or find erroneous ones. Currently, however, it is not yet well-understood how ontological knowledge, e.g. given as a set of (existential) rules, can be embedded in a principled way. To address this shortcoming, in this paper we introduce a framework based on convex regions, which can faithfully incorporate ontological knowledge into the vector space embedding. Our technical contribution is two-fold. First, we show that some of the most popular existing embedding approaches are not capable of modelling even very simple types of rules. Second, we show that our framework can represent ontologies that are expressed using so-called quasi-chained existential rules in an exact way, such that any set of facts which is induced using that vector space embedding is logically consistent and deductively closed with respect to the input ontology.

Geometric Understanding of Deep Learning

Deep learning is the mainstream technique for many machine learning tasks, including image recognition, machine translation, speech recognition, and so on. It has outperformed conventional methods in various fields and achieved great successes. Unfortunately, how it works remains poorly understood. It is of central importance to lay down the theoretical foundation for deep learning. In this work, we give a geometric view to understand deep learning: we show that the fundamental principle contributing to its success is the manifold structure in data, namely that natural high-dimensional data concentrates close to a low-dimensional manifold, and deep learning learns the manifold and the probability distribution on it. We further introduce the concepts of the rectified linear complexity of a deep neural network, which measures its learning capability, and the rectified linear complexity of an embedding manifold, which describes how difficult it is to learn. Then we show that for any deep neural network with fixed architecture, there exists a manifold that cannot be learned by the network. Through empirical evidence, we also demonstrate that the learning accuracies of state-of-the-art autoencoders are reasonably good but still leave substantial room for improvement. Finally, we propose to apply optimal mass transportation theory to control the probability distribution in the latent space.

Fast Policy Learning through Imitation and Reinforcement

Imitation learning (IL) consists of a set of tools that leverage expert demonstrations to quickly learn policies. However, if the expert is suboptimal, IL can yield policies with inferior performance compared to reinforcement learning (RL). In this paper, we aim to provide an algorithm that combines the best aspects of RL and IL. We accomplish this by formulating several popular RL and IL algorithms in a common mirror descent framework, showing that these algorithms can be viewed as a variation on a single approach. We then propose LOKI, a strategy for policy learning that first performs a small but random number of IL iterations before switching to a policy gradient RL method. We show that if the switching time is properly randomized, LOKI can learn to outperform a suboptimal expert and converge faster than running policy gradient from scratch. Finally, we evaluate the performance of LOKI experimentally in several simulated environments.

Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance

Large amounts of labeled data are typically required to train deep learning models. For many real-world problems, however, acquiring additional data can be expensive or even impossible. We present semi-supervised deep kernel learning (SSDKL), a semi-supervised regression model based on minimizing predictive variance in the posterior regularization framework. SSDKL combines the hierarchical representation learning of neural networks with the probabilistic modeling capabilities of Gaussian processes. By leveraging unlabeled data, we show improvements on a diverse set of real-world regression tasks over supervised deep kernel learning and semi-supervised methods such as VAT and mean teacher adapted for regression.

Enhancing the Accuracy and Fairness of Human Decision Making

Societies often rely on human experts to take a wide variety of decisions affecting their members, from jail-or-release decisions taken by judges and stop-and-frisk decisions taken by police officers to accept-or-reject decisions taken by academics. In this context, each decision is taken by an expert who is typically chosen uniformly at random from a pool of experts. However, these decisions may be imperfect due to limited experience, implicit biases, or faulty probabilistic reasoning. Can we improve the accuracy and fairness of the overall decision making process by optimizing the assignment between experts and decisions? In this paper, we address the above problem from the perspective of sequential decision making and show that, for different fairness notions from the literature, it reduces to a sequence of (constrained) weighted bipartite matchings, which can be solved efficiently using algorithms with approximation guarantees. Moreover, these algorithms also benefit from posterior sampling to actively trade off exploitation (selecting expert assignments which lead to accurate and fair decisions) and exploration (selecting expert assignments to learn about the experts’ preferences and biases). We demonstrate the effectiveness of our algorithms on both synthetic and real-world data and show that they can significantly improve both the accuracy and fairness of the decisions taken by pools of experts.

One-ended spanning trees in amenable unimodular graphs
A generating polynomial for the pretzel knot
Adaptive algorithms for mirror descent in convex programming problems with Lipschitz constraints
ADMM for combinatorial graph problems
On a stochastic model of epidemic spread with an application to competing infections
Powers of Hamiltonian cycles in randomly augmented graphs
Existence and exponential estimates for the solutions to Neutral Stochastic Functional Differential Equations with Infinite Delay
A note on belief structures and S-approximation spaces
Adversarial Deformation Regularization for Training Image Registration Neural Networks
Towards Multifocal Displays with Dense Focal Stacks
High fidelity GHZ generation within nearby nodes
Existence and asymptotic properties for the solutions to nonlinear SFDEs driven by G-Brownian motion with infinite delay
Property Testing of Planarity in the CONGEST model
Preferential Attachment When Stable
Defending Against Adversarial Attacks by Leveraging an Entire GAN
Palindromes in starlike trees
Comparison of VCA and GAEE algorithms for Endmember Extraction
Sparse Antenna and Pulse Placement for Colocated MIMO Radar
BIC extensions for order-constrained model selection
Assessing monotonicity of transfer functions in nonlinear dynamical control systems
A Simple Riemannian Manifold Network for Image Set Classification
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
Spectral Efficiency Analysis in Cell-Free Massive MIMO Systems with Zero-Forcing Detector
Understanding and Monitoring Human Trafficking via Social Sensors: A Sociological Approach
Dynamic Network Model from Partial Observations
A Local Information Criterion for Dynamical Systems
Robust Hypothesis Testing Using Wasserstein Uncertainty Sets
Reduction of the Pareto Set in Bicriteria Asymmetric Traveling Salesman Problem
Deployment of Customized Deep Learning based Video Analytics On Surveillance Cameras
Generative Adversarial Image Synthesis with Decision Tree Latent Controller
Spectral Clustering for Multiple Sparse Networks: I
Semantic Explanations of Predictions
Convolutional neural networks for chemical-disease relation extraction are improved with character-based word embeddings
A Rigorous Result about Gibbs Measure
Dual Swap Disentangling
Adaptive Signal Inclusion With Genomic Applications
Settling some sum suppositions
Generating Fine-Grained Open Vocabulary Entity Type Descriptions
Minimum Distance of New Generalizations of the Punctured Binary Reed-Muller Codes
cpSGD: Communication-efficient and differentially-private distributed SGD
DPW-SDNet: Dual Pixel-Wavelet Domain Deep CNNs for Soft Decoding of JPEG-Compressed Images
Hierarchical Representation Learning for Kinship Verification
Deep Watershed Detector for Music Object Recognition
Using Syntax to Ground Referring Expressions in Natural Images
A Nonlocal InSAR Filter for High-Resolution DEM Generation from TanDEM-X Interferograms
Reliability Estimation in Coherent Systems
Video Summarization Using Fully Convolutional Sequence Networks
Product-Closing Approximation for Nonparametric Choice Network Revenue Management
Unsupervised Learning with Stein’s Unbiased Risk Estimator
Dependent Gated Reading for Cloze-Style Question Answering
Calibrating Deep Convolutional Gaussian Processes
Joint Frequency Regulation and Economic Dispatch Using Limited Communication
Cutoff for the cyclic adjacent transposition shuffle
Cookie Synchronization: Everything You Always Wanted to Know But Were Afraid to Ask
Reduction of the Pareto Set in Multicriteria Economic Problem with CES Functions
Data-Aware Approximate Workflow Scheduling
Automatic context window composition for distant speech recognition
Interaction-enhanced integer quantum Hall effect in disordered systems
Stable Geodesic Update on Hyperbolic Space and its Application to Poincare Embeddings
Vehicle Instance Segmentation from Aerial Image and Video Using a Multi-Task Learning Residual Fully Convolutional Network
Look at Boundary: A Boundary-Aware Face Alignment Algorithm
L1-(2D)2PCANet: A Deep Learning Network for Face Recognition
Reply to Comment on ‘Replica symmetry breaking in trajectories of a driven Brownian particle’
Revisiting Reweighted Wake-Sleep
A short note on the multiplicative energy of the spectrum of a set
SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term Embeddings
A Storage-Computation-Communication Tradeoff for Distributed Computing
Intensive Preprocessing of KDD Cup 99 for Network Intrusion Classification Using Machine Learning Techniques
Weil sums of binomials: properties and applications
Fine-Grained Age Estimation in the wild with Attention LSTM Networks
Global Geometry of Multichannel Sparse Blind Deconvolution on the Sphere
A novel hybrid score level and decision level fusion scheme for cancelable multi-biometric verification
Environment-Aware Deployment of Wireless Drones Base Stations with Google Earth Simulator
Sphere Decoder with Box Optimization for FTN Non-orthogonal FDM System
Enhanced-alignment Measure for Binary Foreground Map Evaluation
Human Action Generation with Generative Adversarial Networks
Connecting Distant Entities with Induction through Conditional Random Fields for Named Entity Recognition: Precursor-Induced CRF
Online Advance Admission Scheduling for Services with Customer Preferences
On the non-existence of linear perfect Lee codes: The Zhang-Ge condition and a new polynomial criterion
The Singular Values of Convolutional Layers
Robust Nonparametric Regression under Huber’s $ε$-contamination Model
Deep Convolutional Neural Networks for Map-Type Classification
Nilpotent Morse algebra and time evolution of certain associated coherent states
Toward Abstractive Summarization Using Semantic Representations
Three-Dimensional Radiotherapy Dose Prediction on Head and Neck Cancer Patients with a Hierarchically Densely Connected U-net Deep Learning Architecture
An Improved Phrase-based Approach to Annotating and Summarizing Student Course Responses
Automatic Summarization of Student Course Feedback
Modeling Language Vagueness in Privacy Policies using Deep Neural Networks
Reinforced Extractive Summarization with Question-Focused Rewards
Estimating Shell-Index in a Graph with Local Information
Toward Extractive Summarization of Online Forum Discussions via Hierarchical Attention Networks
A Study of Question Effectiveness Using Reddit ‘Ask Me Anything’ Threads
OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models
Analysis of Ergodic Rate for Transmit Antenna Selection in Low-Resolution ADC Systems
Large-scale Distance Metric Learning with Uncertainty
UWB Channel Sounding and Modeling for UAV Air-to-Ground Propagation Channels
Gradient Coding via the Stochastic Block Model
Variational Measure Preserving Flows
Less is More: Simultaneous View Classification and Landmark Detection for Abdominal Ultrasound Images
Origami Inspired Reconfigurable Antenna for Wireless Communication Systems
When Recurrent Models Don’t Need To Be Recurrent
Heterogeneous Bitwidth Binarization in Convolutional Neural Networks
Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization
Analysing Symbolic Regression Benchmarks under a Meta-Learning Approach
Detecting Deceptive Reviews using Generative Adversarial Networks
What Face and Body Shapes Can Tell About Height
A Scalable Approach to Multi-Context Continual Learning via Lifelong Skill Encoding
Nonparametric estimation of service time characteristics in infinite-server queues with nonstationary Poisson input
Tensorized Spectrum Preserving Compression for Neural Networks
The Architectural Implications of Microservices in the Cloud
A general construction of Ordered Orthogonal Arrays using LFSRs
Guaranteed Simultaneous Asymmetric Tensor Decomposition via Orthogonalized Alternating Least Squares
Pathology Segmentation using Distributional Differences to Images of Healthy Origin
The On-Line Encyclopedia of Integer Sequences
An end-to-end Differentially Private Latent Dirichlet Allocation Using a Spectral Algorithm
Zero-Shot Dual Machine Translation
Extended Formulations for Radial Cones
On the Relative Gain Array (RGA) with Singular and Rectangular Matrices
Learning Self-Imitating Diverse Policies
Forecasting the successful execution of horizontal strategy in a diversified corporation via a DEMATEL-supported artificial neural network – A case study
Using transfer learning to detect galaxy mergers
Randomized Robust Matrix Completion for the Community Detection Problem
Scalable Methods for 8-bit Training of Neural Networks
Personalized Influence Estimation Technique
Think Visually: Question Answering through Virtual Imagery
Proof of logarithmic stake in block-chain cash system
Dynamicity and Durability in Scalable Visual Instance Search
Scalable Spectral Clustering Using Random Binning Features
Predicting Electron Paths
A Double Machine Learning Approach to Estimate the Effects of Musical Practice on Student’s Skills