The semantic web has received many contributions of researchers as ontologies which, in this context, i.e. within RDF linked data, are formalized conceptualizations that might use different protocols, such as RDFS, OWL DL and OWL FULL. In this article, we describe new expressive techniques which were found necessary after elaborating dozens of OWL ontologies for the scientific academy, the State and the civil society. They consist in: 1) stating possible uses a property might have without incurring into axioms or restrictions; 2) assigning a level of priority for an element (class, property, triple); 3) correct depiction in diagrams of relations between classes, between individuals which are imperative, and between individuals which are optional; 4) a convenient association between OWL classes and SKOS concepts. We propose specific rules to accomplish these enhancements and exemplify both its use and the difficulties that arise because these techniques are currently not established as standards to the ontology designer.
Conjugate gradient methods are a class of important methods for solving linear equations and nonlinear optimization. In our work, we propose a new stochastic conjugate gradient algorithm with variance reduction (CGVR) and prove its linear convergence with the Fletcher and Revves method for strongly convex and smooth functions. We experimentally demonstrate that the CGVR algorithm converges faster than its counterparts for six large-scale optimization problems that may be convex, non-convex or non-smooth, and its AUC (Area Under Curve) performance with $L2$-regularized $L2$-loss is comparable to that of LIBLINEAR but with significant improvement in computational efficiency.
We propose a likelihood ratio test framework for testing normal mean vectors in high-dimensional data under two common scenarios: the one-sample test and the two-sample test with equal covariance matrices. We derive the test statistics under the assumption that the covariance matrices follow a diagonal matrix structure. In comparison with the diagonal Hotelling’s tests, our proposed test statistics display some interesting characteristics. In particular, they are a summation of the log-transformed squared t-statistics rather than a direct summation of those components. More importantly, to derive the asymptotic normality of our test statistics under the null and local alternative hypotheses, we do not require the assumption that the covariance matrix follows a diagonal matrix structure. As a consequence, our proposed test methods are very flexible and can be widely applied in practice. Finally, simulation studies and a real data analysis are also conducted to demonstrate the advantages of our likelihood ratio test method.
The goal of regression and classification methods in supervised learning is to minimize the empirical risk, that is, the expectation of some loss function quantifying the prediction error under the empirical distribution. When facing scarce training data, overfitting is typically mitigated by adding regularization terms to the objective that penalize hypothesis complexity. In this paper we introduce new regularization techniques using ideas from distributionally robust optimization, and we give new probabilistic interpretations to existing techniques. Specifically, we propose to minimize the worst-case expected loss, where the worst case is taken over the ball of all (continuous or discrete) distributions that have a bounded transportation distance from the (discrete) empirical distribution. By choosing the radius of this ball judiciously, we can guarantee that the worst-case expected loss provides an upper confidence bound on the loss on test data, thus offering new generalization bounds. We prove that the resulting regularized learning problems are tractable and can be tractably kernelized for many popular loss functions. We validate our theoretical out-of-sample guarantees through simulated and empirical experiments.
Deep learning (DL) advances state-of-the-art reinforcement learning (RL), by incorporating deep neural networks in learning representations from the input to RL. However, the conventional deep neural network architecture is limited in learning representations for multi-task RL (MT-RL), as multiple tasks can refer to different kinds of representations. In this paper, we thus propose a novel deep neural network architecture, namely generalization tower network (GTN), which can achieve MT-RL within a single learned model. Specifically, the architecture of GTN is composed of both horizontal and vertical streams. In our GTN architecture, horizontal streams are used to learn representation shared in similar tasks. In contrast, the vertical streams are introduced to be more suitable for handling diverse tasks, which encodes hierarchical shared knowledge of these tasks. The effectiveness of the introduced vertical stream is validated by experimental results. Experimental results further verify that our GTN architecture is able to advance the state-of-the-art MT-RL, via being tested on 51 Atari games.
Processing of streaming time series data from sensors with lower latency and limited computing resource comes to a critical problem as the growth of Industry 4.0 and Industry Internet of Things(IIoT). To tackle the real world challenge in this area, like equipment health monitoring by comparing the incoming data stream with known fault patterns, we formulate a new problem, called ‘fine-grained pattern matching’. It allows users to define varied deviations to different segments of a given pattern, and fuzzy breakpoint of adjunct segments, which urges the dramatically increased complexity against traditional pattern matching problem over stream. In this paper, we propose a novel 2-phase approach to solve this problem. In pruning phase, we propose ELB(Equal Length Block) Representation and BSP (Block-Skipping Pruning) policy, which efficiently filter the unmatched subsequence with the guarantee of no-false dismissals. In post-processing phase, we provide an algorithm to further examine the possible matches in linear complexity. We conducted an extensive experimental evaluation on synthetic and real-world datasets, which illustrates that our algorithm outperforms the brute-force method and MSM, a multi-step filter mechanism over the multi-scaled representation, by orders of magnitude.
We consider the problem of performing inverse reinforcement learning when the trajectory of the expert is not perfectly observed by the learner. Instead, a noisy continuous-time observation of the trajectory is provided to the learner. This problem exhibits wide-ranging applications and the specific application we consider here is the scenario in which the learner seeks to penetrate a perimeter patrolled by a robot. The learner’s field of view is limited due to which it cannot observe the patroller’s complete trajectory. Instead, we allow the learner to listen to the expert’s movement sound, which it can also use to estimate the expert’s state and action using an observation model. We treat the expert’s state and action as hidden data and present an algorithm based on expectation maximization and maximum entropy principle to solve the non-linear, non-convex problem. Related work considers discrete-time observations and an observation model that does not include actions. In contrast, our technique takes expectations over both state and action of the expert, enabling learning even in the presence of extreme noise and broader applications.
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024^2. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CelebA dataset.
Long short-term memory (LSTM) is normally used in recurrent neural network (RNN) as basic recurrent unit. However,conventional LSTM assumes that the state at current time step depends on previous time step. This assumption constraints the time dependency modeling capability. In this study, we propose a new variation of LSTM, advanced LSTM (A-LSTM), for better temporal context modeling. We employ A-LSTM in weighted pooling RNN for emotion recognition. The A-LSTM outperforms the conventional LSTM by 5.5% relatively. The A-LSTM based weighted pooling RNN can also complement the state-of-the-art emotion classification framework. This shows the advantage of A-LSTM.
We propose a new statistical model suitable for machine learning tasks of systems with long distance correlations such as human languages. The model is based on directed acyclic graph decorated by multi-linear tensor maps in the vertices and vector spaces in the edges, called tensor network. Such tensor networks have been previously employed for effective numerical computation of the renormalization group flow on the space of effective quantum field theories and lattice models of statistical mechanics. We provide explicit algebro-geometric analysis of the parameter moduli space for tree graphs, discuss model properties and applications such as statistical translation.
In this paper we develop new methods for estimating causal effects in settings with panel data, where a subset of units are exposed to a treatment during a subset of periods, and the goal is estimating counterfactual (untreated) outcomes for the treated unit/period combinations. We develop a class of estimators that uses the observed elements of the matrix of control outcomes corresponding to untreated unit/periods to predict the ‘missing’ elements of the matrix, corresponding to treated units/periods. The approach estimates a matrix that well-approximates the original (incomplete) matrix, but has lower complexity according to a matrix norm, where we consider the family of Schatten norms based on the singular values of the matrix. The proposed methods have attractive computational properties. From a technical perspective, we generalize results from the matrix completion literature by allowing the patterns of missing data to have a time series dependency structure. We also present new insights concerning the connections between the interactive fixed effects models and the literatures on program evaluation under unconfoundedness as well as on synthetic control methods. If there are few time periods and many units, our method approximates a regression approach where counterfactual outcomes are estimated through a regression of current outcomes on lagged outcomes for the same unit. In contrast, if there are few units and many periods, our proposed method approximates a synthetic control estimator where counterfactual outcomes are estimated through a regression of the lagged outcomes for the treated unit on lagged outcomes for the control units. The advantage of our proposed method is that it moves seamlessly between these two different approaches, utilizing both cross-sectional and within-unit patterns in the data.