Smoothing the parameter of IBM word alignment models: the framework and its learning approaches

IBM models are very popular word alignment models in Machine Translation. They play critical roles in the systems of this field. These models follow Maximum Likelihood principle to estimate their parameters. However, in many case, the models will be too fit the training data that may result in wrong word alignments on testing data. Smoothing is a popular solution to the overfitting problem when the causes are rare events. While this technique is very common in Language Model which is another problem in Machine Translation, there is still lack of studies for the problem of word alignment. \cite{moore2004improving} reported a study on a simple method of additive smoothing, in which the amount to add is learnt from annotated data. This basic technique gives a significant improvement over the unsmoothed version. With such a good motivation, in this paper, we propose a more general framework by varying the amount to add rather than adding only a constant amount as the original additive smoothing. In term of learning method, we also experience a method to learn the parameter of smoothing from unannotated data with a deep analysis and comparision between different learning methods.

Improved Relation Classification by Deep Recurrent Neural Networks with Data Augmentation

Nowadays, neural networks play an important role in the task of relation classification. By designing different neural architectures, researchers have improved the performance to a large extent, compared with traditional methods. However, existing neural networks for relation classification are usually of shallow architectures (e.g., one-layer convolution neural networks or recurrent networks). They may fail to explore the potential representation space in different abstraction levels. In this paper, we propose deep recurrent neural networks (DRNNs) to tackle this challenge. Further, we propose a data augmentation method by leveraging the directionality of relations. We evaluate our DRNNs on the SemEval-2010 Task 8, and achieve an $F_1$-score of 85.81%, outperforming state-of-the-art recorded results.

Point-Shift Foliation of a Point Process

A point-shift $F$ maps each point of a point process $\Phi$ to some point of $\Phi$. For all translation invariant point-shifts $F$, the $F$-foliation of $\Phi$ is a partition of the support of $\Phi$ which is the discrete analogue of the stable manifold of $F$ on $\Phi$. It is first shown that foliations lead to a classification of the behavior of point-shifts on point processes. Both qualitative and quantitative properties of foliations are then established. It is shown that for all point-shifts $F$, there exists a point-shift $F_\bot$, the orbits of which are the $F$-foils of $\Phi$, and which are measure-preserving. The foils are not always stationary point processes. Nevertheless, they admit relative intensities with respect to one another.

On the number of union-free families

A family of sets is union-free if there are no three distinct sets in the family such that the union of two of the sets is equal to the third set. Kleitman proved that every union-free family has size at most $(1+o(1))\binom{n}{n/2}$. Later, Burosch–Demetrovics-Katona-Kleitman-Sapozhenko asked for the number $\alpha(n)$ of such families, and they proved that $2^{\binom{n}{n/2}}\leq \alpha(n) \leq 2^{2\sqrt{2}\binom{n}{n/2}(1+o(1))}$. They conjectured that the constant $2\sqrt{2}$ can be removed in the exponent of the right hand side. We prove their conjecture by formulating a new container-type theorem for rooted hypergraphs.

Smooth Principal Component Analysis over two-dimensional manifolds with an application to Neuroimaging

Motivated by the analysis of high-dimensional neuroimaging signals located over the cortical surface, we introduce a novel Principal Component Analysis technique that can handle functional data located over a two-dimensional manifold. For this purpose a regularization approach is adopted, introducing a smoothing penalty coherent with the geodesic distance over the manifold. The model introduced can be applied to any manifold topology, can naturally handle missing data and functional samples evaluated in different grids of points. We approach the discretization task by means of finite element analysis and propose an efficient iterative algorithm for its resolution. We compare the performances of the proposed algorithm with other approaches classically adopted in literature. We finally apply the proposed algorithm to resting state functional magnetic resonance imaging data, showing that a cross-validation approach justifies the presence of the penalization term in the analysis of human brain connectome data.

Arbitrary Overlap Constraints in Graph Packing Problems

In earlier versions of the community discovering problem, the overlap between communities was restricted by a simple count upper-bound [17,5,11,8]. In this paper, we introduce the $\Pi$-Packing with $\alpha()$-Overlap problem to allow for more complex constraints in the overlap region than those previously studied. Let $\mathcal{V}^r$ be all possible subsets of vertices of $V(G)$ each of size at most $r$, and $\alpha: \mathcal{V}^r \times \mathcal{V}^r \to \{0,1\}$ be a function. The $\Pi$-Packing with $\alpha()$-Overlap problem seeks at least $k$ induced subgraphs in a graph $G$ subject to: (i) each subgraph has at most $r$ vertices and obeys a property $\Pi$, and (ii) for any pair $H_i,H_j$, with $i\neq j$, $\alpha(H_i, H_j) = 0$ (i.e., $H_i,H_j$ do not conflict). We also consider a variant that arises in clustering applications: each subgraph of a solution must contain a set of vertices from a given collection of sets $\mathcal{C}$, and no pair of subgraphs may share vertices from the sets of $\mathcal{C}$. In addition, we propose similar formulations for packing hypergraphs. We give an $O(r^{rk} k^{(r+1)k} n^{cr})$ algorithm for our problems where $k$ is the parameter and $c$ and $r$ are constants, provided that: i) $\Pi$ is computable in polynomial time in $n$ and ii) the function $\alpha()$ satisfies specific conditions. Specifically, $\alpha()$ is hereditary, applicable only to overlapping subgraphs, and computable in polynomial time in $n$. Motivated by practical applications we give several examples of $\alpha()$ functions which meet those conditions.

Eigenvectors of random matrices: A survey

Eigenvectors of large matrices (and graphs) play an essential role in combinatorics and theoretical computer science. The goal of this survey is to provide an up-to-date account on properties of eigenvectors when the matrix (or graph) is random.