Most state-of-the-art neural machine translation systems, despite differing in their architectural skeletons (e.g., recurrent or convolutional), share an indispensable feature: the attention mechanism. However, most existing attention methods are token-based and ignore the importance of phrasal alignments, the key ingredient in the success of phrase-based statistical machine translation. In this paper, we propose novel phrase-based attention methods that model n-grams of tokens as attention entities. We incorporate our phrase-based attentions into the recently proposed Transformer network and demonstrate that our approach yields improvements of 1.3 BLEU for English-to-German and 0.5 BLEU for German-to-English translation on WMT newstest2014, using WMT’16 training data.
The task of event detection involves identifying and categorizing event triggers. Contextual information has been shown to be effective for this task. However, existing methods that utilize contextual information process the context only once. We argue that the context can be better exploited by processing it multiple times, which allows the model to perform complex reasoning and to generate a better context representation, thus improving overall performance. Meanwhile, the dynamic memory network (DMN) has demonstrated a promising capability for capturing contextual information and has been applied successfully to various tasks. Building on the multi-hop mechanism that the DMN uses to model context, we propose the trigger detection dynamic memory network (TD-DMN) to tackle the event detection problem. We performed five-fold cross-validation on the ACE-2005 dataset. Experimental results show that the multi-hop mechanism does improve performance and that the proposed model achieves the best $F_1$ score compared to state-of-the-art methods.
We explore active learning (AL) utterance selection for improving the accuracy of new, underrepresented domains in a natural language understanding (NLU) system. We propose an AL algorithm called Majority-CRF that uses an ensemble of classification and sequence labeling models to guide the selection of utterances for annotation. Experiments with three domains show that Majority-CRF achieves a 6.6%-9% relative error rate reduction compared to random sampling with the same annotation budget, and statistically significant improvements compared to other AL approaches. Additionally, case studies with human-in-the-loop AL on six new domains show a 4.6%-9% improvement on an existing NLU system.
Anomaly detection is an important step in the management and monitoring of data centers and cloud computing platforms. Detecting anomalous virtual machines before real failures occur reduces the downtime during which operations engineers must urgently recover malfunctioning virtual machines. It also enables efficient root cause analysis and improves customer optics if a malfunction does lead to an outage. Virtual machines can fail at any time, whether in a lab or a production system. Without an anomaly detection system, a failed virtual machine in a lab environment forces the QA and DEV teams to switch to another environment while the OPS team fixes the failure. Failing to detect anomalous virtual machines can therefore have financial ramifications, both when developing new features and when servicing existing ones. This paper presents a model that can efficiently detect anomalous virtual machines in both production and testing environments.
Cooperative information sharing is important to theories of human learning and has potential implications for machine learning. Prior work derived conditions for achieving optimal Cooperative Inference under strong, relatively restrictive assumptions. We relax these assumptions by demonstrating convergence for any discrete joint distribution, robustness through equivalence classes and stability under perturbation, and effectiveness by deriving bounds from structural properties of the original joint distribution. We provide geometric interpretations, connections to and implications for optimal transport, and connections to importance sampling. We conclude by outlining open questions and challenges in realizing the promise of Cooperative Inference.
Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters. Appropriately scheduling the optimization of a task objective or a set of parameters is usually crucial to the quality of convergence. In this paper, we present AutoLoss, a meta-learning framework that automatically learns and determines the optimization schedule. AutoLoss provides a generic way to represent and learn the discrete optimization schedule from metadata. It also allows for a dynamic, data-driven schedule in ML problems that involve alternating updates of different parameters or of different loss objectives. We apply AutoLoss to four ML tasks: d-ary quadratic regression, classification using a multi-layer perceptron (MLP), image generation using GANs, and multi-task neural machine translation (NMT). We show that the AutoLoss controller is able to capture the distribution of better optimization schedules, which result in higher-quality convergence on all four tasks. The trained AutoLoss controller is generalizable: it can guide and improve the learning of a new task model with different specifications or on different datasets.
In the era of deep learning, understanding the over-fitting phenomenon is becoming increasingly important. It has been observed that carefully designed deep neural networks achieve small test error even when the training error is close to zero. One possible explanation is that, for many modern machine learning algorithms, over-fitting can greatly reduce the estimation bias without increasing the estimation variance too much. To illustrate this idea, we prove that our interpolated nearest neighbors algorithm achieves the minimax optimal rate in both the regression and classification regimes. We also observe that it is empirically better than the traditional k-nearest neighbor method in some cases.
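The following is a minimal sketch of an interpolating nearest-neighbour regressor. It illustrates how a method can reach zero training error yet still average over neighbours elsewhere. It assumes power-law weights ||x - x_i||^(-gamma) over the k nearest neighbours, which diverge at training points. That weighting is an illustrative choice and not necessarily the scheme analysed in the paper. The function name and its parameters are hypothetical.

```python
import numpy as np

def interpolated_nn_predict(X_train, y_train, X_query, k=5, gamma=2.0):
    """Weighted k-NN regression with weights ||x - x_i||^(-gamma).

    The weights diverge as a query approaches a training point, so the fitted
    function passes through (interpolates) every training label.
    Assumption: this power-law weighting is illustrative only.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    preds = []
    for x in np.atleast_2d(np.asarray(X_query, dtype=float)):
        dists = np.linalg.norm(X_train - x, axis=1)
        nn = np.argsort(dists)[:k]          # indices of the k nearest neighbours
        d = dists[nn]
        if np.any(d == 0.0):
            # Query coincides with a training point: return its label exactly.
            preds.append(y_train[nn][d == 0.0].mean())
            continue
        w = d ** (-gamma)                   # closer neighbours get much larger weight
        preds.append(np.dot(w, y_train[nn]) / w.sum())
    return np.array(preds)
```

Predicting at the training inputs reproduces the noisy labels exactly, so the training error is zero. Between training points, the prediction is still a convex combination of neighbouring labels, which is what keeps the variance in check.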
Today, the Internet of Things (IoT) is used in various application domains, such as healthcare, smart homes, smart cars, and smart-x applications in smart cities. The number of applications based on IoT and cloud computing is projected to increase rapidly over the next few years. IoT-based services must meet guaranteed levels of quality of service (QoS) to match users’ expectations, and specifying QoS constraints in Service Level Agreements (SLAs) is crucial for ensuring QoS. Therefore, as a first step toward SLA management, it is essential to provide an SLA specification in a machine-readable format. In this paper, we demonstrate a toolkit for creating SLA specifications for IoT applications. The toolkit simplifies the process of capturing the requirements of IoT applications. We present a demonstration of the toolkit using a Remote Health Monitoring Service (RHMS) use case. The toolkit supports the following: (1) specifying the Service-Level Objectives (SLOs) of an IoT application at the application level; (2) specifying the workflow activities of the IoT application; (3) mapping each activity to the required software and hardware resources and specifying the constraints of SLOs and other configuration-related metrics of the required hardware and software; and (4) creating the composed SLA in JSON format.
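The sketch below shows how steps (1)-(4) could come together as a single machine-readable document. It is not the toolkit's format. Every JSON key, activity name, and resource field is a hypothetical placeholder, because the abstract does not describe the toolkit's schema.

```python
import json

def compose_sla(application, slos, workflow, resources):
    """Compose an SLA specification as a JSON string.

    slos:      application-level Service-Level Objectives (step 1)
    workflow:  ordered list of activity names (step 2)
    resources: activity name -> required resources and constraints (step 3)
    All field names are hypothetical placeholders, not the toolkit's schema.
    """
    sla = {
        "application": application,
        "slos": slos,
        "workflow": [
            {"activity": activity, "resources": resources.get(activity, {})}
            for activity in workflow
        ],
    }
    return json.dumps(sla, indent=2)  # step 4: the composed SLA in JSON

# Hypothetical RHMS-style example.
doc = compose_sla(
    "RHMS",
    {"end_to_end_latency_ms": {"max": 500}, "availability": {"min": 0.999}},
    ["collect_vitals", "analyze", "alert"],
    {"analyze": {"software": "stream-processor", "vcpus": 4, "max_cpu_util": 0.8}},
)
```

Keeping the workflow as an ordered list means each activity's resource constraints stay attached to that activity in the output.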
Supervised dimension reduction for time series is challenging, as there may be temporal dependence between the response $y$ and the predictors $\boldsymbol x$. Recently, a time series version of sliced inverse regression, TSIR, was suggested. TSIR applies approximate joint diagonalization of several supervised lagged covariance matrices to account for the temporal nature of the data. In this paper, we develop this concept further and propose a time series version of sliced average variance estimation, TSAVE. As both TSIR and TSAVE have their own advantages and disadvantages, we furthermore consider a hybrid of TSIR and TSAVE. Based on examples and simulations, we demonstrate and evaluate the differences between the three methods. We also show that they are superior to applying their iid counterparts with lagged values of the explanatory variables included as predictors.
Recruiting appropriate people for certain positions is critical for any company or organization. Manually screening large numbers of resumes to select appropriate candidates can be exhausting and time-consuming. However, there is no public tool that can be used directly for automatic resume quality assessment (RQA), which motivates us to develop a method for automatic RQA. Since there is also no public dataset for model training and evaluation, we build a dataset for RQA from around 10K resumes provided by a private resume management company. By investigating the dataset, we identify some factors or features that could be useful for discriminating good resumes from bad ones, e.g., the consistency between different parts of a resume. We then design a neural-network model, incorporating some text processing techniques, to predict the quality of each resume. To deal with label deficiency in the dataset, we propose several variants of the model. These variants either use a pair- or triplet-based loss or introduce a semi-supervised learning technique to make use of the abundant unlabeled data. Both the baseline model and its variants are general and easy to implement. Various popular criteria, including the receiver operating characteristic (ROC) curve, F-measure, and ranking-based average precision (AP), are adopted for model evaluation. We compare the different variants with our baseline model. Since there is no public algorithm for RQA, we further compare our results with those obtained from a website that scores resumes. Experimental results in terms of different criteria demonstrate the effectiveness of the proposed method. We foresee that our approach would transform the way human resources are managed in the future.
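A triplet-based loss gets more training signal out of scarce labels by comparing resumes to each other rather than scoring each one alone. Below is a minimal sketch of the standard margin-based triplet loss on embedding vectors. The triplets are built on an assumption: the anchor and positive are two resumes of the same quality class and the negative comes from the other class. The paper's exact construction and margin are not given.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on batches of embedding vectors.

    Penalises triplets where the anchor is not at least `margin` closer
    (in squared Euclidean distance) to the positive than to the negative.
    """
    a, p, n = (np.asarray(v, dtype=float) for v in (anchor, positive, negative))
    d_pos = np.sum((a - p) ** 2, axis=-1)   # anchor-positive distances
    d_neg = np.sum((a - n) ** 2, axis=-1)   # anchor-negative distances
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))
```

A triplet whose negative is already far away contributes zero loss, so training focuses on the hard comparisons.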
We propose a new recurrent generative model for generating images from text captions while attending to specific parts of the captions. Our model creates images by incrementally adding patches on a ‘canvas’ while attending to words of the text caption at each timestep. Finally, the canvas is passed through an upscaling network to generate images. We also introduce a new method for generating visual-semantic sentence embeddings based on self-attention over text. We compare our model’s generated images with those generated by Reed et al.’s model and show that our model is a stronger baseline for text-to-image generation tasks.
This work was done as part of a master’s thesis project. The goal is to integrate two or more ontologies (of the same or closely related domains) into a new, consistent, and coherent OWL ontology to ensure semantic interoperability between them. To do this, we chose to create a bridge ontology that includes all source ontologies and their bridging axioms in a customized way. In addition, we introduced a new criterion for obtaining an ontology of better quality, that is, one with a minimum of semantic/logical conflicts. We also proposed new terminology and definitions that clarify the unclear and misplaced notions of ‘integration’ and ‘merging’, which are used inconsistently in state-of-the-art works. Finally, we tested and evaluated our OIA2R tool using ontologies and reference alignments from the OAEI campaign. The results show that the tool is generic, efficient, and powerful enough.
Unsupervised learning makes manifest the underlying structure of data without curated training and specific problem definitions. However, inferring relationships between data points is frustrated by the ‘curse of dimensionality’ in high dimensions. Inspired by replica theory from statistical mechanics, we consider replicas of the system to tune the dimensionality and take the limit as the number of replicas goes to zero. The result is the intensive embedding, which is not only isometric (preserving local distances) but also allows global structure to be visualized more transparently. We develop Intensive Principal Component Analysis (InPCA) and demonstrate clear improvements in visualizations of the Ising model of magnetic spins, a neural network, and the dark energy cold dark matter ($\Lambda$CDM) model as applied to the Cosmic Microwave Background.
We present a framework for generating natural language descriptions from structured data such as tables. We are motivated by the need to approach this problem in a manner that is scalable and easily adaptable to new domains. Unlike existing related systems, our system does not require parallel data; instead, it relies on monolingual corpora and basic NLP tools that are easily accessible. The system employs a three-stage pipeline that (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent, and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain dataset curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other data types, such as knowledge graphs and key-value dictionaries.
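To make the data flow of the three-stage pipeline concrete, here is a toy sketch. It replaces the learned components with fixed string templates. The paper's system draws on monolingual corpora and NLP tools instead, and its sentence-compounding module is omitted here. The `name` key, the templates, and the function names are all hypothetical.

```python
def canonicalize(record):
    """Stage (i): flatten a record into (entity, attribute, value) triples."""
    entity = record["name"]  # hypothetical: the entity is stored under "name"
    return [(entity, attr, value) for attr, value in record.items() if attr != "name"]

def simple_sentence(triple):
    """Stage (ii): one simple sentence per atomic entry (toy template)."""
    entity, attr, value = triple
    return f"{entity}'s {attr.replace('_', ' ')} is {value}."

def combine(sentences, entity, pronoun):
    """Stage (iii), co-reference replacement only: name the entity in full in
    the first sentence and refer to it with a pronoun afterwards."""
    rest = [s.replace(f"{entity}'s", pronoun, 1) for s in sentences[1:]]
    return " ".join(sentences[:1] + rest)

def describe(record, pronoun="Its"):
    triples = canonicalize(record)
    return combine([simple_sentence(t) for t in triples], record["name"], pronoun)
```

For example, `describe({"name": "Ada Lovelace", "birth_year": 1815, "occupation": "mathematician"}, pronoun="Her")` yields a two-sentence paragraph in which the second sentence starts with "Her".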
We study the problem of learning conditional average treatment effects (CATE) from observational data with unobserved confounders. The CATE function maps baseline covariates to individual causal effect predictions and is key for personalized assessments. Recent work has focused on how to learn CATE under unconfoundedness, i.e., when there are no unobserved confounders. Since CATE may not be identified when unconfoundedness is violated, we develop a functional interval estimator that predicts bounds on the individual causal effects under realistic violations of unconfoundedness. Our estimator takes the form of a weighted kernel estimator with weights that vary adversarially. We prove that our estimator is sharp in that it converges exactly to the tightest bounds possible on CATE when there may be unobserved confounders. Further, we study personalized decision rules derived from our estimator and prove that they achieve optimal minimax regret asymptotically. We assess our approach in a simulation study as well as demonstrate its application in the case of hormone replacement therapy by comparing conclusions from a real observational study and clinical trial.
Text mining and analytics software has become popular, but little attention has been paid to the software architectures of such systems. Often they are built from scratch using special-purpose software and data structures, which increases their cost and complexity. This demo paper describes Sifaka, a new open-source text mining application constructed on top of a standard search engine index using existing application programming interface (API) calls. Indexing integrates popular annotation software libraries to augment the full-text index with noun phrases and named entities; n-grams are also provided. Sifaka enables a person to quickly explore and analyze large text collections using search, frequency analysis, and co-occurrence analysis. A user can also import existing document labels or interactively construct document sets that are positive or negative examples of new concepts, perform feature selection, and export feature vectors compatible with popular machine learning software. Sifaka demonstrates that search engines are good platforms for text mining applications. It also makes common IR text mining capabilities accessible to researchers in disciplines where programming skills are less common.
This text discusses several explanatory methods that go beyond the error measurements and plots traditionally used to assess machine learning models. Some of the methods are tools of the trade, while others are rigorously derived and backed by long-standing theory. The methods discussed are decision tree surrogate models, individual conditional expectation (ICE) plots, local interpretable model-agnostic explanations (LIME), partial dependence plots, and Shapley explanations. They vary in terms of scope, fidelity, and suitable application domain. Along with descriptions of these methods, this text presents real-world usage recommendations supported by a use case and in-depth software examples.
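Two of these methods are closely related: a partial dependence curve is the average of the ICE curves. The minimal NumPy sketch below makes that relationship explicit. It works with any model exposed as a `predict(X)` callable. The function names and the toy model are illustrative, not taken from a particular library.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Individual conditional expectation: one curve per row of X, obtained by
    sweeping column `feature` over `grid` while the other columns stay fixed."""
    X = np.asarray(X, dtype=float)
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value          # overwrite the feature for every row
        curves[:, j] = predict(X_mod)
    return curves

def partial_dependence(predict, X, feature, grid):
    """The partial dependence curve is the average of the ICE curves."""
    return ice_curves(predict, X, feature, grid).mean(axis=0)

# Toy model with an interaction between features 0 and 1.
model = lambda X: 2 * X[:, 0] + X[:, 0] * X[:, 1]
X_demo = np.array([[0.0, 1.0], [0.0, -1.0], [0.0, 3.0]])
```

For the toy model, the ICE curves for feature 0 have different slopes because of the interaction with feature 1. The partial dependence curve averages this heterogeneity away, which is why the two plots are usually inspected together.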
In many domains, there is significant interest in capturing novel relationships between time series that represent activities recorded at different nodes of a highly complex system. In this paper, we introduce multipoles, a novel class of linear relationships between more than two time series. A multipole is a set of time series that have strong linear dependence among themselves, with the requirement that each time series makes a significant contribution to the linear dependence. We demonstrate that most interesting multipoles can be identified as cliques of negative correlations in a correlation network. Such cliques are typically rare in a real-world correlation network, which allows us to find almost all multipoles efficiently using a clique-enumeration approach. Using our proposed framework, we demonstrate the utility of multipoles in discovering new physical phenomena in two scientific domains: climate science and neuroscience. In particular, we discovered several multipole relationships that are reproducible in multiple other independent datasets and lead to novel domain insights.
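One way to quantify "strong linear dependence" among a set of series is 1 minus the smallest eigenvalue of their correlation matrix. This value is close to 1 when some linear combination of the standardized series is nearly constant. The sketch below assumes that measure. It also checks the "significant contribution" requirement by asking how much the dependence drops when any single series is removed. Treat both functions as an illustrative reading of the definition rather than the paper's exact formulation.

```python
import numpy as np

def linear_dependence(series):
    """Assumed strength of linear dependence among time series (rows):
    1 - smallest eigenvalue of their correlation matrix."""
    corr = np.corrcoef(np.asarray(series, dtype=float))
    return 1.0 - np.linalg.eigvalsh(corr)[0]   # eigvalsh returns ascending eigenvalues

def min_drop_on_removal(series):
    """Smallest loss in linear dependence when any single series is removed;
    a large value means every member contributes significantly."""
    full = linear_dependence(series)
    return min(
        full - linear_dependence([s for i, s in enumerate(series) if i != j])
        for j in range(len(series))
    )

# Three series that (almost) sum to zero at every time step.
rng = np.random.default_rng(0)
a = rng.standard_normal((3, 2000))
trio = a - a.mean(axis=0) + 0.05 * rng.standard_normal((3, 2000))
```

In this toy trio, every pair is negatively correlated (about -0.5), so the three series form a clique of negative correlations. Together they are almost perfectly dependent, yet no pair alone is, which matches the pattern described above.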
At present, the state-of-the-art computational models across a range of sequential data processing tasks, including language modeling, are based on recurrent neural network architectures. This paper begins with the observation that most research on developing computational models capable of processing sequential data fails to explicitly analyze the long distance dependencies (LDDs) within the datasets the models process. In this context, we make five research contributions. First, we argue that a key step in modeling sequential data is to understand the characteristics of the LDDs within the data. Second, we present a method to compute and analyze the LDD characteristics of any sequential dataset, and demonstrate this method on a number of sequential datasets that are frequently used for model benchmarking. Third, based on the analysis of the LDD characteristics within the benchmarking datasets, we observe that LDDs are far more complex than previously assumed. They depend on at least four factors: (i) the number of unique symbols in a dataset, (ii) the size of the dataset, (iii) the number of interacting symbols within an LDD, and (iv) the distance between the interacting symbols. Fourth, we verify these factors using synthetic datasets generated from Strictly k-Piecewise (SPk) languages. We then demonstrate how SPk languages can be used to generate benchmarking datasets with varying degrees of LDDs. These synthesized datasets have the advantage of enabling the targeted testing of recurrent neural architectures. Finally, we demonstrate how understanding the characteristics of the LDDs in a dataset can inform better hyper-parameter selection for current state-of-the-art recurrent neural architectures and also aid in understanding them…
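An SPk language is defined by a set of forbidden length-k strings. A word belongs to the language if and only if it contains none of them as a subsequence, and the forbidden symbols need not be contiguous. This is what produces dependencies at arbitrary distances. The sketch below checks membership and draws samples by simple rejection sampling. The sampler and its parameters are a hypothetical stand-in, not the generator used in the paper.

```python
import random

def is_subsequence(pattern, word):
    """True if `pattern` occurs in `word` as a (not necessarily contiguous) subsequence."""
    it = iter(word)
    return all(symbol in it for symbol in pattern)  # `in` consumes the iterator

def in_sp_language(word, forbidden):
    """Membership in the SP-k language defined by `forbidden` (length-k strings):
    the word must contain none of them as a subsequence."""
    return not any(is_subsequence(f, word) for f in forbidden)

def sample_sp_strings(alphabet, forbidden, length, n, seed=0):
    """Rejection-sample n strings of a given length from the SP-k language."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        w = "".join(rng.choice(alphabet) for _ in range(length))
        if in_sp_language(w, forbidden):
            out.append(w)
    return out
```

With `forbidden={"ab"}` (an SP2 language), whether a `b` may appear depends on whether an `a` occurred anywhere earlier, no matter how far back. Rejection sampling becomes inefficient for long strings or many forbidden patterns; an automaton-based generator would scale better.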
In this paper, we focus on approaches to parallelizing stochastic gradient descent (SGD) wherein data is farmed out to a set of workers whose results, after a number of updates, are combined at a central master node. Although such synchronized SGD approaches parallelize well in idealized computing environments, they often fail to realize their promised computational acceleration in practical settings. One cause is slow workers, termed stragglers, which can cause the fusion step at the master node to stall, greatly slowing convergence. In many straggler mitigation approaches, the work completed by these nodes, while only partial, is discarded completely. In this paper, we propose an approach to parallelizing synchronous SGD that exploits the work completed by all workers. The central idea is to fix the computation time of each worker and then to combine the distinct contributions of all workers. We provide a convergence analysis and optimize the combination function. Our numerical results demonstrate an improvement of several factors of magnitude in comparison to existing methods.
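The central idea can be illustrated with the simplest possible combination rule. Each worker runs for the same fixed time and reports the sum of the gradients it computed, together with how many samples it finished. The master then divides the pooled sum by the total sample count, so a straggler's partial work still counts. This sample-count weighting is an assumption for illustration; the paper optimizes the combination function, which this sketch does not reproduce.

```python
import numpy as np

def combine_worker_gradients(grad_sums, sample_counts):
    """Combine the partial work of all workers after a fixed time budget.

    grad_sums[i]: sum of per-sample gradients computed by worker i
    sample_counts[i]: number of samples worker i finished in the time budget
    The result is the mean gradient over every sample processed by any worker.
    """
    grad_sums = np.asarray(grad_sums, dtype=float)
    total = float(np.sum(sample_counts))
    if total == 0:
        raise ValueError("no worker completed any samples")
    return grad_sums.sum(axis=0) / total
```

Compare this with waiting for every worker to finish a full minibatch: here the master never waits past the time budget. A worker that finished zero samples simply contributes nothing.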
Time series data that are not measured at regular intervals are commonly discretized as a preprocessing step. For example, data about customer arrival times might be simplified by summing the number of arrivals within hourly intervals, which produces a discrete-time time series that is easier to model. In this abstract, we show that discretization introduces a bias that affects models trained for decision-making. We refer to this phenomenon as discretization bias, and show that we can avoid it by using continuous-time models instead.
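The preprocessing step described here can be sketched in a few lines of NumPy. The hourly-counts function is an illustrative helper, not code from the paper. It also shows where information is lost: arrival patterns that differ within each hour produce identical count series.

```python
import numpy as np

def hourly_counts(arrival_times_h, horizon_h):
    """Discretize arrival times (hours since start) into counts per hour.

    Bins are [0, 1), [1, 2), ...; following np.histogram's convention, the
    last bin also includes its right edge (an arrival exactly at horizon_h).
    """
    edges = np.arange(horizon_h + 1)
    counts, _ = np.histogram(arrival_times_h, bins=edges)
    return counts
```

For example, arrivals at 0.1, 0.5, 0.95, 1.2, and 2.99 hours and arrivals at 0.05, 0.06, 0.07, 1.9, and 2.0 hours both become `[3, 1, 1]`. A model trained on the counts cannot tell the bursty pattern from the spread-out one.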
We consider a longitudinal data structure consisting of baseline covariates, time-varying treatment variables, intermediate time-dependent covariates, and a possibly time-dependent outcome. Previous studies have shown that, in this setting, estimating the variance of asymptotically linear estimators using empirical influence functions results in anti-conservative estimates as the magnitude of positivity violations increases. This leads to poor coverage and uncontrolled Type I error. In this paper, we present two alternative approaches to estimating the variance of these estimators. The first is a robust approach that directly targets the variance of the influence function as a counterfactual mean outcome. The second is a non-parametric bootstrap-based approach that is theoretically valid and lowers the computational cost, thereby increasing feasibility in non-parametric settings that use complex machine learning algorithms. The performance of these approaches is compared to that of the empirical influence function in simulations across different levels of positivity violations and treatment effect sizes.
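The generic non-parametric bootstrap behind approach (ii) resamples independent units with replacement, re-runs the estimator, and takes the variance of the replicates. In the longitudinal setting, a unit would be a whole subject trajectory (one row per subject). The sketch uses a sample mean as a stand-in estimator. The paper's asymptotically linear estimators, and their behaviour under positivity violations, are not reproduced here.

```python
import numpy as np

def bootstrap_variance(data, estimator, n_boot=1000, seed=0):
    """Non-parametric bootstrap variance of `estimator`.

    Rows of `data` are treated as independent units (e.g., one row per subject
    holding that subject's whole trajectory); each replicate resamples rows
    with replacement and re-applies the estimator.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    replicates = np.array([
        estimator(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)
    ])
    return replicates.var(ddof=1)
```

For the sample mean, the bootstrap variance should be close to the plug-in value, the sample variance divided by n. This makes the mean a convenient sanity check before plugging in a more complex estimator.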
The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. During training, a large number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerators for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads. We then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. Using the TensorFlow prefetcher results in a complete overlap of computation on the accelerator with the input pipeline on the CPU, eliminating the effective cost of I/O on overall performance. Using a burst buffer to checkpoint to fast, small-capacity storage and asynchronously copy the checkpoints to slower, large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment.
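The burst-buffer pattern can be sketched with the Python standard library alone. The checkpoint is first written synchronously to fast storage. A background thread then drains it to slow, large-capacity storage while training continues. This is a hypothetical illustration of the idea, not the paper's implementation and not a TensorFlow API. The two directories stand in for, say, node-local SSD and a parallel file system.

```python
import pathlib
import shutil
import tempfile
import threading

def checkpoint_with_burst_buffer(state: bytes, name: str, fast_dir, slow_dir):
    """Write a checkpoint to fast storage, then copy it to slow storage in a
    background thread so training can continue immediately.

    Returns the copy thread; join it before relying on the slow copy.
    """
    fast_path = pathlib.Path(fast_dir) / name
    fast_path.write_bytes(state)          # blocking, but to fast storage
    slow_path = pathlib.Path(slow_dir) / name
    copier = threading.Thread(target=shutil.copyfile, args=(fast_path, slow_path))
    copier.start()                        # asynchronous drain to slow storage
    return copier

# Usage with temporary directories standing in for the two storage tiers.
with tempfile.TemporaryDirectory() as fast, tempfile.TemporaryDirectory() as slow:
    t = checkpoint_with_burst_buffer(b"model-weights", "ckpt-0001", fast, slow)
    # ... training steps would continue here while the copy runs ...
    t.join()
```

The training loop only pays for the fast write. The cost of the slow write is hidden as long as the copy finishes before the next checkpoint needs the fast tier's capacity.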
To the best of our knowledge, we present the first PAC optimal algorithm for Bayes-Adaptive Markov Decision Processes (BAMDPs) in continuous state and action spaces. The BAMDP framework elegantly addresses model uncertainty by incorporating Bayesian belief updates into long-term expected return. However, computing an exact optimal Bayesian policy is intractable. Our key insight is to compute a near-optimal value function by covering the continuous state-belief-action space with a finite set of representative samples and exploiting the Lipschitz continuity of the value function. We prove the near-optimality of our algorithm and analyze a number of schemes that boost the algorithm’s efficiency. Finally, we empirically validate our approach on a number of discrete and continuous BAMDPs and show that the learned policy has consistently competitive performance against baseline approaches.
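Why does a finite cover plus Lipschitz continuity suffice? The values at the representative samples bound the value everywhere else. If V is L-Lipschitz, then for any query q and samples s_i we have max_i(V(s_i) - L d(q, s_i)) <= V(q) <= min_i(V(s_i) + L d(q, s_i)). The sketch below computes these bounds for generic points in a Euclidean space. Treating the state-belief-action space as such a space, and the function name, are simplifying assumptions.

```python
import numpy as np

def lipschitz_value_bounds(query, samples, values, lipschitz_const):
    """Bounds on an L-Lipschitz value function at `query`, given its values at
    a finite set of representative sample points (rows of `samples`)."""
    samples = np.asarray(samples, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(samples - np.asarray(query, dtype=float), axis=1)
    upper = np.min(values + lipschitz_const * d)   # tightest upper envelope
    lower = np.max(values - lipschitz_const * d)   # tightest lower envelope
    return float(lower), float(upper)
```

In a 1D example with V(x) = x^2 on [-1, 1] (Lipschitz constant 2) and samples at -1, 0, and 1, the bounds at x = 0.5 are [0, 1], which contain the true value 0.25. A denser cover shrinks the gap.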
We propose deep convolutional Gaussian processes, a deep Gaussian process architecture with convolutional structure. The model is a principled Bayesian framework for detecting hierarchical combinations of local features for image classification. We demonstrate greatly improved image classification performance compared to current Gaussian process approaches on the MNIST and CIFAR-10 datasets. In particular, we improve CIFAR-10 accuracy by over 10 percentage points.
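A single-layer, patch-based convolutional kernel shows the kind of structure a convolutional GP exploits. It compares two images by averaging a base kernel over all pairs of their patches. The sketch assumes this is the building block the deep model stacks; the layering itself is not shown. It uses a squared-exponential base kernel and hypothetical function names.

```python
import numpy as np

def extract_patches(image, size):
    """All size x size patches of a 2D image, each flattened to a row."""
    h, w = image.shape
    return np.array([
        image[i:i + size, j:j + size].ravel()
        for i in range(h - size + 1)
        for j in range(w - size + 1)
    ])

def rbf(A, B, lengthscale):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

def conv_kernel(img1, img2, size=3, lengthscale=1.0):
    """Convolutional kernel: mean base-kernel similarity over all patch pairs."""
    return rbf(extract_patches(img1, size), extract_patches(img2, size), lengthscale).mean()
```

The resulting kernel is a valid (positive semi-definite) kernel because it is an inner product of averaged patch feature maps. Because it compares patches regardless of their position, a local feature contributes similarly wherever it appears in the image.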
One of the most notable contributions of deep learning is the application of convolutional neural networks (ConvNets) to structured signal classification, in particular image classification. Beyond their impressive performance in supervised learning, the structure of such networks inspired the development of deep filter banks referred to as scattering transforms. These transforms apply a cascade of wavelet transforms and complex modulus operators to extract features that are invariant to group operations and stable to deformations. Furthermore, ConvNets inspired recent advances in geometric deep learning, which aims to generalize these networks to graph data by applying notions from graph signal processing to learn deep graph filter cascades. We further advance these lines of research by proposing a geometric scattering transform using graph wavelets defined in terms of random walks on the graph. We demonstrate the utility of features extracted with this designed deep filter bank in graph classification. We also show its competitive performance relative to other methods, including graph kernel methods and geometric deep learning methods, on both social and biochemistry data.
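Below is a minimal sketch of first-order geometric scattering features. It assumes the lazy random walk P = (I + A D^{-1})/2 and diffusion wavelets Psi_0 = I - P and Psi_j = P^(2^(j-1)) - P^(2^j). Graph-level features are taken as moments of |Psi_j x| summed over nodes. The exact wavelet family, depth, and moment set used in the paper may differ, and the function names are hypothetical.

```python
import numpy as np

def lazy_random_walk(adj):
    """P = (I + A D^{-1}) / 2 for an adjacency matrix with no isolated nodes."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=0)
    return 0.5 * (np.eye(len(adj)) + adj / deg)   # column j divided by deg_j

def graph_wavelets(P, J):
    """Psi_0 = I - P and Psi_j = P^(2^(j-1)) - P^(2^j) for j = 1..J."""
    wavelets = [np.eye(len(P)) - P]
    for j in range(1, J + 1):
        wavelets.append(np.linalg.matrix_power(P, 2 ** (j - 1))
                        - np.linalg.matrix_power(P, 2 ** j))
    return wavelets

def scattering_features(adj, x, J=3, moments=(1, 2)):
    """Zeroth-order moments of the signal x plus first-order moments
    sum_v |Psi_j x|(v)^q; summing over nodes makes them permutation-invariant."""
    x = np.asarray(x, dtype=float)
    feats = [np.sum(x ** q) for q in moments]
    for psi in graph_wavelets(lazy_random_walk(adj), J):
        u = np.abs(psi @ x)                  # wavelet coefficients, then modulus
        feats.extend(np.sum(u ** q) for q in moments)
    return np.array(feats)
```

Because each feature is a sum over nodes, relabelling the nodes of the graph (and its signal) leaves the feature vector unchanged. This is what makes the features usable for whole-graph classification.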
Finding neighbourhood structures is very useful for extracting valuable relationships among data samples. This paper presents a survey of recent neighbourhood construction algorithms for pattern clustering and data point classification. Extracting neighbourhoods and connections among the points is extremely useful for clustering and classifying the data. Many applications, such as detecting social network communities, bundling related edges, and solving location and routing problems, indicate the usefulness of this problem. Finding data point neighbourhoods in data mining and pattern recognition should generally improve knowledge extraction from databases. Several algorithms for constructing data point neighbourhoods have been proposed to analyse data in this sense, and this paper describes and discusses them from different aspects. Finally, future challenges in neighbourhood construction are outlined.
The importance of an efficient and scalable document similarity detection system is undeniable nowadays. Search engines need batch text similarity measures to detect duplicated and near-duplicated web pages in their indexes, so that a web page is not indexed multiple times. Furthermore, in the scoring phase, search engines need similarity measures to detect duplicated content on web pages in order to increase the quality of their results. In this paper, a new approach to batch text similarity detection is proposed that combines ideas from dimensionality reduction techniques and information gain theory. The new approach focuses on search engines' need to detect duplicated and near-duplicated web pages. It is evaluated on the NEWS20 dataset, and the results show that it outperforms the cosine text similarity algorithm in both speed and performance. In addition, it is faster and more accurate than the other rival method, the Simhash similarity algorithm.
One of the important factors that make a search engine fast and accurate is a concise, duplicate-free index. To remove duplicate and near-duplicate documents from the index, a search engine needs a swift and reliable system for detecting duplicate and near-duplicate text documents. Traditional approaches to this problem, such as brute-force comparisons or simple hash-based algorithms, are not suitable, as they are not scalable and cannot detect near-duplicate documents effectively. In this paper, a new signature-based approach to text similarity detection is introduced that is fast, scalable, and reliable, and needs less storage space. The proposed method is examined on popular text document datasets such as CiteseerX, Enron, and the Gold Set of Near-duplicate News Articles. The results are promising and comparable with the best cutting-edge algorithms in terms of accuracy and performance. The proposed method is based on the idea of using reference texts to generate signatures for text documents. The novelty of this paper is the use of genetic algorithms to generate better reference texts.
This paper proposes an accelerated proximal stochastic variance reduced gradient (ASVRG) method, in which we design a simple and effective momentum acceleration trick. Unlike most existing accelerated stochastic variance reduction methods such as Katyusha, ASVRG has only one additional variable and one momentum parameter. Thus, ASVRG is much simpler than those methods and has much lower per-iteration complexity. We prove that ASVRG achieves the best known oracle complexities for both strongly convex and non-strongly convex objectives. In addition, we extend ASVRG to mini-batch and non-smooth settings. We also empirically verify our theoretical results and show that the performance of ASVRG is comparable with, and sometimes even better than, that of state-of-the-art stochastic methods.
Within the statistical and machine learning literature, regularization techniques are often used to construct sparse (predictive) models. Most regularization strategies only work for data where all predictors are of the same type, such as Lasso regression for continuous predictors. However, many predictive problems involve different predictor types. We propose a multi-type Lasso penalty that acts on the objective function as a sum of subpenalties, one for each predictor type. As such, we perform predictor selection and level fusion within a predictor in a data-driven way, simultaneously with the parameter estimation process. We develop a new estimation strategy for convex predictive models with this multi-type penalty. Using the theory of proximal operators, our estimation procedure is computationally efficient. It partitions the overall optimization problem into easier-to-solve subproblems, each specific to a predictor type and its associated penalty. The proposed SMuRF algorithm improves on existing solvers in both accuracy and computational efficiency. This is demonstrated with an extensive simulation study and a case study on insurance pricing analytics.
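The partitioning works because the proximal operator of a sum of penalties on disjoint coefficient blocks separates into one independent proximal operator per block. The sketch shows this with two closed-form cases: soft-thresholding for a Lasso block and block soft-thresholding for a Group Lasso block. Which subpenalty goes with which predictor type is an illustrative assumption. Fused-lasso-type subpenalties used for level fusion need their own (generally non-closed-form) prox and are not shown.

```python
import numpy as np

def prox_l1(beta, t):
    """Prox of t * ||beta||_1: elementwise soft-thresholding (Lasso)."""
    return np.sign(beta) * np.maximum(np.abs(beta) - t, 0.0)

def prox_group(beta, t):
    """Prox of t * ||beta||_2: block soft-thresholding (Group Lasso),
    which keeps or drops all coefficients of a predictor together."""
    norm = np.linalg.norm(beta)
    if norm <= t:
        return np.zeros_like(beta)
    return (1.0 - t / norm) * beta

def prox_multi_type(beta, blocks, step):
    """Prox of a penalty that is a sum of subpenalties on disjoint blocks:
    it separates into one independent prox per block.

    blocks: list of (indices, prox_function, lambda_for_this_block)
    """
    out = np.array(beta, dtype=float)
    for idx, prox, lam in blocks:
        out[idx] = prox(out[idx], step * lam)
    return out
```

In a proximal gradient loop, this prox would follow each gradient step on the (smooth) loss. Because the blocks are independent, the per-type subproblems could also be solved in parallel.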
A typical predictive regression employs a multitude of potential regressors with various degrees of persistence, while their signal strength in explaining the dependent variable is often low. Variable selection in such a context is of great importance. In this paper, we explore the pitfalls and possibilities of LASSO methods in this predictive regression framework with mixed degrees of persistence. In the presence of stationary, unit root, and cointegrated predictors, we show that the adaptive LASSO maintains consistent variable selection and the oracle property. This is due to its penalty scheme, which accommodates the system of regressors. By contrast, the conventional LASSO lacks this desirable feature, as its penalty is imposed according to the marginal behavior of each individual regressor. We demonstrate this theoretical property via extensive Monte Carlo simulations and evaluate its empirical performance for short- and long-horizon stock return predictability.
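The difference between the two penalty schemes can be sketched directly. The adaptive LASSO rescales each coefficient's penalty by 1/|b_init|^gamma, using an initial estimate, so coefficients that look large to begin with are penalized less. The sketch uses plain coordinate descent with an OLS initial estimate and iid Gaussian regressors. Those choices are assumptions for illustration. It does not reproduce the paper's setting with unit-root or cointegrated predictors.

```python
import numpy as np

def lasso_cd(X, y, lam, weights=None, n_iter=200):
    """Coordinate-descent solver for
    (1/2n) ||y - X b||^2 + lam * sum_j w_j |b_j|."""
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                 # partial residual without feature j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam * w[j], 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive LASSO: penalty weights 1/|b_init|^gamma from an initial OLS fit
    (assumed initial estimator), then a weighted LASSO."""
    b_init = np.linalg.lstsq(X, y, rcond=None)[0]
    weights = 1.0 / (np.abs(b_init) ** gamma + 1e-12)
    return lasso_cd(X, y, lam, weights)
```

With `weights=None`, the same solver gives the conventional LASSO, where every coefficient faces the same threshold. In the adaptive version, the effective threshold `lam * w_j` is large for coefficients whose initial estimates are near zero and small for strong signals. This is the data-dependent penalty scheme that the abstract contrasts with the conventional one.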
Classifying human cognitive states from behavioral and physiological signals is a problem with important applications in robotics. It is challenging due to data variability among individual users and to sensor artefacts. In this work, we propose an end-to-end framework for real-time cognitive workload classification with mixture Hyper Long Short-Term Memory Networks, a novel variant of HyperNetworks. Evaluating the proposed approach on an eye-gaze pattern dataset collected from simulated driving scenarios with different cognitive demands, we show that the proposed framework outperforms previous baseline methods and achieves 83.9% precision and 87.8% recall on the test set. We also demonstrate the merit of the proposed architecture by showing improved performance over other LSTM-based methods.
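The abstract does not specify how the mixture variant works, so the sketch below shows only the basic HyperLSTM idea from HyperNetworks: an auxiliary network inspects the current input and state and rescales the main LSTM's weights at every time step, letting the effective weights adapt to the incoming signal. As simplifying assumptions, the hypernetwork here is a single linear layer (HyperLSTM uses a small auxiliary LSTM), only the input weights are rescaled (HyperLSTM also rescales recurrent weights and generates biases), and all names and sizes are hypothetical.

```python
import math
import random

GATES = ("i", "f", "o", "g")  # input, forget and output gates, candidate cell

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def random_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def init_lstm(D, H, rng):
    """Main LSTM weights per gate: W (H x D), U (H x H), bias b (H)."""
    return {g: {"W": random_matrix(H, D, rng), "U": random_matrix(H, H, rng), "b": [0.0] * H}
            for g in GATES}

def hyper_scales(x, h, hyper):
    """Hypothetical hypernetwork: one linear layer on [x; h] that emits
    a length-H scaling vector for each gate."""
    z = x + h  # list concatenation [x; h]
    return {g: [b + s for b, s in zip(hyper[g]["b"], matvec(hyper[g]["A"], z))]
            for g in GATES}

def lstm_step(x, h, c, params, hyper=None):
    """One LSTM step. With `hyper`, entry k of each gate's W x is multiplied by d_k,
    which equals scaling row k of W: the hypernetwork generates this step's input weights."""
    H = len(h)
    scales = hyper_scales(x, h, hyper) if hyper else {g: [1.0] * H for g in GATES}
    pre = {}
    for g in GATES:
        p = params[g]
        pre[g] = [d * wx + uh + b for d, wx, uh, b in
                  zip(scales[g], matvec(p["W"], x), matvec(p["U"], h), p["b"])]
    i = [sigmoid(v) for v in pre["i"]]
    f = [sigmoid(v) for v in pre["f"]]
    o = [sigmoid(v) for v in pre["o"]]
    cand = [math.tanh(v) for v in pre["g"]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run(seq, params, H, hyper=None):
    """Feed a sequence of input vectors and return the final hidden state."""
    h, c = [0.0] * H, [0.0] * H
    for x in seq:
        h, c = lstm_step(x, h, c, params, hyper)
    return h
```

A hypernetwork with zero weights and unit biases produces scales of exactly 1 and reduces to a plain LSTM, which is a convenient sanity check when experimenting with such cells.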
In the conventional cloud service model, computing resources are allocated to tenants on a pay-per-use basis. However, the performance of applications that communicate inside this network is unpredictable because network resources are not guaranteed. To mitigate this issue, the virtual cluster (VC) model has been developed, in which both network and compute units are guaranteed. Since then, many algorithms based on novel extensions of the VC model have been developed to solve the online virtual cluster embedding (VCE) problem with additional parameters. In the online VCE, the resource footprint is greedily minimized per request, which is tied to maximizing the provider's profit per request. However, this does not guarantee a global maximization of the profit over the whole sequence of requests. In fact, these algorithms do not even provide a worst-case guarantee of achieving a fraction of the maximum achievable profit for a given sequence of requests. Thus, these online algorithms do not provide a competitive ratio on the profit. In this thesis, two competitive online VCE algorithms and two heuristic algorithms are presented. The competitive online VCE algorithms have different competitive ratios on the objective function and on the capacity constraints, whereas the heuristic algorithms do not violate the capacity constraints. The worst-case competitive ratios are analyzed. The evaluation then shows the advantages and disadvantages of these algorithms in several scenarios with different request patterns and profit metrics on the fat-tree and MDCube datacenter topologies. The results show that which algorithm performs best depends on the scenario and on the metric considered.
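An online algorithm is c-competitive for profit if, on every request sequence, it earns at least a 1/c fraction of the profit of an offline algorithm that knows the whole sequence in advance. The toy model below, which is an assumption and not the VCE problem (a single resource, integer demands, no topology), shows why per-request greedy acceptance gives no such guarantee: an adversary sends one large, low-profit request first, and greedy acceptance then blocks every later, more profitable request.

```python
def greedy_online(requests, capacity):
    """Per-request greedy: accept every arriving request that still fits."""
    used = profit = 0
    for demand, gain in requests:
        if used + demand <= capacity:
            used += demand
            profit += gain
    return profit

def offline_optimum(requests, capacity):
    """Best profit with the whole sequence known in advance (0/1 knapsack DP, integer demands)."""
    best = [0] * (capacity + 1)
    for demand, gain in requests:
        for cap in range(capacity, demand - 1, -1):
            best[cap] = max(best[cap], best[cap - demand] + gain)
    return best[capacity]

# Adversarial sequence: one large, low-profit request arrives first,
# followed by many small, high-profit requests that no longer fit.
capacity = 10
requests = [(10, 1)] + [(1, 5)] * 10
alg = greedy_online(requests, capacity)
opt = offline_optimum(requests, capacity)
print(alg, opt, opt / alg)  # 1 50 50.0
```

Raising the profit of the small requests makes the ratio grow without bound, so greedy has no finite competitive ratio in this model; competitive online algorithms avoid this by, for example, reserving capacity for future requests.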
Accurately and efficiently crowdsourcing complex, open-ended tasks can be difficult, as crowd participants tend to favor short, repetitive ‘microtasks’. We study the crowdsourcing of large networks where the crowd provides the network topology via microtasks. Crowds can explore many types of social and information networks, but we focus on the network of causal attributions, an important network that signifies cause-and-effect relationships. We conduct experiments on Amazon Mechanical Turk (AMT) testing how workers propose and validate individual causal relationships, and we introduce a method for independent crowd workers to explore large networks. The core of the method, Iterative Pathway Refinement, is a theoretically principled mechanism for efficient exploration via microtasks. We evaluate the method using synthetic networks and deploy it on AMT to extract a large-scale causal attribution network, then investigate the structure of this network as well as the activity patterns and efficiency of the workers who constructed it. Worker interactions reveal important characteristics of causal perception, and the network data they generate can improve our understanding of causality and causal inference.
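The abstract does not describe the inner workings of Iterative Pathway Refinement, so the sketch below is not that method. It is a generic, hypothetical illustration of the two microtask types the abstract mentions, proposing a causal relationship and validating one, combined with breadth-first exploration of a synthetic ground-truth network. The simulated workers and their noise levels (`propose_noise`, `vote_error`) are assumptions.

```python
import random
from collections import deque

def explore(true_causes, root, nodes, rng, proposals_per_node=3,
            votes_per_edge=3, propose_noise=0.2, vote_error=0.1, max_nodes=50):
    """Breadth-first crowd exploration of a causal network with simulated workers.

    true_causes: dict mapping each node to the list of its true causes (synthetic ground truth).
    Returns the set of accepted (cause, effect) edges.
    """
    accepted = set()
    frontier = deque([root])
    visited = {root}
    expanded = 0
    while frontier and expanded < max_nodes:
        effect = frontier.popleft()
        expanded += 1
        causes = true_causes.get(effect, [])
        # Proposal microtasks: "Name something that causes <effect>."
        candidates = set()
        for _ in range(proposals_per_node):
            if causes and rng.random() >= propose_noise:
                candidates.add(rng.choice(causes))
            else:
                candidates.add(rng.choice(nodes))  # noisy or spurious answer
        # Validation microtasks: "Does <cause> cause <effect>?", decided by majority vote.
        for cause in candidates:
            truth = cause in causes
            yes = sum((truth if rng.random() >= vote_error else not truth)
                      for _ in range(votes_per_edge))
            if 2 * yes > votes_per_edge:
                accepted.add((cause, effect))
                if cause not in visited:  # newly found causes become nodes to explore
                    visited.add(cause)
                    frontier.append(cause)
    return accepted

true_causes = {"flood": ["heavy rain", "dam failure"], "heavy rain": ["storm"],
               "dam failure": ["poor maintenance"]}
nodes = ["flood", "heavy rain", "dam failure", "storm", "poor maintenance", "traffic", "music"]
edges = explore(true_causes, "flood", nodes, random.Random(0))
```

Separating proposal from validation lets cheap, possibly noisy proposals be filtered by independent votes, which is why perfect validators in this simulation never accept a spurious edge even when proposals are noisy.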