Yearly Archives: 2015

PENCIL: a Platform-Neutral Compute Intermediate Language for Accelerator Programming

  • Riyadh Baghdadi, Ulysse Beaugnon, Albert Cohen, Tobias Grosser, Michael Kruse, Chandan Reddy, Sven Verdoolaege, Mohammed Javed Absar, Sven Van Haastregt, Alexey Kravets, Anton Lokhmotov, Robert David, Elnar Hajiyev, Adam Betts, Alastair Donaldson, and Jeroen Ketema. PENCIL: a Platform-Neutral Compute Intermediate Language for Accelerator Programming. In Proceedings of The 24th International Conference on Parallel Architectures and Compilation Techniques (PACT 2015), pages 138-149, San Francisco, California, USA, 2015. doi:10.1109/PACT.2015.17

    Programming accelerators such as GPUs with low-level APIs and languages such as OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic parallelization and domain specific languages (DSLs) have been proposed to hide complexity and regain performance portability. We present PENCIL, a rigorously-defined subset of GNU C99—enriched with additional language constructs—that enables compilers to exploit parallelism and produce highly optimized code when targeting accelerators. PENCIL aims to serve both as a portable implementation language for libraries, and as a target language for DSL compilers. We implemented a PENCIL-to-OpenCL backend using a state-of-the-art polyhedral compiler. The polyhedral compiler, extended to handle data-dependent control flow and non-affine array accesses, generates optimized OpenCL code. To demonstrate the potential and performance portability of PENCIL and the PENCIL-to-OpenCL compiler, we consider a number of image processing kernels, a set of benchmarks from the Rodinia and SHOC suites, and DSL embedding scenarios for linear algebra (BLAS) and signal processing radar applications (SpearDE), and present experimental results for four GPU platforms: AMD Radeon HD 5670 and R9 285, NVIDIA GTX 470, and ARM Mali-T604.

    @InProceedings{2015-10-BAGHDADI,
    author = {Riyadh Baghdadi and Ulysse Beaugnon and Albert Cohen and Tobias Grosser and Michael Kruse and Chandan Reddy and Sven Verdoolaege and Mohammed Javed Absar and Sven Van Haastregt and Alexey Kravets and Anton Lokhmotov and Robert David and Elnar Hajiyev and Adam Betts and Alastair Donaldson and Jeroen Ketema},
    title = {{PENCIL: a Platform-Neutral Compute Intermediate Language for Accelerator Programming}},
    booktitle = {{Proceedings of The 24th International Conference on Parallel Architectures and Compilation Techniques (PACT 2015)}},
    date = {2015-10-18/2015-10-21},
    address = {San Francisco, California, USA},
    url = {http://www.ketema.eu/publ/pencil.pdf},
    abstract = {Programming accelerators such as GPUs with low-level APIs and languages such as OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic parallelization and domain specific languages (DSLs) have been proposed to hide complexity and regain performance portability. We present PENCIL, a rigorously-defined subset of GNU C99—enriched with additional language constructs—that enables compilers to exploit parallelism and produce highly optimized code when targeting accelerators. PENCIL aims to serve both as a portable implementation language for libraries, and as a target language for DSL compilers.
    We implemented a PENCIL-to-OpenCL backend using a state-of-the-art polyhedral compiler. The polyhedral compiler, extended to handle data-dependent control flow and non-affine array accesses, generates optimized OpenCL code. To demonstrate the potential and performance portability of PENCIL and the PENCIL-to-OpenCL compiler, we consider a number of image processing kernels, a set of benchmarks from the Rodinia and SHOC suites, and DSL embedding scenarios for linear algebra (BLAS) and signal processing radar applications (SpearDE), and present experimental results for four GPU platforms: AMD Radeon HD 5670 and R9 285, NVIDIA GTX 470, and ARM Mali-T604.},
    doi = {10.1109/PACT.2015.17},
    pages = {138-149},
    year = {2015}
    }

Using Transactional Memory to Avoid Blocking in OpenMP Synchronization Directives

  • Lars Bonnichsen and Artur Podobas. Using Transactional Memory to Avoid Blocking in OpenMP Synchronization Directives. In Christian Terboven, Bronis R. de Supinski, Pablo Reble, Barbara M. Chapman, and Matthias S. Müller, editors, OpenMP: Heterogenous Execution and Data Movements, Proceedings of the 11th International Workshop on OpenMP (IWOMP), pages 149-161, Aachen, Germany, 2015. Springer. doi:10.1007/978-3-319-24595-9_11

    OpenMP applications with abundant parallelism are often characterized by their high-performance. Unfortunately, OpenMP applications with a lot of synchronization or serialization-points perform poorly because of blocking, i.e. the threads have to wait for each other. In this paper, we present methods based on hardware transactional memory (HTM) for executing OpenMP barrier, critical, and taskwait directives without blocking. Although HTM is still relatively new in the Intel and IBM architectures, we experimentally show a 73 % performance improvement over traditional locking approaches, and 23 % better than other HTM approaches on critical sections. Speculation over barriers can decrease execution time by up-to 41 %. We expect that future systems with HTM support and more cores will have a greater benefit from our approach as they are more likely to block.

    @InProceedings{2015-10-BONNICHSEN,
    author = {Lars Bonnichsen and Artur Podobas},
    editor = {Christian Terboven and Bronis R. de Supinski and Pablo Reble and Barbara M. Chapman and Matthias S. M{\"u}ller},
    title = {{Using Transactional Memory to Avoid Blocking in OpenMP Synchronization Directives}},
    booktitle = {{OpenMP: Heterogenous Execution and Data Movements, Proceedings of the 11th International Workshop on OpenMP (IWOMP)}},
    date = {2015-10-01/2015-10-02},
    year = {2015},
    publisher = {Springer},
    address = {Aachen, Germany},
    pages = {149-161},
    doi = {10.1007/978-3-319-24595-9_11},
    abstract = {OpenMP applications with abundant parallelism are often characterized by their high-performance. Unfortunately, OpenMP applications with a lot of synchronization or serialization-points perform poorly because of blocking, i.e. the threads have to wait for each other. In this paper, we present methods based on hardware transactional memory (HTM) for executing OpenMP barrier, critical, and taskwait directives without blocking. Although HTM is still relatively new in the Intel and IBM architectures, we experimentally show a 73 % performance improvement over traditional locking approaches, and 23 % better than other HTM approaches on critical sections. Speculation over barriers can decrease execution time by up-to 41 %. We expect that future systems with HTM support and more cores will have a greater benefit from our approach as they are more likely to block.}
    }

Distributed Vision-Based Flying Cameras to Film a Moving Target

  • Fabio Poiesi and Andrea Cavallaro. Distributed Vision-Based Flying Cameras to Film a Moving Target. In Proceedings of International Conference on Intelligent Robots and Systems (IROS 2015), pages 2453-2459, Hamburg, Germany, 2015. doi:10.1109/IROS.2015.7353710

    Formations of camera-equipped quadrotors (flying cameras) have the actuation agility to track moving targets from multiple viewing angles. In this paper we propose an infrastructure-free distributed control method for multiple flying cameras tracking a moving object. The proposed vision-based servoing can deal with noisy and missing target observations, accounts for quadrotor oscillations and does not require an external positioning system. The flight direction of each camera is inferred via geometric derivation, and the formation is maintained by employing a distributed algorithm that uses the target position information on the camera plane and the position of neighboring flying cameras. Simulations show that the proposed solution enables the tracking of a moving target by the cameras flying in formation despite noisy target detections and when the target is outside some of the fields of view.

    @InProceedings{2015-09-POIESI,
    author = {Fabio Poiesi and Andrea Cavallaro},
    title = {{Distributed Vision-Based Flying Cameras to Film a Moving Target}},
    booktitle = {{Proceedings of International Conference on Intelligent Robots and Systems (IROS 2015)}},
    address = {Hamburg, Germany},
    date = {2015-09-28/2015-10-02},
    url = {http://www.fabio-poiesi.com/files/papers/conferences/2015_IROS_DistributedFlyingCamerasFilmTarget_Poiesi_Cavallaro.pdf},
    abstract = {Formations of camera-equipped quadrotors (flying cameras) have the actuation agility to track moving targets from multiple viewing angles. In this paper we propose an infrastructure-free distributed control method for multiple flying cameras tracking a moving object. The proposed vision-based servoing can deal with noisy and missing target observations, accounts for quadrotor oscillations and does not require an external positioning system. The flight direction of each camera is inferred via geometric derivation, and the formation is maintained by employing a distributed algorithm that uses the target position information on the camera plane and the position of neighboring flying cameras. Simulations show that the proposed solution enables the tracking of a moving target by the cameras flying in formation despite noisy target detections and when the target is outside some of the fields of view.},
    year = {2015},
    doi = {10.1109/IROS.2015.7353710},
    pages = {2453-2459}
    }

Cross-layer Theoretical Analysis of NC-aided Cooperative ARQ Protocols in Correlated Shadowed Environments

  • Angelos Antonopoulos, Aris S. Lalos, Marco Di Renzo, and Christos Verikoukis. Cross-layer Theoretical Analysis of NC-aided Cooperative ARQ Protocols in Correlated Shadowed Environments. IEEE Transactions on Vehicular Technology, 64(9):4074-4087, 2015. doi:10.1109/TVT.2014.2361670
    @Article{2015-09-ANTONOPOULOS,
    author = {Angelos Antonopoulos and Aris S. Lalos and Marco Di Renzo and Christos Verikoukis},
    title = {{Cross-layer Theoretical Analysis of NC-aided Cooperative ARQ Protocols in Correlated Shadowed Environments}},
    journal = {{IEEE Transactions on Vehicular Technology}},
    date = {2015-09},
    volume = {64},
    number = {9},
    pages = {4074-4087},
    year = {2015},
    doi = {10.1109/TVT.2014.2361670}
    }

Pawlak’s Flow Graph Extensions for Video Surveillance Systems

  • Karol Lisowski and Andrzej Czyżewski. Pawlak’s Flow Graph Extensions for Video Surveillance Systems. In Proceedings of the 10th International Symposium Advances in Artificial Intelligence and Applications (AAIA 2015), pages 81-87, Lodz, Poland, 2015. doi:10.15439/2015F384

    The idea of the Pawlak’s flow graphs is applicable to many problems in various fields related to decision algorithms or data mining. The flow graphs can be used also in the video surveillance systems. Especially in distributed multi-camera systems which are problematic to be handled by human operators because of their limited perception. In such systems automated video analysis needs to be implemented. Important part of this analysis is tracking object within a single camera and between cameras’ fields of vision. One of element needed to re-identify the single real object besides object’s visual features and spatiotemporal dependencies between cameras is a behaviour model. The flow graph after some modifications, is a suitable data structure, which concept is based on the rough set theory, to contained as a behaviour model in it. Additionally, the flow graph can be used to predict the future movement of given object. In this paper a survey of authors research works related to employing flowgraphs in video surveillance systems is contained. The flow graph creation based on the paths of objects inside supervised area will presented. Moreover, a method of building a probability tree on the basis of the flow graph and a method for adapting the flowgraph to the changing topology of the camera network are also discussed.

    @InProceedings{2015-09-LISOWSKI,
    author = {Karol Lisowski and Andrzej Czy\.zewski},
    title = {{Pawlak's Flow Graph Extensions for Video Surveillance Systems}},
    booktitle = {{Proceedings of the 10th International Symposium Advances in Artificial Intelligence and Applications (AAIA 2015)}},
    address = {Lodz, Poland},
    date = {2015-09-13/2015-09-16},
    pages = {81-87},
    doi = {10.15439/2015F384},
    abstract = {The idea of the Pawlak’s flow graphs is applicable to many problems in various fields related to decision algorithms or data mining. The flow graphs can be used also in the video surveillance systems. Especially in distributed multi-camera systems which are problematic to be handled by human operators because of their limited perception. In such systems automated video analysis needs to be implemented. Important part of this analysis is tracking object within a single camera and between cameras’ fields of vision. One of element needed to re-identify the single real object besides object’s visual features and spatiotemporal dependencies between cameras is a behaviour model. The flow graph after some modifications, is a suitable data structure, which concept is based on the rough set theory, to contained as a behaviour model in it. Additionally, the flow graph can be used to predict the future movement of given object. In this paper a survey of authors research works related to employing flowgraphs in video surveillance systems is contained. The flow graph creation based on the paths of objects inside supervised area will presented. Moreover, a method of building a probability tree on the basis of the flow graph and a method for adapting the flowgraph to the changing topology of the camera network are also discussed.},
    year = {2015}
    }

Zdalny zintegrowany moduł nadzoru radiowo-wizyjnego (An integrated module for remote radio and visual monitoring)

  • Janusz Cichowski, Karol Lisowski, Piotr Szczuko, and Andrzej Czyżewski. Zdalny zintegrowany moduł nadzoru radiowo-wizyjnego (An integrated module for remote radio and visual monitoring). In Krajowe Sympozjum Telekomunikacji i Teleinformatyki (KSTiT 2015), Kraków, Poland, 2015. doi:10.15199/59.2015.8-9.24
    @InProceedings{2015-09-CICHOWSKI,
    author = {Janusz Cichowski and Karol Lisowski and Piotr Szczuko and Andrzej Czy\.zewski},
    title = {{Zdalny zintegrowany modu\l{} nadzoru radiowo-wizyjnego (An integrated module for remote radio and visual monitoring)}},
    booktitle = {{Krajowe Sympozjum Telekomunikacji i Teleinformatyki (KSTiT 2015)}},
    address = {Krak\'{o}w, Poland},
    date = {2015-09-13/2015-09-16},
    doi = {10.15199/59.2015.8-9.24},
    year = {2015}
    }

Efficient Algorithm for Blinking LED Detection Dedicated to Embedded Systems Equipped with High Performance Cameras

  • Michal Tarkowski, Przemysław Woźnica, and Łukasz Kulas. Efficient Algorithm for Blinking LED Detection Dedicated to Embedded Systems Equipped with High Performance Cameras. In Proceedings of International Conference on Computer as a Tool (EUROCON 2015), pages 1-4, Salamanca, Spain, 2015. doi:10.1109/EUROCON.2015.7313723
    @InProceedings{2015-09-TARKOWSKI,
    author = {Michal Tarkowski and Przemys\l{}aw Wo\'{z}nica and {\L}ukasz Kulas},
    booktitle = {{Proceedings of International Conference on Computer as a Tool (EUROCON 2015)}},
    title = {{Efficient Algorithm for Blinking LED Detection Dedicated to Embedded Systems Equipped with High Performance Cameras}},
    date = {2015-09-08/2015-09-11},
    address = {Salamanca, Spain},
    doi = {10.1109/EUROCON.2015.7313723},
    pages = {1-4},
    year = {2015}
    }

Energy Management via PI Control for Data Parallel Applications with Throughput Constraints

  • Anca Molnos, Warody Lombardi, Suzanne Lesecq, Julien Mottin, Diego Puschini, and Arnaud Tonda. Energy Management via PI Control for Data Parallel Applications with Throughput Constraints. In Proceedings of the 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS 2015), pages 63-70, Salvador da Bahia, Brazil, 2015. doi:10.1109/PATMOS.2015.7347588
    @InProceedings{2015-09-MOLNOS,
    author = {Anca Molnos and Warody Lombardi and Suzanne Lesecq and Julien Mottin and Diego Puschini and Arnaud Tonda},
    title = {{Energy Management via PI Control for Data Parallel Applications with Throughput Constraints}},
    booktitle = {{Proceedings of the 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS 2015)}},
    address = {Salvador da Bahia, Brazil},
    date = {2015-09-01/2015-09-04},
    doi = {10.1109/PATMOS.2015.7347588},
    pages = {63-70},
    year = {2015}
    }

Polyhedral AST Generation is More than Scanning Polyhedra

  • Sven Verdoolaege, Tobias Grosser, and Albert Cohen. Polyhedral AST Generation is More than Scanning Polyhedra. ACM Transactions on Programming Languages and Systems, 37(4):12:1-12:50, 2015. doi:10.1145/2743016

    Abstract mathematical representations such as integer polyhedra have been shown to be useful to precisely analyze computational kernels and to express complex loop transformations. Such transformations rely on abstract syntax tree (AST) generators to convert the mathematical representation back to an imperative program. Such generic AST generators avoid the need to resort to transformation-specific code generators, which may be very costly or technically difficult to develop as transformations become more complex. Existing AST generators have proven their effectiveness, but they hit limitations in more complex scenarios. Specifically, (1) they do not support or may fail to generate control flow for complex transformations using piecewise schedules or mappings involving modulo arithmetic; (2) they offer limited support for the specialization of the generated code exposing compact, straightline, vectorizable kernels with high arithmetic intensity necessary to exploit the peak performance of modern hardware; (3) they offer no support for memory layout transformations; and (4) they provide insufficient control over the AST generation strategy, preventing their application to complex domain-specific optimizations. We present a new AST generation approach that extends classical polyhedral scanning to the full generality of Presburger arithmetic, including existentially quantified variables and piecewise schedules, and introduce new optimizations for the detection of components and shifted strides. Not limiting ourselves to control flow generation, we expose functionality to generate AST expressions from arbitrary piecewise quasi-affine expressions, which enables the use of our AST generator for data-layout transformations. We complement this with support for specialization by polyhedral unrolling, user-directed versioning, and specialization of AST expressions according to the location at which they are generated, and we complete this work with fine-grained user control over the AST generation strategies used. Using this generalized idea of AST generation, we present how to implement complex domain-specific transformations without the need to write specialized code generators, but instead relying on a generic AST generator parametrized to a specific problem domain.

    @Article{2015-08-VERDOOLAEGE,
    author = {Sven Verdoolaege and Tobias Grosser and Albert Cohen},
    title = {{Polyhedral {AST} Generation is More than Scanning Polyhedra}},
    journal = {ACM Transactions on Programming Languages and Systems},
    date = {2015-08},
    volume = {37},
    number = {4},
    year = {2015},
    pages = {12:1-12:50},
    doi = {10.1145/2743016},
    publisher = {ACM},
    address = {New York, NY, USA},
    abstract = {Abstract mathematical representations such as integer polyhedra have been shown to be useful to precisely analyze computational kernels and to express complex loop transformations. Such transformations rely on abstract syntax tree (AST) generators to convert the mathematical representation back to an imperative program. Such generic AST generators avoid the need to resort to transformation-specific code generators, which may be very costly or technically difficult to develop as transformations become more complex. Existing AST generators have proven their effectiveness, but they hit limitations in more complex scenarios. Specifically, (1) they do not support or may fail to generate control flow for complex transformations using piecewise schedules or mappings involving modulo arithmetic; (2) they offer limited support for the specialization of the generated code exposing compact, straightline, vectorizable kernels with high arithmetic intensity necessary to exploit the peak performance of modern hardware; (3) they offer no support for memory layout transformations; and (4) they provide insufficient control over the AST generation strategy, preventing their application to complex domain-specific optimizations.
    We present a new AST generation approach that extends classical polyhedral scanning to the full generality of Presburger arithmetic, including existentially quantified variables and piecewise schedules, and introduce new optimizations for the detection of components and shifted strides. Not limiting ourselves to control flow generation, we expose functionality to generate AST expressions from arbitrary piecewise quasi-affine expressions, which enables the use of our AST generator for data-layout transformations. We complement this with support for specialization by polyhedral unrolling, user-directed versioning, and specialization of AST expressions according to the location at which they are generated, and we complete this work with fine-grained user control over the AST generation strategies used. Using this generalized idea of AST generation, we present how to implement complex domain-specific transformations without the need to write specialized code generators, but instead relying on a generic AST generator parametrized to a specific problem domain.}
    }

Acoustic Direction Finding In Highly Reverberant Environment With Single Acoustic Vector Sensor

  • Metin Aktas, Toygar Akgün, and Hüseyin Özkan. Acoustic Direction Finding In Highly Reverberant Environment With Single Acoustic Vector Sensor. In Proceedings of the 23rd European Signal Processing Conference (EUSIPCO 2015), pages 2301-2305, Nice, France, 2015. doi:10.1109/EUSIPCO.2015.7362795

    We propose a novel wideband acoustic direction finding method for highly reverberant environments using measurements from a single Acoustic Vector Sensor (AVS). Since an AVS is small in size and can be effectively used within the full acoustic frequency bands, the proposed solution is suitable for wideband acoustic source localization. In particular, we introduce a novel approach to extract the signal portions that are not distorted with multipath signals and noise. We do not make any stochastic and sparseness assumptions regarding the underlying signal source. Hence, our approach can be applied to a wide range of wideband acoustic signals. We present experiments with acoustic signals that are specially exposed to long reverberations, where the Signal-to-Noise Ratio is as low as 0 dB. In these experiments, the proposed method reliably estimates the source direction with less than 5 degrees of error even under the introduced significantly high reverberation conditions.

    @InProceedings{2015-08-AKTAS,
    author = {Metin Aktas and Toygar Akg\"{u}n and H\"{u}seyin \"{O}zkan},
    title = {{Acoustic Direction Finding In Highly Reverberant Environment With Single Acoustic Vector Sensor}},
    booktitle = {{Proceedings of the 23rd European Signal Processing Conference (EUSIPCO 2015)}},
    date = {2015-08-31/2015-09-04},
    pages = {2301-2305},
    doi = {10.1109/EUSIPCO.2015.7362795},
    address = {Nice, France},
    abstract = {We propose a novel wideband acoustic direction finding method for highly reverberant environments using measurements from a single Acoustic Vector Sensor (AVS). Since an AVS is small in size and can be effectively used within the full acoustic frequency bands, the proposed solution is suitable for wideband acoustic source localization. In particular, we introduce a novel approach to extract the signal portions that are not distorted with multipath signals and noise. We do not make any stochastic and sparseness assumptions regarding the underlying signal source. Hence, our approach can be applied to a wide range of wideband acoustic signals. We present experiments with acoustic signals that are specially exposed to long reverberations, where the Signal-to-Noise Ratio is as low as 0 dB. In these experiments, the proposed method reliably estimates the source direction with less than 5 degrees of error even under the introduced significantly high reverberation conditions.},
    year = {2015}
    }
