Multi-Processor System-on-Chip 1. Liliana Andrade

Embedded Systems, CASES, San Jose, USA, 93–102.

      Dupont de Dinechin, B., Ayrignac, R., Beaucamps, P., Couvert, P., Ganne, B., de Massas, P. G., Jacquet, F., Jones, S., Chaisemartin, N. M., Riss, F., and Strudel, T. (2013). A clustered manycore processor architecture for embedded and accelerated applications. IEEE High Performance Extreme Computing Conference, Waltham, USA, 1–6.

      Dupont de Dinechin, B., van Amstel, D., Poulhiès, M., and Lager, G. (2014). Time-critical computing on a single-chip massively parallel processor. Design, Automation and Test in Europe Conference and Exhibition, Dresden, Germany, 1–6.

      Dupont de Dinechin, M., Schuh, M., Moy, M., and Maïza, C. (2020). Scaling up the memory interference analysis for hard real-time many-core systems. Design, Automation and Test in Europe Conference and Exhibition, Grenoble, France, 1–4.

      Firesmith, D. (2017). Multicore Processing [Online]. Available: https://insights.sei.cmu.edu/sei_blog/2017/08/multicore-processing.html.

      Fisher, J. A., Faraboschi, P., and Young, C. (2005). Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools. Morgan Kaufmann Publishers Inc., San Francisco, USA.

      Forsberg, B., Palossi, D., Marongiu, A., and Benini, L. (2017). GPU-accelerated real-time path planning and the predictable execution model. Procedia Computer Science – International Conference on Computational Science, Zurich, Switzerland, 108, 2428–2432.

      Graillat, A., Moy, M., Raymond, P., and Dupont de Dinechin, B. (2018). Parallel code generation of synchronous programs for a many-core architecture. Design, Automation and Test in Europe Conference and Exhibition, Dresden, Germany, 1139–1142.

      Graillat, A., Maiza, C., Moy, M., Raymond, P., and Dupont de Dinechin, B. (2019). Response time analysis of dataflow applications on a many-core processor with shared-memory and network-on-chip. Proceedings of the 27th International Conference on Real-Time Networks and Systems. Toulouse, France, 61–69.

      Gschwind, M. (2016). Workload acceleration with the IBM POWER vector–scalar architecture. IBM Journal of Research and Development, 60(2–3).

      Gustafson, J.L. (2017). Beyond floating point: Next-generation computer arithmetic [Online]. Available: https://web.stanford.edu/class/ee380/Abstracts/170201-slides.pdf.

      Halbwachs, N., Caspi, P., Raymond, P., and Pilaud, D. (1991). The synchronous data flow programming language LUSTRE. Proceedings of the IEEE, 79(9), 1305–1320.

      Hascoët, J., Dupont de Dinechin, B., de Massas, P.G., and Ho, M.Q. (2017). Asynchronous one-sided communications and synchronizations for a clustered manycore processor. Proceedings of the 15th IEEE/ACM Symposium on Embedded Systems for Real-Time Multimedia, Seoul, Republic of Korea, 51–60.

      Hascoët, J., Dupont de Dinechin, B., Desnos, K., and Nezan, J. (2018). A distributed framework for low-latency OpenVX over the RDMA NoC of a clustered manycore. 2018 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, USA, 1–7.

      Huang, M., Men, L., and Lai, C. (2013). Accelerating mean shift segmentation algorithm on hybrid CPU/GPU platforms. In Modern Accelerator Technologies for Geographic Information Science, Shi, X., Kindratenko, V. and Yang, C. (eds). Springer, New York.

      Intel (2018). BFLOAT16 – Hardware Numerics Definition Revision 1.0. November 2018.

      Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A.G., Adam, H., and Kalenichenko, D. (2018). Quantization and training of neural networks for efficient integer-arithmetic-only inference. IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2704–2713.

      Jia, Z., Maggioni, M., Staiger, B., and Scarpazza, D.P. (2018). Dissecting the NVIDIA Volta GPU architecture via microbenchmarking. ArXiv, abs/1804.06826.

      Johnson, J. (2018). Rethinking floating point for deep learning. ArXiv, abs/1811.01721.

      Kanduri, A., Rahmani, A.M., Liljeberg, P., Hemani, A., Jantsch, A., and Tenhunen, H. (2017). A Perspective on Dark Silicon. Springer International Publishing.

      Kästner, D., Pister, M., Gebhard, G., Schlickling, M., and Ferdinand, C. (2013). Confidence in timing. SAFECOMP 2013 - Workshop SASSUR (Next Generation of System Assurance Approaches for Safety-Critical Systems) of the 32nd International Conference on Computer Safety, Reliability and Security, Toulouse, France.

      Krishnamoorthi, R. (2018). Quantizing deep convolutional networks for efficient inference: A whitepaper. ArXiv, abs/1806.08342.

      Lee, E.A., Reineke, J., and Zimmer, M. (2017). Abstract PRET Machines. IEEE Real-Time Systems Symposium, RTSS, Paris, France, December 5–8, 1–11.

      NVIDIA (2020). Programming Tensor Cores in CUDA 9 [Online]. Available: https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/.

      Perret, Q., Maurère, P., Noulard, E., Pagetti, C., Sainrat, P., and Triquet, B. (2016). Temporal isolation of hard real-time applications on many-core processors. IEEE Real-Time and Embedded Technology and Applications Symposium. Vienna, Austria, April 11–14, 37–47.

      Resmerita, D., Farias, R.C., Dupont de Dinechin, B., and Fillatre, L. (2020). Benchmarking alternative floating-point formats for deep learning inference. Conférence francophone d’informatique en Parallélisme, Architecture et Système.

      Rihani, H., Moy, M., Maiza, C., Davis, R.I., and Altmeyer, S. (2016). Response time analysis of synchronous data flow programs on a many-core processor. Proceedings of the 24th International Conference on Real-Time Networks and Systems. Brest, France, 67–76.

      Rodriguez, A., Ziv, B., Fomenko, E., Meiri, E., and Shen, H. (2018). Lower numerical precision deep learning inference and training. Intel AI Developer Program, 1–19 [Online]. Available: https://software.intel.com/content/www/us/en/develop/articles/lower-numerical-precision-deep-learning-inference-and-training.html.

      Rovder, S., Cano, J., and O’Boyle, M. (2019). Optimising convolutional neural networks inference on low-powered GPUs. 12th International Workshop on Programmability and Architectures for Heterogeneous Multicores. Valencia, Spain.

      Saidi, S., Ernst, R., Uhrig, S., Theiling, H., and Dupont de Dinechin, B. (2015). The shift to multicores in real-time and safety-critical systems. International Conference on Hardware/Software Codesign and System Synthesis. Amsterdam, The Netherlands, October 4–9, 220–229.

      Wilhelm, R. and Reineke, J. (2012). Embedded systems: Many cores - Many problems. 7th IEEE International Symposium on Industrial Embedded Systems. Karlsruhe, Germany, June 20–22, 176–180.
