Multi-Processor System-on-Chip 1. Liliana Andrade

Чтение книги онлайн.

Читать онлайн книгу Multi-Processor System-on-Chip 1 - Liliana Andrade страница 22

Multi-Processor System-on-Chip 1 - Liliana Andrade

Скачать книгу

computation graph

      2.4.3. High-integrity computing

Schematic illustration of ROSACE harmonic multi-periodic case study.

      Figure 2.18. ROSACE harmonic multi-periodic case study (Graillat et al. 2018)

       – providing workers, each able to execute a set of tasks sequentially;

       – scheduling and mapping the set of tasks on the workers;

       – implementing the communication channels and their SEND/RECV methods;

       – compiling C-code with the CompCert formally verified compiler.

      In the MPPA workflow, the workers are the PEs associated with a memory bank.

      Timing verification follows the principles of the multi-core response time analysis (MRTA) framework (Davis et al. 2018). Starting from the task graph, its mapping to PEs, and given the worst-case execution time (WCET) of each task in isolation, the multi-core inference analysis (MIA) tool (Rihani et al. 2016; Dupont de Dinechin et al. 2020) refines the execution intervals of each task while updating its WCET for interference on the shared resources. The MIA tool relies on the property that the PEs, the memory hierarchy and the interconnects are timing-compositional. The refined release dates are used to activate a fast hardware release mechanism for each task. A task then executes when its input channels are data-ready (Figure 2.19).

Schematic illustration of MCG code generation of the MPPA processor.

      Figure 2.19. MCG code generation of the MPPA processor

      We introduced the third-generation MPPA processor, which implements a many-core architecture that targets intelligent systems, defined as cyber-physical systems enhanced with high-performance machine learning capabilities and strong cyber-security support. As with the GPGPU architecture, the MPPA3 architecture is composed of a number of multi-core compute units that share the processor external memory and I/O through on-chip global interconnects. However, the MPPA architecture is able to host standard software, offers excellent time predictability and provides strong partitioning capabilities. This enables us to consolidate, on a single or dual processor platform, the high-performance machine learning and computer vision functions implied by vehicle perception, the high-integrity functions developed through model-based design, and the cyber-security functions required by secured communications.

      Bodin, B., Munier-Kordon, A., and Dupont de Dinechin, B. (2013). Periodic schedules for cyclo-static dataflow. The 11th IEEE Symposium on Embedded Systems for Real-time Multimedia, Montreal, QC, Canada, 105–114.

      Bodin, B., Munier-Kordon, A., and Dupont de Dinechin, B. (2016). Optimal and fast throughput evaluation of CSDF. Proceedings of the 53rd Annual Design Automation Conference. Austin, USA, 160:1–160:6.

      Brunie, N. (2017). Modified fused multiply and add for exact low precision product accumulation. 24th IEEE Symposium on Computer Arithmetic. London, United Kingdom, 106–113.

      Carmichael, Z., Langroudi, H.F., Khazanov, C., Lillie, J., Gustafson, J.L., and Kudithipudi, D. (2019). Performance-efficiency trade-off of low-precision numerical formats in deep neural networks. Proceedings of the Conference for Next Generation Arithmetic. New York, USA, 3:1–3:9.

      CAST (2016). Multi-core Processors, Technical Report CAST-32A, FAA [Online]. Available: https://www.faa.gov/aircraft/air_cert/design_approvals/air_software/cast/cast_papers/.

      Cavicchioli, R., Capodieci, N., Solieri, M., and Bertogna, M. (2019). Novel methodologies for predictable CPU-To-GPU command offloading. Proceedings of the 31st Euromicro Conference on Real-Time Systems. Stuttgart, Germany, vol. 133 of LIPIcs, 22:1–22:22.

      Chung, E., Fowers, J., Ovtcharov, K., Papamichael, M., Caulfield, A., Massengill, T., Liu, M., Ghandi, M., Lo, D., Reinhardt, S., Alkalay, S., Angepat, H., Chiou, D., Forin, A., Burger, D., Woods, L., Weisz, G., Haselman, M., and Zhang, D. (2018). Serving DNNs in real time at datacenter scale with project brainwave. IEEE Micro, 38, 8–20.

      CNX (2019). Autoware.AI-Software-Architecture [Online]. Available: https://www.cnx-software.com/wp-content/uploads/2019/02/Autoware.AI-Software-Architecture.png.

      Davis, R.I., Altmeyer, S., Indrusiak, L.S., Maiza, C., Nélis, V., and Reineke, J. (2018). An extensible framework for multicore response time analysis. Real-Time Systems, 54(3), 607–661.

      de Dinechin, F., Forget, L., Muller, J.-M., and Uguen, Y. (2019). Posits: The good, the bad and the ugly. Proceedings of the Conference for Next Generation Arithmetic. Association for Computing Machinery, New York, USA.

      Dupont de Dinechin, B. (2004). From machine scheduling to VLIW instruction scheduling. ST Journal of Research, 1(2).

      Dupont de Dinechin, B. (2014). Using the SSA-Form in a code generator. 23rd International Conference on Compiler Construction, vol. 8409 of Lecture Notes in Computer Science, Springer, 1–17.

      Dupont de Dinechin, B., de Ferrière, F., Guillon, C., and Stoutchinin, A. (2000). Code generator optimizations for the ST120 DSP-MCU core. Proceedings of the 2000 International

Скачать книгу