  1. Roberto Peñaranda, Maria E Gomez and Pedro Lopez. A Fault-Tolerant Routing Strategy for KNS Topologies Based on Intermediate Nodes. Concurrency and Computation Practice and Experience 29(SI HiPINEB 2016), 2017.

    	author = "Pe{\~n}aranda, Roberto and Gomez, Maria E. and Lopez, Pedro",
    	abstract = "Exascale computing systems are being built with thousands of nodes. The high number of components of these systems significantly increases the probability of failure. A key component for them is the interconnection network. If failures occur in the interconnection network, they may isolate a large fraction of the machine. For this reason, an efficient fault-tolerant mechanism is needed to keep the system interconnected, even in the presence of faults. A recently proposed topology for these large systems is the hybrid k-ary n-direct s-indirect (KNS) family that provides optimal performance and connectivity at a reduced hardware cost. This paper presents a fault-tolerant routing methodology for the KNS topology that degrades performance gracefully in the presence of faults and tolerates a large number of faults without disabling any healthy computing node. In order to tolerate network failures, the methodology uses a simple mechanism. For any source-destination pair, if necessary, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network) with the aim of circumventing faults. The evaluation results show that the proposed methodology tolerates a large number of faults. For instance, it is able to tolerate more than 99.5% of fault combinations when there are ten faults in a 3-D network with 1,000 nodes using only one intermediate node and more than 99.98% if two intermediate nodes are used. Furthermore, the methodology offers a graceful performance degradation. As an example, performance degrades only by 1% for a 2-D network with 1,024 nodes and 1% faulty links.",
    	journal = "Concurrency and Computation Practice and Experience 29(SI HiPINEB 2016)",
    	title = "{A} {F}ault-{T}olerant {R}outing {S}trategy for {KNS} {T}opologies {B}ased on {I}ntermediate {N}odes",
    	year = 2017
  2. Roberto Peñaranda, Pedro Lopez and Maria E Gomez. A New Fault-Tolerant Routing Methodology for KNS Topologies. 2016 2nd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB), 2016.

    	author = "Pe{\~n}aranda, Roberto and Lopez, Pedro and Gomez, Maria E.",
    	journal = "2016 2nd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB)",
    	title = "{A} {N}ew {F}ault-{T}olerant {R}outing {M}ethodology for {KNS} {T}opologies",
    	year = 2016
  3. Roberto Peñaranda, Crispín Gomez, Maria E Gomez and Pedro Lopez. XORAdap: A HoL-Blocking Aware Adaptive Routing Algorithm. 2015 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2015.

    	author = "Pe{\~n}aranda, Roberto and Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro",
    	journal = "2015 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)",
    	title = "{XORA}dap: {A} {H}o{L}-{B}locking {A}ware {A}daptive {R}outing {A}lgorithm",
    	year = 2015
  4. Salvador Petit, Rafael Ubal, Julio Sahuquillo and Pedro Lopez. Efficient Register Renaming and Recovery for High-Performance Processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 22(7):1506-1514, 2014.

    	author = "Petit, Salvador and Ubal, Rafael and Sahuquillo, Julio and Lopez, Pedro",
    	abstract = "Modern superscalar processors implement register renaming using either random access memory (RAM) or content-addressable memories (CAM) tables. The design of these structures should address both access time and misprediction recovery penalty. Although direct-mapped RAMs provide faster access times, CAMs are more appropriate to avoid recovery penalties. The presence of associative ports in CAMs, however, prevents them from scaling with the number of physical registers and pipeline width, negatively impacting performance, area, and energy consumption at the rename stage. In this paper, we present a new hybrid RAM–CAM register renaming scheme, which combines the best of both approaches. In a steady state, a RAM provides fast and energy-efficient access to register mappings. On misspeculation, a low-complexity CAM enables immediate recovery. Experimental results show that in a four-way state-of-the-art superscalar processor, the new approach provides almost the same performance as an ideal CAM-based renaming scheme, while dissipating only between 17% and 26% of the original energy and, in some cases, consuming less energy than purely RAM-based renaming schemes. Overall, the silicon area required to implement the hybrid RAM–CAM scheme does not exceed the area required by conventional renaming mechanisms.",
    	journal = "IEEE Transactions on Very Large Scale Integration (VLSI) Systems",
    	number = 7,
    	pages = "1506-1514",
    	title = "{E}fficient {R}egister {R}enaming and {R}ecovery for {H}igh-{P}erformance {P}rocessors",
    	volume = 22,
    	year = 2014
  5. Roberto Peñaranda, Crispín Gomez, Maria E Gomez, Pedro Lopez and Jose Duato. A New Family of Hybrid Topologies for Large-Scale Interconnection Networks. IEEE 11th International Symposium on Network Computing and Applications, pages 220-227, August 2012.

    	author = "Pe{\~n}aranda, Roberto and Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "In large supercomputers the topology of the interconnection network is a key design issue that impacts the performance and cost of the whole system. Direct topologies provide a reduced hardware cost, but, as the number of dimensions is conditioned by 3D wiring restrictions, a high number of nodes per dimension is used, which increases communication latency and reduces network throughput. On the other hand, indirect topologies can provide better performance for large network sizes, but at the cost of a high number of switches and links. In this paper, we propose a new family of topologies that combines the best features of both direct and indirect topologies to efficiently connect an extremely high number of nodes. In particular, we propose an n-dimensional topology, where the nodes of each dimension are connected through a small indirect topology. This combination results in a family of topologies that provides high performance, with latency and throughput figures of merit close to indirect topologies, but with a lower hardware cost. In particular, it is able to double the throughput obtained per switching element of indirect topologies. Moreover, the layout of the topology is much simpler than in indirect topologies. Indeed, its fault-tolerance degree is equal or higher than the one for direct and indirect topologies.",
    	journal = "IEEE 11th International Symposium on Network Computing and Applications",
    	keywords = "routing algorithm, direct topology, indirect topology",
    	month = "August",
    	pages = "220-227",
    	title = "{A} {N}ew {F}amily of {H}ybrid {T}opologies for {L}arge-{S}cale {I}nterconnection {N}etworks",
    	year = 2012
  6. Roberto Peñaranda, Crispín Gomez, Maria E Gomez and Pedro Lopez. A New Family of Hybrid Topologies for Large-Scale Interconnection Networks. Network Computing and Applications (NCA), 2012 11th IEEE International Symposium on, 2012.

    	author = "Pe{\~n}aranda, Roberto and Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro",
    	abstract = "In large supercomputers the topology of the interconnection network is a key design issue that impacts the performance and cost of the whole system. Direct topologies provide a reduced hardware cost, but as the number of dimensions is conditioned by 3D wiring restrictions, a high number of nodes per dimension is used, which increases communication latency and reduces network throughput. On the other hand, indirect topologies can provide better performance for large network sizes, but at the cost of a high amount of switches and links. In this paper we propose a new family of topologies that combines the best features of both direct and indirect topologies to efficiently connect an extremely high number of nodes. In particular, we propose an n-dimensional topology where the nodes of each dimension are connected through a small indirect topology. This combination results in a family of topologies that provides high performance, with latency and throughput figures of merit close to indirect topologies, but with a lower hardware cost. In particular, it is able to double the throughput obtained per switching element of indirect topologies. Moreover, the layout of the topology is much simpler than in indirect topologies. Indeed, its fault-tolerance degree is equal or higher than the one for direct and indirect topologies.",
    	journal = "Network Computing and Applications (NCA), 2012 11th IEEE International Symposium on",
    	title = "{A} {N}ew {F}amily of {H}ybrid {T}opologies for {L}arge-{S}cale {I}nterconnection {N}etworks",
    	year = 2012
    	author = "Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Networks on-chip (NoCs) interconnect the components located inside a chip. In multicore chips, NoCs have a strong impact on the overall system performance. NoC bandwidth is limited by the critical path delay. Recent works show that the critical path delay is heavily affected by switch port buffer size. Therefore, by removing buffers, switch clock frequency can be increased. Recently, a new switching technique for NoCs called Blind Packet Switching (BPS) has been proposed, which is based on removing the switch port buffers. Since buffers consume a high percentage of switch power and area, BPS not only improves performance but also reduces power and area. In BPS, as there are no buffers at the switch ports, packets cannot be stopped and stored on them. If contention arises packets are dropped and later reinjected, negatively affecting performance. In order to prevent packet dropping, some techniques based on resource replication have been proposed. In this paper, we propose some alternative and complementary techniques that do not rely on resource replication. By using them, packet dropping is highly reduced. In particular, packet dropping is completely removed for a very wide network traffic range. Moreover, network throughput is increased and packet latency is reduced. © 2010 John Wiley {\&} Sons, Ltd.",
    	address = "UK",
    	doi = "10.1002/cpe.1606",
    	issn = "1532-0626",
    	journal = "Concurrency and Computation: Practice and Experience",
    	keywords = "buffer circuits;circuit switching;network-on-chip;",
    	note = "packet dropping reduction;bufferless NoC;networks on-chip;critical path delay;switch clock frequency;blind packet switching;switch port buffers;network traffic range;",
    	number = 1,
    	pages = "86-99",
    	title = "{H}ow to reduce packet dropping in a bufferless {N}o{C}",
    	volume = 23,
    	year = 2011
    	author = "Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Networks on-chip (NoCs) interconnect the components located inside a chip. In multicore chips, NoCs have a strong impact on the overall system performance. NoC bandwidth is limited by the critical path delay. Recent works show that the critical path delay is heavily affected by switch port buffer size. Therefore, by removing buffers, switch clock frequency can be increased. Recently, a new switching technique for NoCs called Blind Packet Switching (BPS) has been proposed, which is based on removing the switch port buffers. Since buffers consume a high percentage of switch power and area, BPS not only improves performance but also reduces power and area. In BPS, as there are no buffers at the switch ports, packets cannot be stopped and stored on them. If contention arises packets are dropped and later reinjected, negatively affecting performance. In order to prevent packet dropping, some techniques based on resource replication have been proposed. In this paper, we propose some alternative and complementary techniques that do not rely on resource replication. By using them, packet dropping is highly reduced. In particular, packet dropping is completely removed for a very wide network traffic range. Moreover, network throughput is increased and packet latency is reduced. Copyright © 2010 John Wiley {\&} Sons, Ltd.",
    	doi = "10.1002/cpe.1606",
    	issn = "1532-0634",
    	journal = "Concurrency and Computation: Practice and Experience",
    	keywords = "networks on-chip;buffer limitations;packet dropping reduction",
    	number = 1,
    	pages = "86-99",
    	title = "{H}ow to reduce packet dropping in a bufferless {N}o{C}",
    	volume = 23,
    	year = 2011
    	author = "Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Networks on-chip (NoCs) interconnect the components located inside a chip. In multicore chips, NoCs have a strong impact on the overall system performance. NoC bandwidth is limited by the critical path delay. Recent works show that the critical path delay is heavily affected by switch port buffer size. Therefore, by removing buffers, switch clock frequency can be increased. Recently, a new switching technique for NoCs called Blind Packet Switching (BPS) has been proposed, which is based on removing the switch port buffers. Since buffers consume a high percentage of switch power and area, BPS not only improves performance but also reduces power and area. In BPS, as there are no buffers at the switch ports, packets cannot be stopped and stored on them. If contention arises packets are dropped and later reinjected, negatively affecting performance. In order to prevent packet dropping, some techniques based on resource replication have been proposed. In this paper, we propose some alternative and complementary techniques that do not rely on resource replication. By using them, packet dropping is highly reduced. In particular, packet dropping is completely removed for a very wide network traffic range. Moreover, network throughput is increased and packet latency is reduced. Copyright © 2010 John Wiley {\&} Sons, Ltd.",
    	address = "Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom",
    	issn = "1532-0626",
    	journal = "Concurrency and Computation: Practice and Experience",
    	key = "Packet switching",
    	keywords = "Signal filtering and prediction;",
    	note = "buffer limitations;Buffer sizes;Clock frequency;Critical path delays;Multicore chips;Network throughput;Network traffic;On chips;Packet dropping;Packet latencies;Resource replication;Switch ports;Switch power;Switching techniques;",
    	number = 1,
    	pages = "86-99",
    	title = "{H}ow to reduce packet dropping in a bufferless {N}o{C}",
    	volume = 23,
    	year = 2011
    	author = "Alonso, Marina and Coll, Salvador and Mart{\'i}nez, Juan Miguel and Santonja, Vicente and Lopez, Pedro and Duato, Jose",
    	abstract = "The high level of computing power required for some applications can only be achieved by multiprocessor systems. These systems consist of several processors that communicate by means of an interconnection network. The huge increase both in size and complexity of high-end multiprocessor systems has triggered up their power consumption. Complex cooling systems are needed, which, in turn, increases power consumption. Power consumption reduction techniques are being applied everywhere in computer systems and the interconnection network is not an exception, as its contribution is not negligible. In this paper, we propose a mechanism to reduce interconnect power consumption that combines two alternative techniques: (i) dynamically switching on and off network links as a function of traffic (any link can be switched off, provided that network connectivity is guaranteed), (ii) dynamically reducing the available network bandwidth when traffic becomes low. In both cases, the topology of the network is not modified. Therefore, the same routing algorithm can be used regardless of the power saving actions taken, thus simplifying router design. Our simulation results show that the network power consumption can be greatly reduced, at the expense of some increase in latency. However, the achieved power reduction is always higher than the latency penalty.",
    	doi = "10.1016/j.parco.2010.08.003",
    	issn = "0167-8191",
    	journal = "Parallel Computing",
    	keywords = "Power saving; Interconnection networks; Routing",
    	number = 12,
    	pages = "696-712",
    	title = "{P}ower saving in regular interconnection networks",
    	volume = 36,
    	year = 2010
  11. Joan-Lluis Ferrer, Elvira Baydal, Antonio Robles, Pedro Lopez and Jose Duato. A Scalable and Early Congestion Management Mechanism for MINs. In Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP 2010. 2010, 43-50.

    	author = "Ferrer, Joan-Lluis and Baydal, Elvira and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Several packet marking-based mechanisms have been proposed to manage congestion in multistage interconnection networks. One of them, the MVCM mechanism obtains very good results for different network configurations and traffic loads. However, as MVCM applies full virtual output queuing at origin, its memory requirements may jeopardize its scalability. Additionally, the applied packet marking technique introduces certain delay to detect congestion. In this paper, we propose and evaluate the Scalable Early Congestion Management mechanism which eliminates the drawbacks exhibited by MVCM. The new mechanism replaces the full virtual output queuing at origin by either a partial virtual output queuing or a shared buffer, in order to reduce its memory requirements, thus making the mechanism scalable. Also, it applies an improved packet marking technique based on marking packets at output buffers regardless of their marking at input buffers, which simplifies the marking technique, allowing also a sooner detection of the root of a congestion tree.",
    	address = "Piscataway, NJ, USA",
    	booktitle = "Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP 2010",
    	journal = "Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010)",
    	keywords = "multistage interconnection networks;",
    	note = "packet marking based mechanisms;multistage interconnection networks;MVCM mechanism;virtual output queuing;scalable early congestion management mechanism;shared buffer;",
    	pages = "43-50",
    	title = "{A} {S}calable and {E}arly {C}ongestion {M}anagement {M}echanism for {MIN}s",
    	year = 2010
  12. Salvador Petit, Rafael Ubal, Julio Sahuquillo and Pedro Lopez. A power-aware hybrid RAM-CAM renaming mechanism for fast recovery. In Computer Design, 2009. ICCD 2009. IEEE International Conference on. 2009, 150-157.

    	author = "Petit, Salvador and Ubal, Rafael and Sahuquillo, Julio and Lopez, Pedro",
    	abstract = "Modern superscalar processors implement register renaming by using either RAM or CAM tables. The design of these structures should address their access time and misprediction recovery penalty. While direct-mapped RAMs provide faster access times, CAMs are more appropriate to avoid recovery penalties. Although they are more complex and slower, CAMs usually match the processor cycle in current designs. However, they do not scale with the number of physical registers and the pipeline width. In this paper we present a new hybrid RAM-CAM register renaming scheme, which combines the best of both approaches. In a steady state, a RAM provides the current mappings quickly; on misspeculation, a low-complexity CAM enables immediate recovery and further register renaming. Compared to an ideal CAM in a 4-way state-of-the-art superscalar microprocessor, and for almost the same performance (1% slowdown) and area (95% of the ideal CAM size), the proposed scheme consumes about 90% less dynamic energy.",
    	booktitle = "Computer Design, 2009. ICCD 2009. IEEE International Conference on",
    	doi = "10.1109/ICCD.2009.5413160",
    	issn = "1063-6404",
    	keywords = "direct-mapped RAM;misprediction recovery penalty;physical registers;pipeline width;power-aware hybrid RAM-CAM renaming mechanism;processor cycle;register renaming;superscalar processors;microprocessor chips;power aware computing;random-access storage;",
    	month = "oct.",
    	pages = "150-157",
    	title = "{A} power-aware hybrid {RAM}-{CAM} renaming mechanism for fast recovery",
    	year = 2009
  13. Salvador Petit, Rafael Ubal, Julio Sahuquillo, Pedro Lopez and Jose Duato. An Efficient Low-Complexity Alternative to the ROB for Out-of-Order Retirement of Instructions. In Antonio Nunez; Pedro P Carballo (ed.). Digital System Design, Architectures, Methods and Tools, 2009. DSD '09. 12th Euromicro Conference on. 2009, 635-642.

    	author = "Petit, Salvador and Ubal, Rafael and Sahuquillo, Julio and Lopez, Pedro and Duato, Jose",
    	abstract = "Current superscalar processors use a reorder buffer (ROB) to support speculation, precise exceptions, and register reclamation. Instructions are retired from this structure in program order, which may lead to significant performance degradation if a long latency operation blocks the ROB head. In this paper, a checkpoint-free out-of-order commit architecture is proposed, which replaces the ROB with a small structure called validation buffer (VB) from which instructions are retired as soon as their speculative state is resolved. An aggressive register reclamation mechanism targeted to this microarchitecture is also devised. Experimental results show that the VB microarchitecture is much more efficient than a ROB-based microprocessor. For example, a 32-entry VB provides similar performance to a 256-entry ROB, while reducing the utilization of other major processor structures.",
    	booktitle = "Digital System Design, Architectures, Methods and Tools, 2009. DSD '09. 12th Euromicro Conference on",
    	doi = "10.1109/DSD.2009.237",
    	editor = "Antonio Nunez; Pedro P. Carballo",
    	isbn = "978-0-7695-3782-5",
    	keywords = "ROB-based microprocessor;checkpoint-free out-of-order commit architecture;out-of-order instruction retirement;register reclamation;register reclamation mechanism;superscalar reorder buffer processors;validation buffer;buffer circuits;microprocessor chips;",
    	month = "aug.",
    	pages = "635-642",
    	title = "{A}n {E}fficient {L}ow-{C}omplexity {A}lternative to the {ROB} for {O}ut-of-{O}rder {R}etirement of {I}nstructions",
    	year = 2009
  14. , M Palesi, Jose Flich, S Kumar, Pedro Lopez, R Holsmark and Jose Duato. Region-Based Routing: A Mechanism to Support Efficient Routing Algorithms in NoCs. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 17(3):356-369, March 2009.

    	author = ", and M. Palesi and Flich, Jose and S. Kumar and Lopez, Pedro and R. Holsmark and Duato, Jose",
    	abstract = "An efficient routing algorithm is important for large on-chip networks [network-on-chip (NoC)] to provide the required communication performance to applications. Implementing NoC using table-based switches provides many advantages, including the possibility of changing routing algorithms and fault tolerance, due to the option of table reconfigurations. However, table-based switches have been considered unsuitable for NoCs due to their perceived high area and power consumption. In this paper, we describe the region-based routing (RBR) mechanism which groups destinations into network regions allowing an efficient implementation with logic blocks. RBR can also be viewed as a mechanism to reduce the number of entries in routing tables. RBR is general and can be used in conjunction with any adaptive routing algorithm. In particular, we have evaluated the proposed scheme in conjunction with a general routing algorithm, namely segment-based routing (SR) and an application specific routing algorithm (APSRA) using regular and irregular mesh topologies. Our study shows that the number of entries in the table is significantly reduced, especially for large networks. Evaluation results show that RBR requires only four regions to support several routing algorithms in a 2-D mesh with no performance degradation. Considering link failures, our results indicate that RBR combined with SR is able to tolerate up to 7 link failures in an 8$\times$8 mesh. RBR also reduces area and power dissipation of an equivalent table-based implementation by factors of 8 and 10, respectively. Moreover, the degradation in performance of the network is insignificant when using APSRA combined with RBR.",
    	doi = "10.1109/TVLSI.2008.2012010",
    	issn = "1063-8210",
    	journal = "Very Large Scale Integration (VLSI) Systems, IEEE Transactions on",
    	keywords = "adaptive routing algorithm;application specific routing algorithm;fault tolerance;large on-chip networks;network-on-chip;region-based routing mechanism;segment-based routing;table-based switches;network topology;network-on-chip;",
    	month = "march",
    	number = 3,
    	pages = "356-369",
    	title = "{R}egion-{B}ased {R}outing: {A} {M}echanism to {S}upport {E}fficient {R}outing {A}lgorithms in {N}o{C}s",
    	volume = 17,
    	year = 2009
  15. D Ludovici, Francisco Gilabert, S Medardoni, Crispín Gomez, Maria E Gomez, Pedro Lopez, G N Gaydadjiev and D Bertozzi. Assessing fat-tree topologies for regular network-on-chip design under nanoscale technology constraints. In 2009 Design, Automation & Test in Europe Conference & Exhibition (DATE'09). 2009, 4 pp. -.

    	author = "D. Ludovici and Gilabert, Francisco and S. Medardoni and Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro and G.N. Gaydadjiev and D. Bertozzi",
    	abstract = "Most past evaluations of fat-trees for on-chip interconnection networks rely on oversimplifying or even unrealistic architecture and traffic pattern assumptions, and very few layout analyses are available to relieve practical feasibility concerns in nanoscale technologies. This work aims at providing an in-depth assessment of physical synthesis efficiency of fat-trees and at extrapolating silicon-aware performance figures to back-annotate in the system-level performance analysis. A 2D mesh is used as a reference architecture for comparison, and a 65 nm technology is targeted by our study. Finally, in an attempt to mitigate the implementation cost of k-ary n-tree topologies, we also review an alternative unidirectional multi-stage interconnection network which is able to simplify the fat-tree architecture and to minimally impact performance.",
    	address = "Piscataway, NJ, USA",
    	booktitle = "2009 Design, Automation {\&} Test in Europe Conference {\&} Exhibition (DATE'09)",
    	journal = "2009 Design, Automation {\&} Test in Europe Conference {\&} Exhibition (DATE'09)",
    	keywords = "extrapolation;integrated circuit interconnections;integrated circuit layout;nanoelectronics;network topology;network-on-chip;",
    	note = "fat-tree topology;network-on-chip design;nanoscale technology;on-chip interconnection network;traffic pattern;layout analysis;extrapolation;system-level performance analysis;",
    	pages = "4 pp. -",
    	title = "{A}ssessing fat-tree topologies for regular network-on-chip design under nanoscale technology constraints",
    	year = 2009
  16. Crispín Gomez, Maria E Gomez, Pedro Lopez and Jose Duato. An Efficient Switching Technique for NoCs with Reduced Buffer Requirements. In Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on. 2008, 713-720.

    	author = "Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Networks on chip (NoCs) connect the components located inside a chip. Overall system performance depends on NoC performance, which is affected by several factors. One of them is the network clock frequency, imposed by the critical path delay. Recent works show that the switch critical path includes buffer control logic. Consequently, by removing switch buffers, switch frequency can be doubled. In this paper, we exploit this idea, proposing a new switching technique for NoCs which requires a reduced amount of storage at the switches. It is based on replacing switch port buffers by single latches. By doing so, network cycle can be reduced, which reduces packet latency. On the other hand, power and area consumption requirements can be reduced. However, since there are no buffers at the switch ports, packets cannot be stopped. Stopped packets due to contention are dropped and reinjected from their senders via negative acknowledgments. Packet dropping is strongly reduced by exploiting NoCs wiring capability.",
    	booktitle = "Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on",
    	doi = "10.1109/ICPADS.2008.43",
    	issn = "1521-9097",
    	keywords = "buffer control logic;critical path delay;network clock frequency;network cycle;networks on chip;packet dropping;reduced buffer requirements;switching technique;network-on-chip;performance evaluation;",
    	month = "dec.",
    	pages = "713-720",
    	title = "{A}n {E}fficient {S}witching {T}echnique for {N}o{C}s with {R}educed {B}uffer {R}equirements",
    	year = 2008
  17. Crispín Gomez, Francisco Gilabert, Maria E Gomez, Pedro Lopez and Jose Duato. Beyond Fat–tree: Unidirectional Load–Balanced Multistage Interconnection Network. Computer Architecture Letters 7(2):49-52, 2008.

    	author = "Gomez, Crisp{\'i}n and Gilabert, Francisco and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "The fat-tree is one of the most widely-used topologies by interconnection network manufacturers. Recently, it has been demonstrated that a deterministic routing algorithm that optimally balances the network traffic can not only achieve almost the same performance as an adaptive routing algorithm but also outperform it. On the other hand, fat-trees require a high number of switches with a non-negligible wiring complexity. In this paper, we propose replacing the fat-tree by a unidirectional multistage interconnection network (UMIN) that uses a traffic balancing deterministic routing algorithm. As a consequence, switch hardware is reduced almost by half, decreasing, in this way, the power consumption, the arbitration complexity, the switch size itself, and the network cost. Preliminary evaluation results show that the UMIN with the load balancing scheme obtains lower latency than fat-tree for low and medium traffic loads. Furthermore, in networks with a high number of stages or with high radix switches, it obtains the same, or even higher, throughput than fat-tree.",
    	doi = "10.1109/L-CA.2008.8",
    	issn = "1556-6056",
    	journal = "Computer Architecture Letters",
    	keywords = "adaptive routing algorithm;interconnection network manufacturers;network traffic;nonnegligible wiring complexity;power consumption;radix switches;traffic balancing deterministic routing algorithm;unidirectional load-balanced multistage interconnection net",
    	month = "july-dec.",
    	number = 2,
    	pages = "49-52",
    	title = "{B}eyond {F}at--tree: {U}nidirectional {L}oad--{B}alanced {M}ultistage {I}nterconnection {N}etwork",
    	url = "",
    	volume = 7,
    	year = 2008
  18. Crispín Gomez, Maria E Gomez, Pedro Lopez and Jose Duato. Exploiting Wiring Resources on Interconnection Network: Increasing Path Diversity. In Parallel, Distributed and Network-Based Processing, 2008. PDP 2008. 16th Euromicro Conference on. 2008, 20 -29. URL, DOI BibTeX

    	author = "Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "On-chip networks are the answer to the growing demands for high communication performance of chip multiprocessors. These networks have a number of characteristics that make their design quite different to off-chip networks. In particular, wires are an abundant available resource inside the chip. In this paper, we explore how to organize the huge wiring capabilities available in on-chip networks. In particular, we analyze the option of distributing the wires among several parallel links connecting the same two switches. This technique is known as Space Division Multiplexing (SDM). The number of parallel sub-links and their width are two key parameters that are studied together with the relationship with the mean packet size. The paper shows that SDM is a technique to take into account in on-chip networks since it allows to highly increase the network accepted traffic at the expense of a small latency increase or even no increase. Moreover, in some networks, it allows to reduce the network hardware, providing similar performance results, which results in a reduction in the consumption of area and power.",
    	booktitle = "Parallel, Distributed and Network-Based Processing, 2008. PDP 2008. 16th Euromicro Conference on",
    	doi = "10.1109/PDP.2008.33",
    	isbn = "978-0-7695-3089-5",
    	issn = "1066-6192",
    	keywords = "chip multiprocessors;interconnection network;mean packet size;on-chip networks;parallel links;path diversity;space division multiplexing;wiring capabilities;wiring resources;multiprocessor interconnection networks;space division multiplexing;wiring;",
    	month = "feb.",
    	pages = "20 -29",
    	title = "{E}xploiting {W}iring {R}esources on {I}nterconnection {N}etwork: {I}ncreasing {P}ath {D}iversity",
    	url = "",
    	year = 2008
  19. Crispín Gomez, Francisco Gilabert, Maria E Gomez, Pedro Lopez and Jose Duato. RUFT: Simplifying the fat-tree topology. In Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on. 2008, 153 - 160. URL BibTeX

    	author = "Gomez, Crisp{\'i}n and Gilabert, Francisco and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "The fat-tree is one of the most widely-used topologies by interconnection network manufacturers. Recently, a deterministic routing algorithm that optimally balances the network traffic in fat-trees was proposed. It can not only achieve almost the same performance than adaptive routing, but also outperforms it for some traffic patterns. Nevertheless, fat-trees require a high number of switches with a non-negligible wiring complexity. In this paper, we propose replacing the fat-tree by an unidirectional multistage interconnection network referred to as Reduced Unidirectional Fat-tree (RUFT) that uses a simplified version of the aforementioned deterministic routing algorithm. As a consequence, switch hardware is almost reduced to the half, decreasing, in this way, power consumption, arbitration complexity, switch size, and network cost. Evaluation results show that RUFT obtains lower latency than fat-tree for low and medium traffic loads. Furthermore, in large networks, it obtains almost the same throughput than the classical fat-tree. © 2008 IEEE.",
    	address = "Melbourne, VIC, Australia",
    	booktitle = "Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on",
    	issn = 15219097,
    	journal = "Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS",
    	key = "Trees (mathematics)",
    	keywords = "Interconnection networks;Internet;Routing algorithms;Switches;Switching circuits;",
    	note = "Adaptive routing;Deterministic routing algorithms;Evaluation results;Large networks;Multi-stage interconnection networks;Network costs;Network traffics;Number of switches;Power consumption;Switch sizes;Traffic loads;Traffic patterns;Tree topologies;",
    	pages = "153 - 160",
    	title = "{RUFT}: {S}implifying the fat-tree topology",
    	url = "",
    	year = 2008
  20. Crispín Gomez, Maria E Gomez, Pedro Lopez and Jose Duato. Reducing packet dropping in a bufferless NoC. In Euro-Par 2008 – Parallel Processing. 2008, 899 - 909. URL BibTeX

    	author = "Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Networks on chip (NoCs) have a strong impact on overall chip performance. Interconnection bandwidth is limited by the critical path delay. Recent works show that the critical path includes the switch input buffer control logic. As a consequence, by removing buffers, switch clock frequency can be doubled. Recently, a new switching technique for NoCs called blind packet switching (BPS) has been proposed. It is based on replacing the buffers of the switch ports by simple latches. Since buffers consume a high percentage of switch power and area, BPS not only improves performance but also helps in reducing power and area. In BPS there are no buffers at the switch ports, so packets can not be stopped. If the required output port is busy, the packet will be dropped. In order to prevent packet dropping, some techniques based on resource replication have been proposed. In this paper, we propose some alternative and complementary techniques that do not rely on resource replication. By using these techniques, packet dropping and its negative effects are highly reduced. In particular, packet dropping is completely removed for a very wide network traffic range. The first dropped packet appears at a 11.6 higher traffic load. As a consequence, network throughput is increased and the packet latency is kept almost constant.",
    	address = "Berlin, Germany",
    	booktitle = "Euro-Par 2008 – Parallel Processing",
    	journal = "Euro-Par 2008 Parallel Processing. 14th International Euro-Par Conference",
    	keywords = "delays;multiprocessor interconnection networks;network-on-chip;packet switching;",
    	note = "bufferless NoC;networks on chip;interconnection bandwidth;buffer control logic;blind packet switching;resource replication;packet dropping;network traffic load;critical path delay;",
    	pages = "899 - 909",
    	title = "{R}educing packet dropping in a bufferless {N}o{C}",
    	url = "",
    	year = 2008
  21. Joan-Lluis Ferrer, Elvira Baydal, Antonio Robles, Pedro Lopez and Jose Duato. On the influence of the packet marking and injection control schemes in congestion management for MINs. 2008, 930 - 9. URL BibTeX

    	author = "Ferrer, Joan-Lluis and Baydal, Elvira and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Several Congestion Management Mechanisms (CMMs) have been proposed for Multistage Interconnection Networks (MINs) in order to avoid the degradation of network performance when congestion appears. Most of them are based on Explicit Congestion Notification (ECN). For this purpose, switches detect congestion and, depending on the applied mechanism, some flags are marked to warn the source hosts. In response, source hosts apply corrective actions to adjust their packet injection rate. These mechanisms have been evaluated by analyzing whether they are able to manage a congestion situation but there is not a comparison study among them. Moreover, marking effects are not separately analyzed from corrective actions. In this paper, we analyze the current proposals for CMMs, showing the impact of the applied packet marking techniques as well as the corrective actions they apply.",
    	address = "Berlin, Germany",
    	journal = "Euro-Par 2008 Parallel Processing. 14th International Euro-Par Conference",
    	keywords = "multistage interconnection networks;packet switching;telecommunication congestion control;",
    	note = "packet marking;injection control schemes;congestion management mechanisms;multistage interconnection networks;explicit congestion notification;message throttling;",
    	pages = "930 - 9",
    	title = "{O}n the influence of the packet marking and injection control schemes in congestion management for {MIN}s",
    	url = "",
    	year = 2008
  22. Francisco Gilabert, S Medardoni, D Bertozzi, L Benini, Maria E Gomez, Pedro Lopez and Jose Duato. Exploring high-dimensional topologies for NoC design through an integrated analysis and synthesis framework. 2008, 107 - 16. BibTeX

    	author = "Gilabert, Francisco and S. Medardoni and D. Bertozzi and L. Benini and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Networks-on-chip (NoCs) address the challenge to provide scalable communication bandwidth to tiled architectures in a power-efficient fashion. The 2-D mesh is currently the most popular regular topology used for on-chip networks in tile-based architectures, because it perfectly matches the 2-D silicon surface and is easy to implement. However, a number of limitations have been proved in the open literature, especially for long distance traffic. Two relevant variants of 2-D meshes are explored in this paper: high-dimensional and concentrated topologies. The novelty of our exploration framework includes the use of fast and accurate transaction level simulation to provide constraints to the physical synthesis flow, which is integrated with standard industrial toolchains for accurate physical implementation. Interestingly, this work illustrates how effectively the compared topologies can handle synchronization-intensive traffic patterns and accounts for chip I/O interfaces.",
    	address = "Piscataway, NJ, USA",
    	journal = "2008 2nd ACM/IEEE International Symposium on Networks-on-Chip (NOCS '08)",
    	keywords = "integrated circuit design;logic design;network topology;network-on-chip;",
    	note = "NoC design;networks-on-chip;2D mesh topology;on-chip networks;tile-based architectures;industrial toolchains;chip I/O interfaces;",
    	pages = "107 - 16",
    	title = "{E}xploring high-dimensional topologies for {N}o{C} design through an integrated analysis and synthesis framework",
    	year = 2008
  23. Crispín Gomez, Maria E Gomez, Pedro Lopez and Jose Duato. Exploiting wiring resources on interconnection network: Increasing path diversity. In Parallel, Distributed and Network-Based Processing, 2008. PDP 2008. 16th Euromicro Conference on. 2008, 20 - 29. URL BibTeX

    	author = "Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "On-chip networks are the answer to the growing demands for high communication performance of chip multiprocessors. These networks have a number of characteristics that make their design quite different to off-chip networks. In particular, wires are an abundant available resource inside the chip. In this paper, we explore how to organize the huge wiring capabilities available in on-chip networks. In particular, we analyze the option of distributing the wires among several parallel links connecting the same two switches. This technique is known as Space Division Multiplexing (SDM). The number of parallel sub-links and their width are two key parameters that are studied together with the relationship with the mean packet size. The paper shows that SDM is a technique to take into account in on-chip networks since it allows to highly increase the network accepted traffic at the expense of a small latency increase or even no increase. Moreover, in some networks, it allows to reduce the network hardware, providing similar performance results, which results in a reduction in the consumption of area and power. © 2008 IEEE.",
    	address = "Toulouse, France",
    	booktitle = "Parallel, Distributed and Network-Based Processing, 2008. PDP 2008. 16th Euromicro Conference on",
    	journal = "Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP 2008",
    	key = "Space division multiple access",
    	keywords = "Electric network topology;Internet;Telecommunication;Wire;",
    	note = "Chip multi processor (CMP);Communication performances;Key parameters;Latency increase;Off chip;On Chip Network (OCN);Packet size (PS);Parallel links;Path diversity;Performance results;Space division multiplexing (SDM);",
    	pages = "20 - 29",
    	title = "{E}xploiting wiring resources on interconnection network: {I}ncreasing path diversity",
    	url = "",
    	year = 2008
  24. Crispín Gomez, Maria E Gomez, Pedro Lopez and Jose Duato. An efficient switching technique for NoCs with reduced buffer requirements. In Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on. 2008, 713 - 20. URL BibTeX

    	author = "Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Networks on chip (NoCs) communicate the components located inside a chip. Overall system performance depends on NoC performance, that is affected by several factors. One of them is the network clock frequency, imposed by the critical path delay. Recent works show that switch critical path includes buffer control logic. Consequently, by removing switch buffers, switch frequency can be doubled. In this paper, we exploit this idea, proposing a new switching technique for NoCs which requires a reduced amount of storage at the switches. It is based on replacing switch port buffers by single latches. By doing so, network cycle can be reduced, which reduces packet latency. On the other hand, power and area consumption requirements can be reduced. However, since there are no buffers at the switch ports, packets can not be stopped. Stopped packets due to contention are dropped and reinjected from their senders via negative acknowledgments. Packet dropping is strongly reduced by exploiting NoCs wiring capability.",
    	address = "Piscataway, NJ, USA",
    	booktitle = "Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on",
    	journal = "Proceedings of the Fourteenth International Conference on Parallel and Distributed Systems",
    	keywords = "network-on-chip;performance evaluation;",
    	note = "switching technique;reduced buffer requirements;networks on chip;network clock frequency;critical path delay;buffer control logic;network cycle;packet dropping;",
    	pages = "713 - 20",
    	title = "{A}n efficient switching technique for {N}o{C}s with reduced buffer requirements",
    	url = "",
    	year = 2008
  25. Scott Pakin, Craig Stunkel, Jose Flich, Francisco Alfaro, Gheorghe Almasi, Angelos Bilas, Ron Brightwell, Darius Buntinas, Wu-Chun Feng, Mitchell Gusat, Nectarios Koziris, Pedro Lopez, Andrew Lumsdaine, Jarek Nieplocha, Greg Pfister, Jamie Riotto, Vikram Saletore, Evan Speight, Pete Wyckoff, D K Panda, Jose Duato and Mazin Yousif. Workshop 9 Introduction: The Workshop on Communication Architecture for Clusters - CAC 2008. IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM, pages IEEE Computer Societ, 2008. URL BibTeX

    	author = "Scott Pakin and Craig Stunkel and Flich, Jose and Francisco Alfaro and Gheorghe Almasi and Angelos Bilas and Ron Brightwell and Darius Buntinas and Wu-Chun Feng and Mitchell Gusat and Nectarios Koziris and Lopez, Pedro and Andrew Lumsdaine and Jarek Nieplocha and Greg Pfister and Jamie Riotto and Vikram Saletore and Evan Speight and Pete Wyckoff and D.K. Panda and Duato, Jose and Mazin Yousif",
    	abstract = "No abstract available",
    	address = "Miami, FL, United states",
    	journal = "IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM",
    	pages = "IEEE Computer Societ",
    	title = "{W}orkshop 9 {I}ntroduction: {T}he {W}orkshop on {C}ommunication {A}rchitecture for {C}lusters - {CAC} 2008",
    	url = "",
    	year = 2008
  26. Crispin Gomez Requena, Francisco Gilabert Villamon, Maria E Gomez, Pedro Lopez and Jose Duato. Beyond fat - Tree: Unidirectional load - Balanced multistage interconnection network. IEEE Computer Architecture Letters 7(2):49 - 52, 2008. URL BibTeX

    	author = "Crispin Gomez Requena and Francisco Gilabert Villamon and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "The fat-tree is one of the most widely-used topologies by interconnection network manufacturers. Recently, it has been demonstrated that a deterministic routing algorithm that optimally balances the network traffic can not only achieve almost the same performance than an adaptive routing algorithm but also outperforms it. On the other hand, fat-trees require a high number of switches with a non-negligible wiring complexity. In this paper, we propose replacing the fat - tree by a unidirectional multistage interconnection network (UMIN) that uses a traffic balancing deterministic routing algorithm. As a consequence, switch hardware is almost reduced to the half, decreasing, in this way, the power consumption, the arbitration complexity, the switch size itself, and the network cost. Preliminary evaluation results show that the UMIN with the load balancing scheme obtains lower latency than fat - tree for low and medium traffic loads. Furthermore, in networks with a high number of stages or with high radix switches, it obtains the same, or even higher, throughput than fat-tree. © 2006 IEEE.",
    	address = "3 Park Avenue, 17th Floor, New York, NY 10016-5997, United States",
    	issn = 15566056,
    	journal = "IEEE Computer Architecture Letters",
    	key = "Computer networks",
    	keywords = "Adaptive algorithms;Interconnection networks;Internet;Metropolitan area networks;Routing algorithms;Switches;Switching circuits;Telecommunication networks;Trees;",
    	note = "Butterfly network;Deterministic routing;Fat-trees;Multistage Interconnection networks;Traffic balancing;",
    	number = 2,
    	pages = "49 - 52",
    	title = "{B}eyond fat - {T}ree: {U}nidirectional load - {B}alanced multistage interconnection network",
    	url = "",
    	volume = 7,
    	year = 2008
  27. Rafael Ubal, Julio Sahuquillo, Salvador Petit and Pedro Lopez. Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors. In Computer Architecture and High Performance Computing, 2007. SBAC-PAD 2007. 19th International Symposium on. 2007, 62 -68. URL, DOI BibTeX

    	author = "Ubal, Rafael and Sahuquillo, Julio and Petit, Salvador and Lopez, Pedro",
    	abstract = "Current microprocessors are based in complex designs, integrating different components on a single chip, such as hardware threads, processor cores, memory hierarchy or interconnection networks. The permanent need of evaluating new designs on each of these components motivates the development of tools which simulate the system working as a whole. In this paper, we present the Multi2Sim simulation framework, which models the major components of incoming systems, and is intended to cover the limitations of existing simulators. A set of simulation examples is also included for illustrative purposes.",
    	booktitle = "Computer Architecture and High Performance Computing, 2007. SBAC-PAD 2007. 19th International Symposium on",
    	doi = "10.1109/SBAC-PAD.2007.17",
    	issn = "1550-6533",
    	keywords = "Multi2Sim;hardware threads;interconnection networks;memory hierarchy;microprocessors;multicore-multithreaded processors;processor cores;multi-threading;multiprocessor interconnection networks;",
    	month = "oct.",
    	pages = "62 -68",
    	title = "{M}ulti2{S}im: {A} {S}imulation {F}ramework to {E}valuate {M}ulticore-{M}ultithreaded {P}rocessors",
    	url = "",
    	year = 2007
  28. Rafael Ubal, Julio Sahuquillo, Salvador Petit, H Hassan and Pedro Lopez. Leakage Current Reduction in Data Caches on Embedded Systems. In Intelligent Pervasive Computing, 2007. IPC. The 2007 International Conference on. 2007, 45 -50. URL, DOI BibTeX

    	author = "Ubal, Rafael and Sahuquillo, Julio and Petit, Salvador and H. Hassan and Lopez, Pedro",
    	abstract = "Nowadays, embedded systems can be found in a wide range of pervasive devices (e.g., smart phones, PDAs, or video/digital cameras). These devices contain large cache memories, whose power consumption can reach about 50% of the total spent energy, from which leakage energy is the predominant fraction in current technologies. This paper proposes a technique to reduce leakage energy consumption in data caches on embedded systems, which is based on the fact that most stored bits take a logical value of zero. The proposed technique has been evaluated on a model of a contemporary high-end embedded microprocessor, namely the ARM Cortex A8 processor, executing a set of standard embedded benchmarks. Experimental results show that leakage energy savings reach about 40% with no IPC loss.",
    	booktitle = "Intelligent Pervasive Computing, 2007. IPC. The 2007 International Conference on",
    	doi = "10.1109/IPC.2007.95",
    	keywords = "ARM Cortex A8 processor;cache memories;data caches;high-end embedded microprocessor;leakage energy consumption reduction;pervasive devices;cache storage;microprocessor chips;power consumption;ubiquitous computing;",
    	month = "oct.",
    	pages = "45 -50",
    	title = "{L}eakage {C}urrent {R}eduction in {D}ata {C}aches on {E}mbedded {S}ystems",
    	url = "",
    	year = 2007
  29. Rafael Ubal, Julio Sahuquillo, Salvador Petit, Pedro Lopez and Jose Duato. VB-MT: Design Issues and Performance of the Validation Buffer Microarchitecture for Multithreaded Processors. In Parallel Architecture and Compilation Techniques, 2007. PACT 2007. 16th International Conference on. 2007, 429 -429. URL, DOI BibTeX

    	author = "Ubal, Rafael and Sahuquillo, Julio and Petit, Salvador and Lopez, Pedro and Duato, Jose",
    	abstract = "The validation buffer (VB) Microarchitecture retires instructions out of order, by substituting the classical ROB by the VB structure. The VB removes the negative effect of long latency instructions located at the ROB head, which prevent other instructions from retiring and cause frequent pipeline stalls due to lack of space in the ROB. This work analyzes different multithreading models (coarse grain, fine grain and simultaneous multithreading) and a set of different instruction fetch policies.",
    	booktitle = "Parallel Architecture and Compilation Techniques, 2007. PACT 2007. 16th International Conference on",
    	doi = "10.1109/PACT.2007.4336257",
    	issn = "1089-795X",
    	keywords = "ROB head;VB structure;instruction fetch policies;multithreaded processors;validation buffer microarchitecture;buffer storage;multi-threading;parallel architectures;storage allocation;",
    	month = "sept.",
    	pages = "429 -429",
    	title = "{VB}-{MT}: {D}esign {I}ssues and {P}erformance of the {V}alidation {B}uffer {M}icroarchitecture for {M}ultithreaded {P}rocessors",
    	url = "",
    	year = 2007
  30. Crispín Gomez, Francisco Gilabert, Maria E Gomez, Pedro Lopez and Jose Duato. Deterministic versus Adaptive Routing in Fat-Trees. In Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International. March 2007, 1 -8. URL, DOI BibTeX

    	author = "Gomez, Crisp{\'i}n and Gilabert, Francisco and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Clusters of PCs have become very popular to build high performance computers. These machines use commodity PCs linked by a high speed interconnect. Routing is one of the most important design issues of interconnection networks. Adaptive routing usually better balances network traffic, thus allowing the network to obtain a higher throughput. However, adaptive routing introduces out-of-order packet delivery, which is unacceptable for some applications. Concerning topology, most of the commercially available interconnects are based on fat-tree. Fat-trees offer a rich connectivity among nodes, making possible to obtain paths between all source-destination pairs that do not share any link. We exploit this idea to propose a deterministic routing algorithm for fat-trees, comparing it with adaptive routing in several workloads. The results show that deterministic routing can achieve a similar, and in some scenarios higher, level of performance than adaptive routing, while providing in-order packet delivery.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International",
    	doi = "10.1109/IPDPS.2007.370482",
    	isbn = "1-4244-0910-1",
    	keywords = "PC clusters;adaptive routing;deterministic routing algorithm;fat-tree topology;interconnection networks;packet delivery;multistage interconnection networks;telecommunication network routing;telecommunication network topology;telecommunication traffic;tree",
    	month = "march",
    	pages = "1 -8",
    	title = "{D}eterministic versus {A}daptive {R}outing in {F}at-{T}rees",
    	url = "",
    	year = 2007
  31. Crispín Gomez, Francisco Gilabert, Maria E Gomez, Pedro Lopez and Jose Duato. Deterministic versus adaptive routing in fat-trees. In Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International. March 2007, 8 pp. -. URL, DOI BibTeX

    	author = "Gomez, Crisp{\'i}n and Gilabert, Francisco and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Clusters of PCs have become very popular to build high performance computers. These machines use commodity PCs linked by a high speed interconnect. Routing is one of the most important design issues of interconnection networks. Adaptive routing usually better balances network traffic, thus allowing the network to obtain a higher throughput. However, adaptive routing introduces out-of-order packet delivery, which is unacceptable for some applications. Concerning topology, most of the commercially available interconnects are based on fat-tree. Fat-trees offer a rich connectivity among nodes, making possible to obtain paths between all source-destination pairs that do not share any link. We exploit this idea to propose a deterministic routing algorithm for fat-trees, comparing it with adaptive routing in several workloads. The results show that deterministic routing can achieve a similar, and in some scenarios higher, level of performance than adaptive routing, while providing in-order packet delivery.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International",
    	doi = "10.1109/IPDPS.2007.370482",
    	isbn = "1-4244-0910-1",
    	journal = "2007 IEEE International Parallel and Distributed Processing Symposium (IEEE Cat. No.07TH8938)",
    	keywords = "multistage interconnection networks;telecommunication network routing;telecommunication network topology;telecommunication traffic;trees;",
    	month = "Mar.",
    	note = "adaptive routing;fat-tree topology;PC clusters;interconnection networks;packet delivery;deterministic routing algorithm;",
    	pages = "8 pp. -",
    	publisher = "IEEE Computer Society",
    	title = "{D}eterministic versus adaptive routing in fat-trees",
    	url = "",
    	year = 2007
  32. Marina Alonso, Salvador Coll, Vicente Santonja, Juan Miguel Martínez, Pedro Lopez and Jose Duato. Power-aware fat-tree networks using on/off links. In R Perrott, BM Chapman, J Subhlok, RF DeMello and LT Yang (eds.). HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS 4782. 2007, 472-483. BibTeX

    	author = "Alonso, Marina and Coll, Salvador and Santonja, Vicente and Mart{\'i}nez, Juan Miguel and Lopez, Pedro and Duato, Jose",
    	abstract = "Nowadays, power consumption reduction techniques are being increasingly used in computer systems, and high-performance computing systems are not an exception. In particular, the power consumed by the interconnect circuitry has a non-negligible contribution to the total system budget. In this scenario, fat-tree interconnection networks are one of the most popular topologies. This topology is particularly well-suited for applying power consumption reduction techniques since it provides multiple alternative paths for each source/destination pair. In this paper, we present a mechanism that dynamically adjusts the available network bandwidth by switching links on and off, according to the traffic requirements. This mechanism provides significant reduction in power consumption while maintaining the original underlying routing algorithm, at the expense of slight latency increase for low loads.",
    	editor = "Perrott, R and Chapman, BM and Subhlok, J and DeMello, RF and Yang, LT",
    	isbn = "978-3-540-75443-5",
    	issn = "0302-9743",
    	note = "3rd International Conference on High Performance Computing and Communications (HPCC 2007), Houston, TX, SEP 26-28, 2007",
    	pages = "472-483",
    	title = "{P}ower-aware fat-tree networks using on/off links",
    	volume = 4782,
    	year = 2007
  33. Joan-Lluis Ferrer, Elvira Baydal, Antonio Robles, Pedro Lopez and Jose Duato. Congestion management in MINs through marked validated packets. 2007, 260 - 7. BibTeX

    	author = "Ferrer, Joan-Lluis and Baydal, Elvira and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = {Congestion management is a very critical problem tackled in interconnection networks for years but not solved yet. Although several mechanisms have been recently proposed for lossless multistage interconnection networks (MINs), they either have drawbacks or are partial solutions. Some of them introduce penalty over packets not really addressed to the hot-spots, whereas others can cope only with congestion situations that last a short time. In this paper, we propose an effective and efficient congestion management mechanism for lossless interconnection networks based on explicit congestion notification. The mechanism uses two different flags in ACK packets, a Marking Bit (MB) and a Validation Bit (VB), to detect congestion and warn the origin hosts. In this way, packets belonging to "coldflows" but stopped because of head-of-line (HOL) blocking can be distinguished from "hotflow" packets which are really causing congestion. In response, origin hosts can apply corrective actions only to the "hotflows", minimizing the negative impact on "coldflows" performance. Evaluation results show that the proposed congestion management strategy is able to avoid the degradation of network performance, regardless of traffic load and the location of the congestion in the network.},
    	address = "Piscataway, NJ, USA",
    	journal = "15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP'07)",
    	keywords = "multistage interconnection networks;",
    	note = "congestion management;lossless multistage interconnection network;validated packet;marked packet;ACK packet;head-of-line blocking;marking bit;validation bit;",
    	pages = "260 - 7",
    	title = "{C}ongestion management in {MIN}s through marked validated packets",
    	year = 2007
  34. Jose Flich, , Pedro Lopez and Jose Duato. Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Network on Chips. In Networks-on-Chip, 2007. NOCS 2007. First International Symposium on. 2007, 183 -194. URL, DOI BibTeX

    	author = "Flich, Jose and , and Lopez, Pedro and Duato, Jose",
    	abstract = "The design of scalable and reliable interconnection networks for system on chips (SoCs) introduces new design constraints not present in current multicomputer systems. Although regular topologies are preferred for building NoCs, heterogeneous blocks, fabrication faults and reliability issues derived from the high integration scale may lead to irregular topologies. In this situation, efficient routing becomes a challenge. Although table-based routing allows the use of most routing algorithms on any topology, it does not scale in terms of latency and area. In this paper we propose the region-based routing mechanism that avoids the scalability problems of table-based solutions. From an initial topology and routing algorithm, the mechanism groups, at every switch, destinations into different regions based on the output ports. By doing this, redundant routing information typically found in routing tables is eliminated. Evaluation results show that the mechanism requires only four regions to support several routing algorithms in a 2D mesh with no performance degradation. Moreover, when dealing with link failures, our results indicate that the mechanism combined with the segment-based routing algorithm is able to pack all the routing information into eight regions providing high throughput. The paper provides also a simple and efficient hardware implementation of the mechanism requiring only 240 logic gates per switch to support eight regions in a 2D mesh topology.",
    	booktitle = "Networks-on-Chip, 2007. NOCS 2007. First International Symposium on",
    	doi = "10.1109/NOCS.2007.39",
    	keywords = "2D mesh topology;interconnection networks;multicomputer systems;network on chips;region-based routing;segment-based routing algorithm;system on chips;table-based routing;integrated circuit interconnections;logic design;microprocessor chips;network routing",
    	month = "7-9",
    	pages = "183 -194",
    	title = "{R}egion-{B}ased {R}outing: {A}n {E}fficient {R}outing {M}echanism to {T}ackle {U}nreliable {H}ardware in {N}etwork on {C}hips",
    	url = "",
    	year = 2007
  35. Crispín Gomez, Maria E Gomez, Pedro Lopez and Jose Duato. An efficient fault-tolerant routing methodology for fat-tree interconnection networks*. 2007, 509 - 22. BibTeX

    	author = "Gomez, Crisp{\'i}n and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "In large cluster-based machines, fault-tolerance in the interconnection network is an issue of growing importance, since their increasing size raises the probability of failure. The topology used in these machines is usually a fat-tree. This paper proposes a new distributed fault-tolerant routing methodology for fat-trees. It does not require additional network hardware. It is scalable, since the required memory, switch hardware and routing delay do not depend on the network size. The methodology is based on enhancing the interval routing scheme with exclusion intervals. Exclusion intervals are associated to each switch output port, and represent the set of nodes that are unreachable from this port after a failure appears. We propose a mechanism to identify the exclusion intervals that must be updated after detecting a failure, and the values to write on them. Our methodology is able to support a relatively high number of network failures with a low degradation in network performance.",
    	address = "Berlin, Germany",
    	journal = "Parallel and Distributed Processing and Applications. Proceedings 5th International Symposium, ISPA 2007. (Lecture Notes in Computer Science vol. 4742)",
    	keywords = "failure analysis;fault tolerant computing;multiprocessor interconnection networks;network routing;network topology;probability;trees;",
    	note = "distributed fault-tolerant routing methodology;fat-tree interconnection networks;large cluster-based machines;failure probability;interval routing scheme;switch output port;",
    	pages = "509 - 22",
    	title = "{A}n efficient fault-tolerant routing methodology for fat-tree interconnection networks*",
    	year = 2007
  36. Maria E Gomez, N A Nordbotten, Jose Flich, Pedro Lopez, Antonio Robles, Jose Duato, T Skeie and O Lysne. A routing methodology for achieving fault tolerance in direct networks. Computers, IEEE Transactions on 55(4):400 - 415, April 2006. URL, DOI BibTeX

    	author = "Gomez, Maria E. and N.A. Nordbotten and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose and T. Skeie and O. Lysne",
    	abstract = "Massively parallel computing systems are being built with thousands of nodes. The interconnection network plays a key role for the performance of such systems. However, the high number of components significantly increases the probability of failure. Additionally, failures in the interconnection network may isolate a large fraction of the machine. It is therefore critical to provide an efficient fault-tolerant mechanism to keep the system running, even in the presence of faults. This paper presents a new fault-tolerant routing methodology that does not degrade performance in the absence of faults and tolerates a reasonably large number of faults without disabling any healthy node. In order to avoid faults, for some source-destination pairs, packets are first sent to an intermediate node and then from this node to the destination node. Fully adaptive routing is used along both subpaths. The methodology assumes a static fault model and the use of a checkpoint/restart mechanism. However, there are scenarios where the faults cannot be avoided solely by using an intermediate node. Thus, we also provide some extensions to the methodology. Specifically, we propose disabling adaptive routing and/or using misrouting on a per-packet basis. We also propose the use of more than one intermediate node for some paths. The proposed fault-tolerant routing methodology is extensively evaluated in terms of fault tolerance, complexity, and performance.",
    	doi = "10.1109/TC.2006.46",
    	issn = "0018-9340",
    	journal = "Computers, IEEE Transactions on",
    	keywords = "adaptive routing; checkpoint-restart mechanism; direct networks; fault-tolerant routing methodology; interconnection network; parallel computing system; fault tolerant computing; multiprocessor interconnection networks; network routing; parallel processi",
    	month = "april",
    	number = 4,
    	pages = "400 - 415",
    	title = "{A} routing methodology for achieving fault tolerance in direct networks",
    	url = "",
    	volume = 55,
    	year = 2006
  37. Marina Alonso, Salvador Coll, Jose Maria Martínez, Vicente Santonja, Pedro Lopez and Jose Duato. Dynamic power saving in fat-tree interconnection networks using on/off links. In Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International. April 2006, 8 pp.. URL, DOI BibTeX

    	author = "Alonso, Marina and Coll, Salvador and Mart{\'i}nez, Jose Maria and Santonja, Vicente and Lopez, Pedro and Duato, Jose",
    	abstract = "Current trends in high-performance parallel computers show that fat-tree interconnection networks are one of the most popular topologies. The particular characteristics of this topology, that provide multiple alternative paths for each source/destination pair, make it an excellent candidate for applying power consumption reduction techniques. Such techniques are being increasingly applied in computer systems and the interconnection network is not an exception, since its contribution to the system power budget is not negligible. In this paper, we present a mechanism that dynamically switches on and off network links as a function of traffic. The mechanism is designed to guarantee network connectivity, according to the underlying routing algorithm. In this way, the default routing algorithm can be used regardless of the power saving actions taken, thus simplifying router design. Our simulation results show that significant network power consumption reductions can be obtained at no cost. Latency remains the same although the number of operating network links is dynamically adjusted.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International",
    	doi = "10.1109/IPDPS.2006.1639599",
    	isbn = "0-7695-0990-8",
    	keywords = "dynamic power saving; fat-tree interconnection networks; high-performance parallel computers; network power consumption reduction; on-off links; routing algorithm; energy conservation; multiprocessor interconnection networks; parallel processing;",
    	month = "april",
    	pages = "8 pp.",
    	title = "{D}ynamic power saving in fat-tree interconnection networks using on/off links",
    	url = "",
    	year = 2006
    	author = "Alonso, Marina and Coll, Salvador and Mart{\'i}nez, Juan Miguel and Santonja, Vicente and Lopez, Pedro and Duato, Jose",
    	abstract = "Current trends in high-performance parallel computers show that fat-tree interconnection networks are one of the most popular topologies. The particular characteristics of this topology, that provide multiple alternative paths for each source/destination pair, make it an excellent candidate for applying power consumption reduction techniques. Such techniques are being increasingly applied in computer systems and the interconnection network is not an exception, since its contribution to the system power budget is not negligible. In this paper, we present a mechanism that dynamically switches on and off network links as a function of traffic. The mechanism is designed to guarantee network connectivity, according to the underlying routing algorithm. In this way, the default routing algorithm can be used regardless of the power saving actions taken, thus simplifying router design. Our simulation results show that significant network power consumption reductions can be obtained at no cost. Latency remains the same although the number of operating network links is dynamically adjusted",
    	booktitle = "Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International",
    	doi = "10.1109/IPDPS.2006.1639599",
    	isbn = "1-4244-0054-6",
    	journal = "Proceedings. 20th International Parallel and Distributed Processing Symposium (IEEE Cat. No.06TH8860)",
    	keywords = "energy conservation;multiprocessor interconnection networks;parallel processing;trees;",
    	month = "Apr.",
    	note = "dynamic power saving;fat-tree interconnection networks;on/off links;high-performance parallel computers;routing algorithm;network power consumption reduction;",
    	pages = "8 pp. -",
    	publisher = "IEEE Computer Society",
    	title = "{D}ynamic power saving in fat-tree interconnection networks using on/off links",
    	url = "",
    	year = 2006
  39. Gaspar Mora, Jose Flich, Jose Duato, Pedro Lopez, Elvira Baydal and O Lysne. Towards an efficient switch architecture for high-radix switches. 2006, 11 - 20. URL, DOI BibTeX

    	author = "Mora, Gaspar and Flich, Jose and Duato, Jose and Lopez, Pedro and Baydal, Elvira and O. Lysne",
    	abstract = "The interconnection network plays a key role in the overall performance achieved by high performance computing systems, also contributing an increasing fraction of its cost and power consumption. Current trends in interconnection network technology suggest that high-radix switches will be preferred as networks will become smaller (in terms of switch count) with the associated savings in packet latency, cost, and power consumption. Unfortunately, current switch architectures have scalability problems that prevent them from being effective when implemented with a high number of ports. In this paper, an efficient and cost-effective architecture for high-radix switches is proposed. The architecture, referred to as partitioned crossbar input queued (PCIQ), relies on three key components: a partitioned crossbar organization that allows the use of simple arbiters and crossbars, a packet-based arbiter, and a mechanism to eliminate the switch-level HOL blocking. Under uniform traffic, maximum switch efficiency is achieved. Furthermore, switch-level HOL blocking is completely eliminated under hot-spot traffic, again delivering maximum throughput. Additionally, PCIQ inherently implements an efficient congestion management technique that eliminates all the network-wide HOL blocking. On the contrary, the previously proposed architectures either show poor performance or they require significantly higher costs than PCIQ (in both components and complexity).",
    	address = "Piscataway, NJ, USA",
    	doi = "10.1109/ANCS.2006.4579519",
    	journal = "ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS 2006)",
    	keywords = "multistage interconnection networks;",
    	note = "high-radix switch architecture;interconnection network;power consumption;partitioned crossbar input queued;switch-level head-of-line block elimination;congestion management technique;",
    	pages = "11 - 20",
    	title = "{T}owards an efficient switch architecture for high-radix switches",
    	url = "",
    	year = 2006
  40. Francisco Gilabert, Maria E Gomez, Pedro Lopez and Jose Duato. On the influence of the selection function on the performance of fat-trees. 2006, 864 - 73. BibTeX

    	author = "Gilabert, Francisco and Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Fat-tree topology has become very popular among switch manufacturers. Routing in fat-trees is composed of two phases, an adaptive upwards phase, and a deterministic downwards phase. The unique downwards path to the destination depends on the switch that has been reached in the upwards phase. As adaptive routing is used in the ascending phase, several output ports are possible at each switch and the final choice depends on the selection function. The impact of the selection function on performance has been previously studied for direct networks and has not resulted to be very important. In fat-trees, the decisions made in the upwards phase by the selection function can be critical, since it determines the switch reached in the upwards phase, and therefore the unique downwards path to the destination. In this paper, we analyze the effect of the selection function on fat-trees. Several selection functions are defined, compared and evaluated. The evaluation shows that selection function has a great impact on fat-trees",
    	address = "Berlin, Germany",
    	journal = "Euro-Par 2006 Parallel Processing. 12th International Euro-Par Conference. Proceedings (Lecture Notes in Computer Science Vol. 4128)",
    	keywords = "telecommunication network routing;telecommunication network topology;telecommunication switching;trees;",
    	note = "selection function;fat-trees;adaptive routing;interconnection networks;",
    	pages = "864 - 73",
    	title = "{O}n the influence of the selection function on the performance of fat-trees",
    	year = 2006
  41. Maria E Gomez, Pedro Lopez and Jose Duato. FIR: An efficient routing strategy for tori and meshes. Journal of Parallel and Distributed Computing 66(7):907 - 21, 2006. URL, DOI BibTeX

    	author = "Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Recent massively parallel computers are based on clusters of PCs. These machines use one of the recently proposed standard interconnects. These interconnects either use source routing or distributed routing based on forwarding tables. While source routers are simpler, distributed routers provide more flexibility, allowing the network to achieve a higher performance. Distributed routing can be implemented by a fixed hardware specific to a routing function on a given topology or by using forwarding tables. The main problem of this approach is the lack of scalability of forwarding tables. In this paper, we propose a distributed routing strategy for commercial switches, flexible interval routing, that is scalable, both in memory and routing time because it is not based on tables. At the same time, the strategy is easy to reconfigure, being able to implement the most commonly used routing algorithms in the most widely used regular topologies. [All rights reserved Elsevier]",
    	address = "USA",
    	doi = "10.1016/j.jpdc.2005.12.012",
    	issn = "0743-7315",
    	journal = "Journal of Parallel and Distributed Computing",
    	keywords = "multiprocessor interconnection networks;telecommunication network routing;workstation clusters;",
    	note = "FIR;flexible interval routing;network routing;PC clusters;network topology;",
    	number = 7,
    	pages = "907 - 21",
    	title = "{FIR}: {A}n efficient routing strategy for tori and meshes",
    	url = "",
    	volume = 66,
    	year = 2006
    	author = "Gomez, Maria E. and N.A. Nordbotten and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose and T. Skeie and O. Lysne",
    	abstract = "Massively parallel computing systems are being built with thousands of nodes. The interconnection network plays a key role for the performance of such systems. However, the high number of components significantly increases the probability of failure. Additionally, failures in the interconnection network may isolate a large fraction of the machine. It is therefore critical to provide an efficient fault-tolerant mechanism to keep the system running, even in the presence of faults. This paper presents a new fault-tolerant routing methodology that does not degrade performance in the absence of faults and tolerates a reasonably large number of faults without disabling any healthy node. In order to avoid faults, for some source-destination pairs, packets are first sent to an intermediate node and then from this node to the destination node. Fully adaptive routing is used along both subpaths. The methodology assumes a static fault model and the use of a checkpoint/restart mechanism. However, there are scenarios where the faults cannot be avoided solely by using an intermediate node. Thus, we also provide some extensions to the methodology. Specifically, we propose disabling adaptive routing and/or using misrouting on a per-packet basis. We also propose the use of more than one intermediate node for some paths. The proposed fault-tolerant routing methodology is extensively evaluated in terms of fault tolerance, complexity, and performance",
    	address = "USA",
    	doi = "10.1109/TC.2006.46",
    	issn = "0018-9340",
    	journal = "IEEE Transactions on Computers",
    	keywords = "fault tolerant computing;multiprocessor interconnection networks;network routing;parallel processing;",
    	note = "direct networks;parallel computing system;interconnection network;fault-tolerant routing methodology;adaptive routing;checkpoint-restart mechanism;",
    	number = 4,
    	pages = "400 - 15",
    	title = "{A} routing methodology for achieving fault tolerance in direct networks",
    	url = "",
    	volume = 55,
    	year = 2006
  43. Elvira Baydal, Pedro Lopez and Jose Duato. A family of mechanisms for congestion control in wormhole networks. Parallel and Distributed Systems, IEEE Transactions on 16(9):772 - 784, 2005. URL, DOI BibTeX

    	author = "Baydal, Elvira and Lopez, Pedro and Duato, Jose",
    	abstract = "Multiprocessor interconnection networks may reach congestion with high traffic loads, which prevents reaching the wished performance. Unfortunately, many of the mechanisms proposed in the literature for congestion control either suffer from a lack of robustness, being unable to work properly with different traffic patterns or message lengths, or detect congestion relying on global information that wastes some network bandwidth. This paper presents a family of mechanisms to avoid network congestion in wormhole networks. All of them need only local information, applying message throttling when it is required. The proposed mechanisms use different strategies to detect network congestion and also apply different corrective actions. The mechanisms are evaluated and compared for several network loads and topologies, noticeably improving network performance with high loads but without penalizing network behavior for low and medium traffic rates, where no congestion control is required.",
    	doi = "10.1109/TPDS.2005.102",
    	issn = "1045-9219",
    	journal = "Parallel and Distributed Systems, IEEE Transactions on",
    	keywords = "message throttling; multiprocessor interconnection network; network bandwidth; network congestion control; traffic load; wormhole network; wormhole switching; multiprocessor interconnection networks; telecommunication congestion control; telecommunicatio",
    	month = "sept.",
    	number = 9,
    	pages = "772 - 784",
    	title = "{A} family of mechanisms for congestion control in wormhole networks",
    	url = "",
    	volume = 16,
    	year = 2005
  44. Maria E Gomez, Pedro Lopez and Jose Duato. A Memory-Effective Fault-Tolerant Routing Strategy for Direct Interconnection Networks. In Parallel and Distributed Computing, 2005. ISPDC 2005. The 4th International Symposium on. July 2005, 341 -348. URL, DOI BibTeX

    	author = "Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "High-performance interconnection networks are crucial in massively parallel computers. Routing is one of the most important design issues of interconnection networks. Moreover, the huge amount of hardware of these machines makes fault-tolerance another important design issue. In this paper, we propose a mechanism that combines scalable routing and fault-tolerance for commercial switches to build direct regular topologies, which are the topologies used in large machines. The hardware required is not complex. Furthermore, it allows a high degree of fault-tolerance inflicting a minimal decrease of performance",
    	booktitle = "Parallel and Distributed Computing, 2005. ISPDC 2005. The 4th International Symposium on",
    	doi = "10.1109/ISPDC.2005.6",
    	keywords = "adaptive routing;direct interconnection networks;distributed routing;memory-effective fault-tolerant routing;fault tolerance;multiprocessor interconnection networks;telecommunication network reliability;telecommunication network routing;",
    	month = "july",
    	pages = "341 -348",
    	title = "{A} {M}emory-{E}ffective {F}ault-{T}olerant {R}outing {S}trategy for {D}irect {I}nterconnection {N}etworks",
    	url = "",
    	year = 2005
  45. Marina Alonso, Juan Miguel Martínez, Vicente Santonja, Pedro Lopez and Jose Duato. Power Saving in Regular Interconnection Networks Built with High-Degree Switches. In Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International. April 2005, 5b - 5b. URL, DOI BibTeX

    	author = "Alonso, Marina and Mart{\'i}nez, Juan Miguel and Santonja, Vicente and Lopez, Pedro and Duato, Jose",
    	abstract = "Nowadays, high-degree switches are available as building blocks of the interconnection network of clusters of PCs. An alternative to take advantage of the high number of switch ports is to connect every pair of switches through not only one but several links (this is known as link trunking in other environments). This extra connectivity can be exploited by using adaptive routing algorithms, thus improving network throughput and reducing network congestion. However, with low traffic loads, all the links that compose the trunk link will not be utilized, but these idle links continue consuming power. Power consumption reduction techniques are being applied everywhere in computer systems and the interconnection network is not an exception, as its contribution is not negligible. In this paper, we present a mechanism that dynamically switches on and off network links as a function of traffic. It is specially targeted to those networks where trunk links are used. The mechanism can switch off any link, provided that network connectivity is guaranteed (i.e., every pair of switches should be connected through at least one active link). Indeed, this restriction makes it possible to use the same routing algorithm regardless of the power saving actions taken, thus simplifying router design. Our simulation results show that the network power consumption can be greatly reduced, at the expense of some increase in latency. Nevertheless, it is shown that the power reduction is always higher than this latency increase.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International",
    	doi = "10.1109/IPDPS.2005.349",
    	isbn = "0-7695-2312-9",
    	keywords = "PC clusters; adaptive routing algorithm; high-degree switch; link trunking; network congestion; network link; network throughput; power consumption; power saving; regular interconnection network; telecommunication traffic; power consumption; telecommunic",
    	month = "april",
    	pages = "5b - 5b",
    	title = "{P}ower {S}aving in {R}egular {I}nterconnection {N}etworks {B}uilt with {H}igh-{D}egree {S}witches",
    	url = "",
    	year = 2005
  46. Maria E Gomez, Pedro Lopez and Jose Duato. A Memory-Effective Routing Strategy for Regular Interconnection Networks. In Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International. April 2005, 41b - 41b. URL, DOI BibTeX

    	author = "Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Massively parallel computing systems have been or are being built with thousands of nodes. In such systems, high-performance interconnection networks are crucial to achieve the maximum performance. Routing is one of the most important design issues of interconnection networks. Routing strategies can be mainly classified as source and distributed routing. Source routing has been used in some networks because routers are very simple. On the other hand, distributed routing allows more flexibility, but the routers are more complex. Distributed routing can be implemented by a fixed hardware specific to a routing function on a given topology, or by using forwarding tables that are very flexible but suffer from a lack of scalability. In this paper, we propose a distributed routing strategy for commercial switches, Flexible Interval Routing, that is scalable for the most widely used regular topologies (tori and meshes) because it is not based on tables. At the same time, the strategy is easy to reconfigure to deal with changes in the topology or in the routing algorithm for a given topology, being able to implement the most commonly-used routing algorithms in regular topologies.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International",
    	doi = "10.1109/IPDPS.2005.44",
    	keywords = "distributed routing; flexible interval routing; high-performance interconnection networks; memory-effective routing strategy; parallel computing system; multiprocessor interconnection networks; network routing; parallel machines; performance evaluation;",
    	month = "april",
    	pages = "41b - 41b",
    	title = "{A} {M}emory-{E}ffective {R}outing {S}trategy for {R}egular {I}nterconnection {N}etworks",
    	url = "",
    	year = 2005
    	author = "Baydal, Elvira and Lopez, Pedro and Duato, Jose",
    	abstract = "Multiprocessor interconnection networks may reach congestion with high traffic loads, which prevents reaching the wished performance. Unfortunately, many of the mechanisms proposed in the literature for congestion control either suffer from a lack of robustness, being unable to work properly with different traffic patterns or message lengths, or detect congestion relying on global information that wastes some network bandwidth. This paper presents a family of mechanisms to avoid network congestion in wormhole networks. All of them need only local information, applying message throttling when it is required. The proposed mechanisms use different strategies to detect network congestion and also apply different corrective actions. The mechanisms are evaluated and compared for several network loads and topologies, noticeably improving network performance with high loads but without penalizing network behavior for low and medium traffic rates, where no congestion control is required",
    	address = "USA",
    	issn = "1045-9219",
    	journal = "IEEE Transactions on Parallel and Distributed Systems",
    	keywords = "multiprocessor interconnection networks;telecommunication congestion control;telecommunication network routing;telecommunication network topology;telecommunication switching;telecommunication traffic;",
    	note = "multiprocessor interconnection network;traffic load;network congestion control;network bandwidth;wormhole network;message throttling;wormhole switching;",
    	number = 9,
    	pages = "772 - 84",
    	title = "{A} family of mechanisms for congestion control in wormhole networks",
    	url = "",
    	volume = 16,
    	year = 2005
    	author = "Alonso, Marina and Mart{\'i}nez, Juan Miguel and Santonja, Vicente and Lopez, Pedro and Duato, Jose",
    	abstract = "Nowadays, high-degree switches are available as building blocks of the interconnection network of clusters of PCs. An alternative to take advantage of the high number of switch ports is to connect every pair of switches through not only one but also several links (this is known as link trunking in other environments). This extra connectivity can be exploited by using adaptive routing algorithms, thus improving network throughput and reducing network congestion. However, with low traffic loads, all the links that compose the trunk link will not be utilized, but these idle links continue consuming power. Power consumption reduction techniques are being applied everywhere in computer systems and the interconnection network is not an exception, as its contribution is not negligible. In this paper, we present a mechanism that dynamically switches on and off network links as a function of traffic. It is specially targeted to those networks where trunk links are used. The mechanism can switch off any link, provided that network connectivity is guaranteed (i.e., every pair of switches should be connected through at least one active link). Indeed, this restriction makes it possible to use the same routing algorithm regardless of the power saving actions taken, thus simplifying router design. Our simulation results show that the network power consumption can be greatly reduced, at the expense of some increase in latency. Nevertheless, it is shown that the power reduction is always higher than this latency increase",
    	address = "Los Alamitos, CA, USA",
    	journal = "Proceedings. 19th IEEE International Parallel and Distributed Processing Symposium",
    	keywords = "power consumption;telecommunication congestion control;telecommunication links;telecommunication network routing;telecommunication switching;telecommunication traffic;workstation clusters;",
    	note = "power saving;regular interconnection network;high-degree switch;PC clusters;link trunking;adaptive routing algorithm;network throughput;network congestion;power consumption;telecommunication traffic;network link;",
    	pages = "10 pp. -",
    	title = "{P}ower saving in regular interconnection networks built with high-degree switches",
    	year = 2005
  49. Michihiro Koibuchi, Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Enforcing in-order packet delivery in system area networks with adaptive routing. Journal of Parallel and Distributed Computing 65(10):1223 - 1236, 2005. URL BibTeX

    	author = "Michihiro Koibuchi and Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Adaptive routing, which dynamically selects the route of packets, has been widely studied for interconnection networks in massively parallel computers and system area networks. Although adaptive routing has the advantage of providing high bandwidth, it may deliver packets out-of-order, which some message passing libraries do not accept. In this paper, we propose two mechanisms called (1) FIFO transmission and (2) couple limitation to guarantee in-order packet delivery in adaptive routing. Both of them limit packet injection at source hosts. The FIFO transmission completely avoids packet sorting at destination hosts, while the couple limitation uses a few buffers to sort packets at destination hosts. Evaluation results show that the FIFO transmission and the couple limitation achieve a similar throughput to that of a method equipped with huge (infinite) buffers enough to store all out-of-order packets at destination hosts under both synthetic traffic and NAS Parallel Benchmarks. © 2005 Elsevier Inc. All rights reserved.",
    	issn = 07437315,
    	journal = "Journal of Parallel and Distributed Computing",
    	key = "Packet networks",
    	keywords = "Bandwidth;Benchmarking;Interconnection networks;Routers;Telecommunication traffic;",
    	note = "Adaptive routing;In-order packet delivery;PC clusters;System area networks;",
    	number = 10,
    	pages = "1223 - 1236",
    	title = "{E}nforcing in-order packet delivery in system area networks with adaptive routing",
    	url = "",
    	volume = 65,
    	year = 2005
  50. Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez, Jose Duato and M Koibuchi. In-Order Packet Delivery in Interconnection Networks using Adaptive Routing. In Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International. 2005, 101 - 101. DOI BibTeX

    	author = "Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose and M. Koibuchi",
    	abstract = "Most commercial switch-based network technologies for PC clusters use deterministic routing. Alternatively, adaptive routing could be used to improve network performance. In this case, switches decide the path to reach the destination by using local information about the state of the possible outgoing links. However, there are two drawbacks that discourage adaptive routing from being applied to commercial interconnects. The first one concerns the possible switch complexity increase with respect to deterministic routing. The second drawback is due to the fact that adaptive routing may introduce out-of-order packet delivery, which is not acceptable for some applications. To the best of our knowledge, there are no works that analyze the degree of out-of-order packet delivery caused by different network and traffic conditions. In this paper, we take on such a challenge. We show that only for high traffic conditions (reaching saturation) out-of-order delivery is introduced. Moreover, by using small buffers and simple sorting mechanisms at destination, we show that high network throughput can be obtained at the same time packets are delivered in order. Thus, the paper demonstrates that it is possible to use adaptive routing, while still guaranteeing in-order packet delivery, without using large buffer resources nor degrading significantly its performance.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International",
    	doi = "10.1109/IPDPS.2005.255",
    	keywords = "PC clusters; adaptive routing; deterministic routing; interconnection networks; out-of-order packet delivery; sorting mechanisms; switch-based network technologies; multiprocessor interconnection networks; network routing; packet switching; sorting; work",
    	month = "04-08",
    	pages = "101 - 101",
    	title = "{I}n-{O}rder {P}acket {D}elivery in {I}nterconnection {N}etworks using {A}daptive {R}outing",
    	year = 2005
    	author = "Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "Massively parallel computing systems are being built with thousands of nodes. In such systems, high-performance inter-connection networks are crucial to achieve the maximum performance. Routing is one of the most important design issues of interconnection networks. Routing strategies can be mainly classified as source and distributed routing. Source routing has been used in some networks because routers are very simple. On the other hand, distributed routing allows more flexibility, but the routers are more complex. Distributed routing can be implemented by a fixed hardware specific to a routing function on a given topology, or by using forwarding tables that are very flexible but suffer from a lack of scalability. In this paper, we propose a distributed routing strategy for commercial switches, Flexible Interval Routing, that is scalable for the most widely used regular topologies (tori and meshes) because it is not based on tables. At the same time, the strategy is easy to reconfigure to deal with changes in the topology or in the routing algorithm for a given topology, being able to implement the most commonly-used routing algorithms in regular topologies.",
    	address = "Denver, CO, United States",
    	journal = "Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium",
    	key = "Interconnection networks",
    	keywords = "Algorithms;Computer hardware;Data storage equipment;Parallel processing systems;Routers;Switches;Topology;",
    	note = "Distributed routing;Routing algorithms;Routing strategies;Source routing;",
    	pages = "41 -",
    	title = "{A} memory-effective routing strategy for regular interconnection networks",
    	year = 2005
    	author = "Gomez, Maria E. and Lopez, Pedro and Duato, Jose",
    	abstract = "High-performance interconnection networks are crucial in massively parallel computers. Routing is one of the most important design issues of interconnection networks. Moreover, the huge amount of hardware of these machines makes fault-tolerance another important design issue. In this paper, we propose a mechanism that combines scalable routing and fault-tolerance for commercial switches to build direct regular topologies, which are the topologies used in large machines. The hardware required is not complex. Furthermore, it allows a high degree of fault-tolerance inflicting a minimal decrease of performance",
    	address = "Los Alamitos, CA, USA",
    	journal = "ISPDC 2005. The 4th International Workshop on Parallel and Distributed Computing",
    	keywords = "fault tolerance;multiprocessor interconnection networks;telecommunication network reliability;telecommunication network routing;",
    	note = "memory-effective fault-tolerant routing;direct interconnection networks;distributed routing;adaptive routing;",
    	pages = "341 - 8",
    	title = "{A} memory-effective fault-tolerant routing strategy for direct interconnection networks",
    	year = 2005
  53. Maria E Gomez, Jose Flich, Pedro Lopez, Antonio Robles, Jose Duato, N A Nordbotten, O Lysne and T Skeie. An effective fault-tolerant routing methodology for direct networks. In Parallel Processing, 2004. ICPP 2004. International Conference on. 2004, 222 - 231 vol.1. URL, DOI BibTeX

    	author = "Gomez, Maria E. and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose and N.A. Nordbotten and O. Lysne and T. Skeie",
    	abstract = "Current massively parallel computing systems are being built with thousands of nodes, which significantly affect the probability of failure. M. E. Gomez proposed a methodology to design fault-tolerant routing algorithms for direct interconnection networks. The methodology uses a simple mechanism: for some source-destination pairs, packets are first forwarded to an intermediate node, and later, from this node to the destination node. Minimal adaptive routing is used along both subpaths. For those cases where the methodology cannot find a suitable intermediate node, it combines the use of intermediate nodes with two additional mechanisms: disabling adaptive routing and using misrouting on a per-packet basis. While the combination of these three mechanisms tolerates a large number of faults, each one requires adding some hardware support in the network and also introduces some overhead. In this paper, we perform an in-depth detailed analysis of the impact of these mechanisms on network behaviour. We analyze the impact of the three mechanisms separately and combined. The ultimate goal of this paper is to obtain a suitable combination of mechanisms that is able to meet the trade-off between fault-tolerance degree, routing complexity, and performance.",
    	booktitle = "Parallel Processing, 2004. ICPP 2004. International Conference on",
    	doi = "10.1109/ICPP.2004.1327925",
    	issn = "0190-3918",
    	keywords = "direct networks; fault-tolerant routing algorithm; in-depth detailed analysis; interconnection networks; minimal adaptive routing; parallel computing system; communication complexity; fault tolerant computing; multiprocessor interconnection networks; par",
    	month = "aug.",
    	pages = "222 - 231 vol.1",
    	title = "{A}n effective fault-tolerant routing methodology for direct networks",
    	url = "",
    	year = 2004
  54. J M Montañana, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. A transition-based fault-tolerant routing methodology for InfiniBand networks. In Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International. April 2004, 186. URL, DOI BibTeX

    	author = "Monta{\~n}ana, J. M. and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Summary form only given. Currently, clusters of PCs are considered a cost-effective alternative to large parallel computers. As the number of elements increases in these systems, the probability of faults increases dramatically. Therefore, it is critical to keep the system running even in the presence of faults. The interconnection network plays a key role in its performance. InfiniBand (IBA) is a new standard interconnect suitable for clusters. Most of the fault-tolerant routing strategies proposed for massively parallel computers cannot be applied to IBA because routing and virtual channel transitions are deterministic, which prevents packets from avoiding the faults. A possible approach to provide fault-tolerance in IBA consists of using several disjoint paths between every source-destination pair of nodes and selecting the appropriate path at the source host. However, to this end, a routing algorithm able to provide enough disjoint paths, while still guaranteeing deadlock freedom, is required. We propose a simple and effective fault-tolerant methodology for IBA networks that can be applied to any network topology and meets the trade-off between fault-tolerance degree and the number of network resources devoted to it. Preliminary results show that the proposed methodology scales well and supports up to three faults in 2D and five in 3D tori using only two virtual channels.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International",
    	doi = "10.1109/IPDPS.2004.1303198",
    	isbn = "0-7695-2132-0",
    	issn = "",
    	keywords = "fault tolerant computing;multiprocessor interconnection networks;network topology;parallel machines;telecommunication network routing;workstation clusters;",
    	month = "april",
    	pages = 186,
    	title = "{A} transition-based fault-tolerant routing methodology for {I}nfini{B}and networks",
    	url = "",
    	year = 2004
    	author = "Gomez, Maria E. and Duato, Jose and Flich, Jose and Lopez, Pedro and Robles, Antonio and N.A. Nordbotten and O. Lysne and T. Skeie",
    	abstract = "In this paper we present a methodology to design fault-tolerant routing algorithms for regular direct interconnection networks. It supports fully adaptive routing, does not degrade performance in the absence of faults, and supports a reasonably large number of faults without significantly degrading performance. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, at this node, without being ejected, they are adaptively forwarded to their destinations. In order to allow deadlock-free minimal adaptive routing, the methodology requires only one additional virtual channel (for a total of three), even for tori. Evaluation results for a 4 x 4 x 4 torus network show that the methodology is 5-fault tolerant. Indeed, for up to 14 link failures, the percentage of fault combinations supported is higher than 99.96%. Additionally, network throughput degrades by less than 10% when injecting three random link faults without disabling any node. In contrast, a mechanism similar to the one proposed in the BlueGene/L, that disables some network planes, would strongly degrade network throughput by 79%.",
    	doi = "10.1109/L-CA.2004.1",
    	issn = "1556-6056",
    	journal = "Computer Architecture Letters",
    	month = "january-december",
    	number = 1,
    	pages = "3 - 3",
    	title = "{A}n {E}fficient {F}ault-{T}olerant {R}outing {M}ethodology for {M}eshes and {T}ori",
    	url = "",
    	volume = 3,
    	year = 2004
    	author = "Gomez, Maria E. and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose and N.A. Nordbotten and O. Lysne and T. Skeie",
    	abstract = "Current massively parallel computing systems are being built with thousands of nodes, which significantly affect the probability of failure. M. E. Gomez proposed a methodology to design fault-tolerant routing algorithms for direct interconnection networks. The methodology uses a simple mechanism: for some source-destination pairs, packets are first forwarded to an intermediate node, and later, from this node to the destination node. Minimal adaptive routing is used along both subpaths. For those cases where the methodology cannot find a suitable intermediate node, it combines the use of intermediate nodes with two additional mechanisms: disabling adaptive routing and using misrouting on a per-packet basis. While the combination of these three mechanisms tolerates a large number of faults, each one requires adding some hardware support in the network and also introduces some overhead. In this paper, we perform an in-depth detailed analysis of the impact of these mechanisms on network behaviour. We analyze the impact of the three mechanisms separately and combined. The ultimate goal of this paper is to obtain a suitable combination of mechanisms that is able to meet the trade-off between fault-tolerance degree, routing complexity, and performance",
    	address = "Los Alamitos, CA, USA",
    	journal = "2004 International Conference on Parallel Processing",
    	keywords = "communication complexity;fault tolerant computing;multiprocessor interconnection networks;parallel processing;",
    	note = "parallel computing system;fault-tolerant routing algorithm;interconnection networks;minimal adaptive routing;in-depth detailed analysis;direct networks;",
    	pages = "222 - 31",
    	title = "{A}n effective fault-tolerant routing methodology for direct networks",
    	volume = "vol.1",
    	year = 2004
    	author = "Gomez, Maria E. and Duato, Jose and Flich, Jose and Lopez, Pedro and Robles, Antonio and N.A. Nordbotten and T. Skeie and O. Lysne",
    	abstract = "Interconnection networks play a key role in the fault tolerance of massively parallel computers, since faults may isolate a large fraction of the machine containing many healthy nodes. In this paper, we present a methodology to design fully adaptive fault-tolerant routing algorithms for direct interconnection networks that can be applied to different regular topologies. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, from this node, they are adaptively forwarded to their destination. This methodology requires only one additional virtual channel, even for tori. Evaluation results show that the methodology is 7-fault tolerant, and for up to 14 faults, more than 99% of the combinations are tolerated, also without significantly degrading performance in the presence of faults",
    	address = "Berlin, Germany",
    	journal = "High Performance Computing-HiPC 2004. 11th International Conference (Lecture notes in Computer Science Vol.3296)",
    	keywords = "fault tolerant computing;multiprocessor interconnection networks;parallel processing;telecommunication network routing;telecommunication network topology;",
    	note = "adaptive fault-tolerant routing;direct interconnection networks;massively parallel computers;",
    	pages = "462 - 73",
    	title = "{A} new adaptive fault-tolerant routing methodology for direct networks",
    	year = 2004
    	author = "Alonso, Marina and J.M. Martinez and Santonja, Vicente and Lopez, Pedro",
    	abstract = "The huge increase both in size and complexity of high-end multiprocessor systems has triggered their power consumption. Air or liquid cooling systems are needed, which, in turn, increases power consumption. Another important percentage of the consumption is due to the interconnection network. In this paper, we propose a mechanism that dynamically reduces the available network bandwidth when traffic becomes low. Unlike other approaches that completely switch links off when they are not fully utilized, our mechanism is based on reducing their bandwidth by narrowing their width. As the topology of the network is not modified, the same routing algorithm can be used regardless of the power consumption level, which simplifies the router design. By using this strategy, the consumption may be strongly reduced. In fact, the lower bound of this reduction is a design parameter of the mechanism. The price to pay is an increase in the message latency with low network loads",
    	address = "Berlin, Germany",
    	journal = "Euro-Par 2004 Parallel Processing. 10th International Euro-Par Conference. Proceedings (Lecture Notes in Comput. Sci. Vol.3149)",
    	keywords = "bandwidth allocation;multiprocessor interconnection networks;power consumption;telecommunication links;telecommunication network routing;telecommunication traffic;",
    	note = "power consumption reduction;interconnection networks;link width adjustment;multiprocessor systems;network bandwidth;",
    	pages = "882 - 90",
    	title = "{R}educing power consumption in interconnection networks by dynamically adjusting link width",
    	year = 2004
  59. T Skeie, O Lysne, Jose Flich, Pedro Lopez, Antonio Robles and Jose Duato. LASH-TOR: a generic transition-oriented routing algorithm. In Parallel and Distributed Systems, 2004. ICPADS 2004. Proceedings. Tenth International Conference on. 2004, 595 - 604. URL, DOI BibTeX

    	author = "T. Skeie and O. Lysne and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose",
    	abstract = "Cluster networks are seen as the future access networks for multimedia streaming, e-commerce, network storage, etc. For these applications, performance and high availability are particularly crucial. Regular topologies are preferred when performance is the primary concern. However, due to spatial constraints or fault-related issues, the network structure may become irregular, which makes more difficult to find deadlock-free minimal paths. Over the recent years, several solutions have been proposed. One of them is the LASH routing, which enables minimal routing by assigning paths to different virtual layers. In this paper, we propose an extension of LASH in order to reduce the number of required virtual layers by allowing transitions between virtual layers. Evaluation results show that the new routing scheme (LASH-TOR) is able to obtain full minimal routing with a reduced number of virtual channels. For torus and mesh networks, with only two virtual channels, LASH throughput is increased by an average factor of improvement of 3.30 for large networks. For regular networks with some unconnected (faulty) links, equal performance improvements are achieved. Even for highly irregular networks of size up to 128 switches the new routing scheme only needs three virtual channels for guaranteeing minimal routing. Besides, LASH-TOR performs well compared to dimension order routing for mesh and torus networks.",
    	booktitle = "Parallel and Distributed Systems, 2004. ICPADS 2004. Proceedings. Tenth International Conference on",
    	doi = "10.1109/ICPADS.2004.1316144",
    	isbn = "0-7695-2152-5",
    	issn = "1521-9097",
    	keywords = "LASH routing; LASH-TOR; access networks; cluster networks; deadlock-free minimal paths; e-commerce; mesh network; multimedia streaming; network storage; network structure; spatial constraints; torus network; transition-oriented routing algorithm; virtual",
    	month = "7-9",
    	pages = "595 - 604",
    	title = "{LASH}-{TOR}: a generic transition-oriented routing algorithm",
    	url = "",
    	year = 2004
  60. N A Nordbotten, Maria E Gomez, Jose Flich, Pedro Lopez, Antonio Robles, T Skeie, O Lysne and Jose Duato. A fully adaptive fault-tolerant routing methodology based on intermediate nodes. 2004, 341 - 56. BibTeX

    	author = "N.A. Nordbotten and Gomez, Maria E. and Flich, Jose and Lopez, Pedro and Robles, Antonio and T. Skeie and O. Lysne and Duato, Jose",
    	abstract = "Massively parallel computing systems are being built with thousands of nodes. Because of the high number of components, it is critical to keep these systems running even in the presence of failures. Interconnection networks play a key role in these systems, and this paper proposes a fault-tolerant routing methodology for use in such networks. The methodology supports any minimal routing function (including fully adaptive routing), does not degrade performance in the absence of faults, does not disable any healthy node, and is easy to implement both in meshes and tori. In order to avoid network failures, the methodology uses a simple mechanism: for some source-destination pairs, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network). The methodology is shown to tolerate a large number of faults (e.g., five/nine faults when using two/three intermediate nodes in a 3D torus). Furthermore, the methodology offers a graceful performance degradation: in an 8 × 8 × 8 torus network with 14 faults the throughput is only decreased by 6.49%",
    	address = "Germany, Germany",
    	journal = "Network and Parallel Computing. IFIP International Conference, NPC 2004. Proceedings (Lecture Notes in Computer Science Vol.3222)",
    	keywords = "fault tolerant computing;multiprocessor interconnection networks;packet switching;parallel processing;telecommunication network routing;",
    	note = "fully adaptive fault-tolerant routing;intermediate nodes;massively parallel computing systems;interconnection networks;minimal routing function;network failures;source-destination pairs;",
    	pages = "341 - 56",
    	title = "{A} fully adaptive fault-tolerant routing methodology based on intermediate nodes",
    	year = 2004
  61. Jose Flich, Pedro Lopez, M P Malumbres, Jose Duato and T Rokicki. Applying in-transit buffers to boost the performance of networks with source routing. Computers, IEEE Transactions on 52(9):1134 - 1153, 2003. DOI BibTeX

    	author = "Flich, Jose and Lopez, Pedro and M.P. Malumbres and Duato, Jose and T. Rokicki",
    	abstract = "In this paper, we analyze in depth the effect of using ITB in the network, showing that they not only serve for guaranteeing minimal routing, but also that they are a powerful mechanism able to balance network traffic and reduce network contention. To demonstrate these capabilities, we apply the ITB mechanism to improved routing schemes, such as DFS and smart-routing. These routing algorithms (without ITB) are able to improve the performance of up*/down* by 30 percent and 90 percent, respectively, for a 32-switch network. The evaluation results show that, when ITB are used together with these improved routing algorithms, network throughput achieved by DFS and smart-routing can still be improved by 56 percent and 23 percent, respectively. However, smart-routing requires a time to compute the routing tables that rapidly grows with network size, it being impossible in practice to build networks with more than 32 switches. This high computational cost is mainly motivated by the need of obtaining deadlock-free routing tables. However, when ITB are used, one can decouple the stages of computing routing tables and breaking cycles. Moreover, as stated above, ITB can be used to reduce network contention. In this way, in this paper, we also propose a completely new routing algorithm that tries to balance network traffic by using a simple and low time consuming strategy. The proposed algorithm guarantees deadlock freedom and reduces network contention with the use of ITB. The evaluation results show that our algorithm obtains unprecedented throughputs in 32-switch networks, tripling the original up*/down* and almost doubling smart-routing.",
    	doi = "10.1109/TC.2003.1228510",
    	issn = "0018-9340",
    	journal = "Computers, IEEE Transactions on",
    	keywords = "32-switch network; DFS; ITB; NOW; breaking cycles; deadlock-free routing tables; in-transit buffers; minimal routing; network contention reduction; network performance; network throughput; network traffic balancing; networks of workstations; performance;",
    	month = "sept.",
    	number = 9,
    	pages = "1134 - 1153",
    	volume = 52,
    	year = 2003
Out-of-Order Retirement of Instructions in Superscalar, Multithreaded, and Multicore Processors. Julio Sahuquillo, Pedro López (Processor Architecture)

Low-Memory Techniques for Routing and Fault-Tolerance on the Fat-Tree Topology. Maria E. Gómez, Pedro López (Routing Algorithms)

Improvement of interconnection networks for clusters: direct-indirect hybrid topology and HoL-blocking reduction routing. Pedro López, Maria E. Gómez (High Performance Clusters)