As of January 2010, full-time associate professor at the Universidad Politécnica de Valencia, in the Parallel Architectures Group of the School of Engineering in Computer Science.

Research Topics

Networks on chip. Routing algorithms and their implementations to address new challenges when building the on-chip network, including fault tolerance, power management, and virtualization. New router architectures and topologies for on-chip networks. Interaction of cache coherency protocols and the on-chip network in tile-based CMP systems. Congestion management in on-chip networks. Router designs for efficient on-chip interconnects. On-chip networks for embedded systems (addressing heterogeneity). High-performance (off-chip) interconnects. InfiniBand-like networks, addressing routing algorithms, congestion management techniques, and fault-tolerant algorithms. Quality of service.

Much of this research has been carried out within nationally and internationally funded research projects such as NaNoC, COMCAS, Consolider-Ingenio 2010, and CICYT.

Current and former PhD students advised:

  • Teresa Nachiondo Farinós, Assistant Professor at UPV
  • José Miguel Montañana Aliaga
  • Andrés Mejía Gómez, currently at Intel Santa Clara
  • Gaspar Mora Porta, currently at Intel Santa Clara
  • Samuel Rodrigo Mocholí
  • Jesús Camacho Villanueva
  • Toni Roca
  • José Cano Reyes

Other advised students working on research projects:

  • José María Martínez


  56. , Jose Flich, Jose Duato, S -A Reinemo and T Skeie. Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori. In Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International. April 2006, 10 pp.

    	abstract = "Computers get faster every year, but the demand for computing resources seems to grow at an even faster rate. Depending on the problem domain, this demand for more power can be satisfied by either, massively parallel computers, or clusters of computers. Common for both approaches is the dependence on high performance interconnect networks such as Myrinet, Infiniband, or 10 Gigabit Ethernet. While high throughput and low latency are key features of interconnection networks, the issue of fault-tolerance is now becoming increasingly important. As the number of network components grows so does the probability for failure, thus it becomes important to also consider the fault-tolerance mechanism of interconnection networks. The main challenge then lies in combining performance and fault-tolerance, while still keeping cost and complexity low. This paper proposes a new deterministic routing methodology for tori and meshes, which achieves high performance without the use of virtual channels. Furthermore, it is topology agnostic in nature, meaning it can handle any topology derived from any combination of faults when combined with static reconfiguration. The algorithm, referred to as segment-based routing (SR), works by partitioning a topology into subnets, and subnets into segments. This allows us to place bidirectional turn restrictions locally within a segment. As segments are independent, we gain the freedom to place turn restrictions within a segment independently from other segments. This results in a larger degree of freedom when placing turn restrictions compared to other routing strategies. In this paper a way to compute segment-based routing tables is presented and applied to meshes and tori. Evaluation results show that SR increases performance by a factor of 1.8 over FX and up*/down* routing",
    	booktitle = "Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International",
    	title = "{S}egment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori",
  57. Gaspar Mora, Jose Flich, Jose Duato, Pedro Lopez, Elvira Baydal and O Lysne. Towards an efficient switch architecture for high-radix switches. 2006, 11 - 20.

    	author = "Mora, Gaspar and Flich, Jose and Duato, Jose and Lopez, Pedro and Baydal, Elvira and O. Lysne",
    	abstract = "The interconnection network plays a key role in the overall performance achieved by high performance computing systems, also contributing an increasing fraction of its cost and power consumption. Current trends in interconnection network technology suggest that high-radix switches will be preferred as networks will become smaller (in terms of switch count) with the associated savings in packet latency, cost, and power consumption. Unfortunately, current switch architectures have scalability problems that prevent them from being effective when implemented with a high number of ports. In this paper, an efficient and cost-effective architecture for high-radix switches is proposed. The architecture, referred to as partitioned crossbar input queued (PCIQ), relies on three key components: a partitioned crossbar organization that allows the use of simple arbiters and crossbars, a packet-based arbiter, and a mechanism to eliminate the switch-level HOL blocking. Under uniform traffic, maximum switch efficiency is achieved. Furthermore, switch-level HOL blocking is completely eliminated under hot-spot traffic, again delivering maximum throughput. Additionally, PCIQ inherently implements an efficient congestion management technique that eliminates all the network-wide HOL blocking. On the contrary, the previously proposed architectures either show poor performance or they require significantly higher costs than PCIQ (in both components and complexity).",
    	title = "{T}owards an efficient switch architecture for high-radix switches",
  58. , Jose Flich, Jose Duato, S -A Reinemo and T Skeie. Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori. In Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International. 2006, 10 pp.

    	author = ", and Flich, Jose and Duato, Jose and S.-A. Reinemo and T. Skeie",
    	abstract = "Computers get faster every year, but the demand for computing resources seems to grow at an even faster rate. Depending on the problem domain, this demand for more power can be satisfied by either, massively parallel computers, or clusters of computers. Common for both approaches is the dependence on high performance interconnect networks such as Myrinet, Infiniband, or 10 Gigabit Ethernet. While high throughput and low latency are key features of interconnection networks, the issue of fault-tolerance is now becoming increasingly important. As the number of network components grows so does the probability for failure, thus it becomes important to also consider the fault-tolerance mechanism of interconnection networks. The main challenge then lies in combining performance and fault-tolerance, while still keeping cost and complexity low. This paper proposes a new deterministic routing methodology for tori and meshes, which achieves high performance without the use of virtual channels. Furthermore, it is topology agnostic in nature, meaning it can handle any topology derived from any combination of faults when combined with static reconfiguration. The algorithm, referred to as segment-based routing (SR), works by partitioning a topology into subnets, and subnets into segments. This allows us to place bidirectional turn restrictions locally within a segment. As segments are independent, we gain the freedom to place turn restrictions within a segment independently from other segments. This results in a larger degree of freedom when placing turn restrictions compared to other routing strategies. In this paper a way to compute segment-based routing tables is presented and applied to meshes and tori. Evaluation results show that SR increases performance by a factor of 1.8 over FX and up*/down* routing",
    	booktitle = "Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International",
    	title = "{S}egment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori",
  59. Teresa Nachiondo, Jose Flich and Jose Duato. Destination-based HoL blocking elimination. In Parallel and Distributed Systems, 2006. ICPADS 2006. 12th International Conference on 1. 2006, 10 pp.

    	author = "Nachiondo, Teresa and Flich, Jose and Duato, Jose",
    	abstract = "In future interconnection networks, congestion management is likely to become a critical issue owing to increasing power consumption and cost concerns. As congested packets introduce head-of-line (HoL) blocking to the rest of packets, congestion spreads quickly. The best-known solution to HoL blocking, virtual output queues (VOQs), is not scalable at all or too costly when implemented in large networks. In previous works, we proposed an efficient and cost-effective solution, referred to as destination-based buffer management (DBBM). DBBM groups destinations into different sets, and packets addressed to destinations in the same set are mapped to the same queue. DBBM eliminates most of the HoL blocking (among packets addressed to different sets). It achieves very good results in terms of scalability, throughput, and robustness. However, depending on the distribution of packet destinations, it may introduce an uncertain degree of unfairness among packets mapped on the same queue. In order to overcome this problem, we propose the dynamic DBBM mechanism (DDBBM). DDBBM dynamically eliminates completely the HoL blocking. Performance results show that DDBBM keeps (and in some cases improves) the good results achieved by DBBM in terms of throughput and scalability. Moreover, DDBBM solves the unfairness introduced by DBBM. As an example of applicability, in this paper we show that DDBBM can be applied to InfiniBand with no hardware modification",
    	title = "{D}estination-based {H}o{L} blocking elimination",
  60. Maria E Gomez, N A Nordbotten, Jose Flich, Pedro Lopez, Antonio Robles, Jose Duato, T Skeie and O Lysne. A routing methodology for achieving fault tolerance in direct networks. IEEE Transactions on Computers 55(4):400 - 15, 2006.

    	author = "Gomez, Maria E. and N.A. Nordbotten and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose and T. Skeie and O. Lysne",
    	abstract = "Massively parallel computing systems are being built with thousands of nodes. The interconnection network plays a key role for the performance of such systems. However, the high number of components significantly increases the probability of failure. Additionally, failures in the interconnection network may isolate a large fraction of the machine. It is therefore critical to provide an efficient fault-tolerant mechanism to keep the system running, even in the presence of faults. This paper presents a new fault-tolerant routing methodology that does not degrade performance in the absence of faults and tolerates a reasonably large number of faults without disabling any healthy node. In order to avoid faults, for some source-destination pairs, packets are first sent to an intermediate node and then from this node to the destination node. Fully adaptive routing is used along both subpaths. The methodology assumes a static fault model and the use of a checkpoint/restart mechanism. However, there are scenarios where the faults cannot be avoided solely by using an intermediate node. Thus, we also provide some extensions to the methodology. Specifically, we propose disabling adaptive routing and/or using misrouting on a per-packet basis. We also propose the use of more than one intermediate node for some paths. The proposed fault-tolerant routing methodology is extensively evaluated in terms of fault tolerance, complexity, and performance",
    	title = "{A} routing methodology for achieving fault tolerance in direct networks",
  61. J M Montañana, Jose Flich, Antonio Robles and Jose Duato. Reachability-based fault-tolerant routing. In Parallel and Distributed Systems, 2006. ICPADS 2006. 12th International Conference on 1. 2006, 10 pp.

    	author = "Monta{\~n}ana, J. M. and Flich, Jose and Robles, Antonio and Duato, Jose",
    	abstract = "Clusters of PCs are being used as cost-effective alternative to large parallel computers. In most of them it is critical to keep the system running even in the presence of faults. As the number of nodes increases in these systems, the interconnection network grows accordingly. Along with the increase in components the probability of faults increases dramatically, and thus, fault-tolerance in the system, in general, and in the interconnection network, in particular, plays a key role. An interesting approach to provide fault-tolerance consists of migrating on fly the paths affected by the failure to new fault-free paths. In this paper, we propose a simple and effective fault-tolerant routing methodology, referred to as reachability based fault tolerant routing (RFTR), that can be applied to any topology. RFTR builds new alternative paths by joining subpaths extracted from the set of already computed paths, thus being time-efficient. In order to avoid deadlocks, RFTR performs, if required, a virtual channel transition on the subpath union. As an example of applicability, in this paper we apply RFTR to InfiniBand. Evaluation results on tori show that RFTR exhibits a low computation cost and does not degrade performance significantly",
    	title = "{R}eachability-based fault-tolerant routing",
  62. A Martinez, P J Garcia, F J Alfaro, J L Sanchez, Jose Flich, F J Quiles and Jose Duato. Towards a cost-effective interconnection network architecture with QoS and congestion management support. 2006, 884 - 95.

    	author = "A. Martinez and P.J. Garcia and F.J. Alfaro and J.L. Sanchez and Flich, Jose and F.J. Quiles and Duato, Jose",
    	abstract = "Congestion management and quality of service (QoS) provision are two important issues in current network design. The most popular techniques proposed for both issues require the existence of specific resources in the interconnection network, usually a high number of separate queues at switch ports. Therefore, the implementation of these techniques is expensive or even infeasible. However, two novel, efficient, and cost-effective techniques for provision of QoS and for congestion management have been proposed recently. In this paper, we combine those techniques to build a single interconnection network architecture, providing an excellent performance while reducing the number of required resources",
    	title = "{T}owards a cost-effective interconnection network architecture with {Q}o{S} and congestion management support",
  63. P J Garcia, F J Quiles, Jose Flich, Jose Duato and I Johnson. RECN-DD: A Memory-Efficient Congestion Management Technique for Advanced Switching. In Parallel Processing, 2006. ICPP 2006. International Conference on. 2006, 23 - 32.

    	author = "P.J. Garcia and F.J. Quiles and Flich, Jose and Duato, Jose and I. Johnson",
    	abstract = "As VLSI technology advances, the interconnection network represents a larger percentage of the total system cost and power consumption. In fact, a current trend in network design is to reduce the number of components. However, this leads to systems working closer to saturation point, and therefore an efficient congestion management technique is required. In that sense, RECN has been recently proposed for advanced switching (AS). RECN detects the formation of congestion trees and dynamically allocates queues for storing congested packets, thus, eliminating the HOL blocking introduced by congestion trees. These queues are deallocated when congestion vanishes. We have identified two shortcomings that may affect RECN scalability and implementation. Firstly, although RECN allocates queues in an efficient way, resource deallocation is performed in-order, thus losing efficiency and wasting resources. This leads to an excessive requirement of memory at switch ports. Secondly, both allocation and deallocation mechanisms involve the use of specific control packets not supported by the AS standard, thus preventing RECN implementation. In this sense we provide a detailed description of the current RECN deallocation mechanism. In this paper we present an enhanced RECN version (RECN-DD) where these problems have been eliminated. Specifically, we propose a new distributed queue deallocation mechanism that reduces the number of required resources and does not require the use of control packets. Moreover, we propose a new congestion notification mechanism that does not require non-standard AS packets. Instead, flow control packets are used to notify congestion, thus simplifying the implementation of RECN-DD in AS",
    	title = "{RECN}-{DD}: {A} {M}emory-{E}fficient {C}ongestion {M}anagement {T}echnique for {A}dvanced {S}witching",
  64. Teresa Nachiondo, Jose Flich, Jose Duato and M Gusat. Cost/performance trade-offs and fairness evaluation of queue mapping policies. In José Cunha; Pedro C D Medeiros (ed.). Euro-Par 2005 Parallel Processing 3648. August 2005, 1024 - 1034.

    	author = "Nachiondo, Teresa and Flich, Jose and Duato, Jose and M. Gusat",
    	abstract = "Whereas the established interconnection networks (ICTN) achieve low latency by operating in the linear region, i.e. oversizing the fabric, the strict cost and power constraints demand more efficient utilization of future networks. Increasing the utilization of lossless ICTNs may, however, lead to saturation and performance degradation owing to HOL-blocking. The current solution to HOL-blocking consists of using virtual output queueing (VOQ), whose quadratical scalability is expensive in large networks. To improve VOQ's scalability we have proposed the destination-based buffer management (DBBM), a scheme that compares well with VOQ. Whereas previously we have analyzed DBBM's basic operation and performance, in this paper we have set two different goals. First we focus on how the different DBBM mappings can impact the cost/performance of multistage ICTNs. Next, because DBBM can introduce unfairness, this constitutes the second theme of our paper. The new results show that DBBM with modulo-4/8 mapping performs very well for only a fraction of the VOQ cost. Also in terms of fairness DBBM shows promise, because it (i) keeps the unfairness degree independent of both topology and routing, while (ii) minimizing the number of flows affected by unfairness",
    	title = "{C}ost/performance trade-offs and fairness evaluation of queue mapping policies",
  65. Teresa Nachiondo, Jose Flich and Jose Duato. Efficient reduction of HOL blocking in multistage networks. In Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International. April 2005, 8 pp.

    	author = "Nachiondo, Teresa and Flich, Jose and Duato, Jose",
    	abstract = "Head-of-line blocking is one of the main problems arising in input-buffered switches. The best-known solution to this problem consists of using virtual output queues (VOQs). However this strategy is not scalable. Its implementation cost increases quadratically with the number of ports in the switch. Taking into account current trends, the demand for larger number of ports in high-performance switches is likely to increase very rapidly in the future. Therefore, a scalable and cost-effective solution is required. In this paper we propose an efficient and cost-effective strategy (belonging to a family of strategies previously proposed, referred to as destination-based buffer management (DBBM)), to reduce HOL blocking in single-stage and multistage networks. The proposed strategy is based on allowing certain destinations to share the same queue. Its main purpose is to maximize network throughput whereas keeping HOL blocking to negligible values. In this paper, we apply the strategy at every switch included in a bidirectional multistage network (BMIN). We have evaluated DBBM, VOQ, and alternative strategies in different BMIN sizes and with different traffic conditions (synthetic traffic, IP traces, and I/O traces). Results show that DBBM with a reduced number of queues at each switch obtains roughly the same throughput as the VOQ mechanism. Moreover, VOQ at the switch level (as many queues as output ports at every switch) has also been analyzed. Results demonstrate that it does not scale. As the number of stages in the network increases, the VOQ solution at the switch level introduces more HOL blocking that leads to a severe degradation in network throughput. With the DBBM using 16 queues, maximum network throughput is sustained for all the traffic cases analyzed. Moreover, as the network size increases (up to a 2048 × 2048 BMIN), DBBM keeps roughly the same performance with the same number of queues.",
    	title = "{E}fficient reduction of {HOL} blocking in multistage networks",
  66. R Martinez, J L Sanchez, F J Alfaro, Vicente Chirivella and Jose Flich. Studying the effect of the design parameters on the interconnection network performance in NOWs. In Parallel, Distributed and Network-Based Processing, 2005. PDP 2005. 13th Euromicro Conference on. February 2005, 102 - 109.

    	author = "R. Martinez and J.L. Sanchez and F.J. Alfaro and Chirivella, Vicente and Flich, Jose",
    	abstract = "With the increasing use of network of workstations (NOWs) as an alternative to huge parallel computers it has become essential to design high-performance interconnection networks for the communication between the nodes of these clusters. A large number of studies have been carried out to achieve this objective. Most of them propose a new technique that affects one of the parameters that characterize the interconnection network. These techniques are completely new or inspired in the techniques previously used in multiprocessor systems. The impact of the proposal is studied (in most cases using simulation), and an analysis is made of the effect of the new technique over the system performance versus those currently in existence. In this kind of study most of the network parameters are fixed and usually only a few parameters are varied. This paper presents a more general study of the interconnection network performance. This study consists in showing the effect of different design parameters over the network performance, and the interaction between them. This study would not be viable with the traditional techniques due to the number of simulations required. The alternative of the experimental design is used to carry out the study.",
    	title = "{S}tudying the effect of the design parameters on the interconnection network performance in {NOW}s",
  67. Jose Duato, I Johnson, Jose Flich, F Naven, P Garcia and Teresa Nachiondo. A new scalable and cost-effective congestion management strategy for lossless multistage interconnection networks. In High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on. February 2005, 108 - 119.

    	author = "Duato, Jose and I. Johnson and Flich, Jose and F. Naven and P. Garcia and Nachiondo, Teresa",
    	abstract = "In this paper, we propose a new congestion management strategy for lossless multistage interconnection networks that scales as network size and/or link bandwidth increase. Instead of eliminating congestion, our strategy avoids performance degradation beyond the saturation point by eliminating the HOL blocking produced by congestion trees. This is achieved in a scalable manner by using separate queues for congested flows. These are dynamically allocated only when congestion arises, and deallocated when congestion subsides. Performance evaluation results show that our strategy responds to congestion immediately and completely eliminates the performance degradation produced by HOL blocking while using only a small number of additional queues.",
    	title = "{A} new scalable and cost-effective congestion management strategy for lossless multistage interconnection networks",
  68. Michihiro Koibuchi, Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Enforcing in-order packet delivery in system area networks with adaptive routing. Journal of Parallel and Distributed Computing 65(10):1223 - 1236, 2005.

    	author = "Michihiro Koibuchi and Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Adaptive routing, which dynamically selects the route of packets, has been widely studied for interconnection networks in massively parallel computers and system area networks. Although adaptive routing has the advantage of providing high bandwidth, it may deliver packets out-of-order, which some message passing libraries do not accept. In this paper, we propose two mechanisms called (1) FIFO transmission and (2) couple limitation to guarantee in-order packet delivery in adaptive routing. Both of them limit packet injection at source hosts. The FIFO transmission completely avoids packet sorting at destination hosts, while the couple limitation uses a few buffers to sort packets at destination hosts. Evaluation results show that the FIFO transmission and the couple limitation achieve a similar throughput to that of a method equipped with huge (infinite) buffers enough to store all out-of-order packets at destination hosts under both synthetic traffic and NAS Parallel Benchmarks. © 2005 Elsevier Inc. All rights reserved.",
    	title = "{E}nforcing in-order packet delivery in system area networks with adaptive routing",
  69. P J Garcia, Jose Flich, Jose Duato, F J Quiles, I Johnson and F Naven. On the correct sizing on meshes through an effective congestion management strategy. 2005, 1035 - 45.

    	author = "P.J. Garcia and Flich, Jose and Duato, Jose and F.J. Quiles and I. Johnson and F. Naven",
    	abstract = "Interconnection networks used in clusters of PCs are often dimensioned with certain restrictions. One restriction could be the reduction of power consumption and overall cost. In this sense, the network size must be reduced. Another restriction is to guarantee that the system offers a minimum bandwidth. In this case, the network size must be increased. In both cases, the head-of-line (HOL) blocking effect (related to network congestion) may appear, degrading network performance and thus, preventing the correct sizing of the network. Therefore, some mechanisms should be implemented for reducing or eliminating this problem, in order to dimension the network as desired while keeping network performance at maximum. In this paper we analyze the impact on network performance when using different mechanisms for handling HOL blocking when interconnection networks with mesh topology are dimensioned in several ways. We show that the previously proposed RECN congestion control mechanism is key in order to efficiently eliminate HOL blocking in meshes and, therefore, it allows the correct network sizing",
    	title = "{O}n the correct sizing on meshes through an effective congestion management strategy",
  70. Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez, Jose Duato and M Koibuchi. In-Order Packet Delivery in Interconnection Networks using Adaptive Routing. In Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International. 2005, 101 - 101.

    	author = "Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose and M. Koibuchi",
    	abstract = "Most commercial switch-based network technologies for PC clusters use deterministic routing. Alternatively, adaptive routing could be used to improve network performance. In this case, switches decide the path to reach the destination by using local information about the state of the possible outgoing links. However, there are two drawbacks that discourage adaptive routing from being applied to commercial interconnects. The first one concerns the possible switch complexity increase with respect to deterministic routing. The second drawback is due to the fact that adaptive routing may introduce out-of-order packet delivery, which is not acceptable for some applications. For the best of our knowledge, there are no works that analyze the degree of out-of-order packet delivery caused by different network and traffic conditions. In this paper, we take on such a challenge. We show that only for high traffic conditions (reaching saturation) out-of-order delivery is introduced. Moreover, by using small buffers and simple sorting mechanisms at destination, we show that high network throughput can be obtained at the same time packets are delivered in order. Thus, the paper demonstrates that it is possible to use adaptive routing, while still guaranteeing in-order packet delivery, without using large buffer resources nor degrading significantly its performance.",
    	title = "{I}n-{O}rder {P}acket {D}elivery in {I}nterconnection {N}etworks using {A}daptive {R}outing",
  71. P J Garcia, Jose Flich, Jose Duato, I Johnson, F J Quiles and F Naven. Dynamic evolution of congestion trees: Analysis and impact on switch architecture. 2005, 266 - 285.

    	author = "P.J. Garcia and Flich, Jose and Duato, Jose and I. Johnson and F.J. Quiles and F. Naven",
    	abstract = "Designers of large parallel computers and clusters are becoming increasingly concerned with the cost and power consumption of the interconnection network. A simple way to reduce them consists of reducing the number of network components and increasing their utilization. However, doing so without a suitable congestion management mechanism may lead to dramatic throughput degradation when the network enters saturation. Congestion management strategies for lossy networks (computer networks) are well known, but relatively little effort has been devoted to congestion management in lossless networks (parallel computers, clusters, and on-chip networks). Additionally, congestion is much more difficult to solve in this context due to the formation of congestion trees. In this paper we study the dynamic evolution of congestion trees. We show that, contrary to the common belief, trees do not only grow from the root toward the leaves. There exist cases where trees grow from the leaves to the root, cases where several congestion trees grow independently and later merge, and even cases where some congestion trees completely overlap while being independent. This complex evolution and its implications on switch architecture are analyzed, proposing enhancements to a recently proposed congestion management mechanism and showing the impact on performance of different design decisions. © Springer-Verlag Berlin Heidelberg 2005.",
    	title = "{D}ynamic evolution of congestion trees: {A}nalysis and impact on switch architecture",
  72. Maria E Gomez, Jose Flich, Pedro Lopez, Antonio Robles, Jose Duato, N A Nordbotten, O Lysne and T Skeie. An effective fault-tolerant routing methodology for direct networks. In Parallel Processing, 2004. ICPP 2004. International Conference on. 2004, 222 - 231 vol.1. URL, DOI BibTeX

    	author = "Gomez, Maria E. and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose and N.A. Nordbotten and O. Lysne and T. Skeie",
    	abstract = "Current massively parallel computing systems are being built with thousands of nodes, which significantly affect the probability of failure. M. E. Gomez proposed a methodology to design fault-tolerant routing algorithms for direct interconnection networks. The methodology uses a simple mechanism: for some source-destination pairs, packets are first forwarded to an intermediate node, and later, from this node to the destination node. Minimal adaptive routing is used along both subpaths. For those cases where the methodology cannot find a suitable intermediate node, it combines the use of intermediate nodes with two additional mechanisms: disabling adaptive routing and using misrouting on a per-packet basis. While the combination of these three mechanisms tolerates a large number of faults, each one requires adding some hardware support in the network and also introduces some overhead. In this paper, we perform an in-depth detailed analysis of the impact of these mechanisms on network behaviour. We analyze the impact of the three mechanisms separately and combined. The ultimate goal of this paper is to obtain a suitable combination of mechanisms that is able to meet the trade-off between fault-tolerance degree, routing complexity, and performance.",
    	title = "{A}n effective fault-tolerant routing methodology for direct networks",
  73. J M Montañana, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. A transition-based fault-tolerant routing methodology for InfiniBand networks. In Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International. April 2004, 186. URL, DOI BibTeX

    	author = "Monta{\~n}ana, J. M. and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Summary form only given. Currently, clusters of PCs are considered a cost-effective alternative to large parallel computers. As the number of elements increases in these systems, the probability of faults increases dramatically. Therefore, it is critical to keep the system running even in the presence of faults. The interconnection network plays a key role in its performance. InfiniBand (IBA) is a new standard interconnect suitable for clusters. Most of the fault-tolerant routing strategies proposed for massively parallel computers cannot be applied to IBA because routing and virtual channel transitions are deterministic, which prevents packets from avoiding the faults. A possible approach to provide fault-tolerance in IBA consists of using several disjoint paths between every source-destination pair of nodes and selecting the appropriate path at the source host. However, to this end, a routing algorithm able to provide enough disjoint paths, while still guaranteeing deadlock freedom, is required. We propose a simple and effective fault-tolerant methodology for IBA networks that can be applied to any network topology and meets the trade-off between fault-tolerance degree and the number of network resources devoted to it. Preliminary results show that the proposed methodology scales well and supports up to three faults in 2D and five in 3D tori using only two virtual channels.",
    	title = "{A} transition-based fault-tolerant routing methodology for {I}nfini{B}and networks",
  74. Jose Duato, Jose Flich and Teresa Nachiondo. A cost-effective technique to reduce HOL blocking in single-stage and multistage switch fabrics. In Parallel, Distributed and Network-Based Processing, 2004. Proceedings. 12th Euromicro Conference on. February 2004, 48 - 53. URL, DOI BibTeX

    	author = "Duato, Jose and Flich, Jose and Nachiondo, Teresa",
    	abstract = "Head-of-line (HOL) blocking is one of the main problems arising in input-buffered switches. The best-known solution to this problem consists of using virtual output queues (VOQs). However this strategy is not scalable at all. Its implementation cost increases quadratically with the number of ports in the switch. Taking into account current trends, the demand for larger number of ports in high-performance switches is likely to increase very rapidly in the near future. Therefore, a more scalable and cost-effective solution is required. We propose a very efficient and cost-effective technique, referred to as destination-based buffer management (DBBM), to reduce HOL blocking in single-stage and multistage switch. Results show that the use of the DBBM technique with a reduced number of queues at each IA is able to obtain roughly the same throughput as the VOQ mechanism. In particular, the number of queues can be reduced by a factor of up to 8 with the DBBM technique.",
    	title = "{A} cost-effective technique to reduce {HOL} blocking in single-stage and multistage switch fabrics",
  75. J M Stine, N P Carter and Jose Flich. Comparing Adaptive Routing and Dynamic Voltage Scaling for Link Power Reduction. Computer Architecture Letters 3(1):4 - 4, 2004. DOI BibTeX

    	author = "J.M. Stine and N.P. Carter and Flich, Jose",
    	title = "{C}omparing {A}daptive {R}outing and {D}ynamic {V}oltage {S}caling for {L}ink {P}ower {R}eduction",
  76. Maria E Gomez, Jose Duato, Jose Flich, Pedro Lopez, Antonio Robles, N A Nordbotten, O Lysne and T Skeie. An Efficient Fault-Tolerant Routing Methodology for Meshes and Tori. Computer Architecture Letters 3(1):3 - 3, 2004. URL, DOI BibTeX

    	author = "Gomez, Maria E. and Duato, Jose and Flich, Jose and Lopez, Pedro and Robles, Antonio and N.A. Nordbotten and O. Lysne and T. Skeie",
    	abstract = "In this paper we present a methodology to design fault-tolerant routing algorithms for regular direct interconnection networks. It supports fully adaptive routing, does not degrade performance in the absence of faults, and supports a reasonably large number of faults without significantly degrading performance. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, at this node, without being ejected, they are adaptively forwarded to their destinations. In order to allow deadlock-free minimal adaptive routing, the methodology requires only one additional virtual channel (for a total of three), even for tori. Evaluation results for a 4 x 4 x 4 torus network show that the methodology is 5-fault tolerant. Indeed, for up to 14 link failures, the percentage of fault combinations supported is higher than 99.96%. Additionally, network throughput degrades by less than 10% when injecting three random link faults without disabling any node. In contrast, a mechanism similar to the one proposed in the BlueGene/L, that disables some network planes, would strongly degrade network throughput by 79%.",
    	title = "{A}n {E}fficient {F}ault-{T}olerant {R}outing {M}ethodology for {M}eshes and {T}ori",
  77. Maria E Gomez, Jose Flich, Pedro Lopez, Antonio Robles, Jose Duato, N A Nordbotten, O Lysne and T Skeie. An effective fault-tolerant routing methodology for direct networks. 2004, 222 - 31. BibTeX

    	author = "Gomez, Maria E. and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose and N.A. Nordbotten and O. Lysne and T. Skeie",
    	abstract = "Current massively parallel computing systems are being built with thousands of nodes, which significantly affect the probability of failure. M. E. Gomez proposed a methodology to design fault-tolerant routing algorithms for direct interconnection networks. The methodology uses a simple mechanism: for some source-destination pairs, packets are first forwarded to an intermediate node, and later, from this node to the destination node. Minimal adaptive routing is used along both subpaths. For those cases where the methodology cannot find a suitable intermediate node, it combines the use of intermediate nodes with two additional mechanisms: disabling adaptive routing and using misrouting on a per-packet basis. While the combination of these three mechanisms tolerates a large number of faults, each one requires adding some hardware support in the network and also introduces some overhead. In this paper, we perform an in-depth detailed analysis of the impact of these mechanisms on network behaviour. We analyze the impact of the three mechanisms separately and combined. The ultimate goal of this paper is to obtain a suitable combination of mechanisms that is able to meet the trade-off between fault-tolerance degree, routing complexity, and performance.",
    	title = "{A}n effective fault-tolerant routing methodology for direct networks",
  78. Maria E Gomez, Jose Duato, Jose Flich, Pedro Lopez, Antonio Robles, N A Nordbotten, T Skeie and O Lysne. A new adaptive fault-tolerant routing methodology for direct networks. 2004, 462 - 73. BibTeX

    	author = "Gomez, Maria E. and Duato, Jose and Flich, Jose and Lopez, Pedro and Robles, Antonio and N.A. Nordbotten and T. Skeie and O. Lysne",
    	abstract = "Interconnection networks play a key role in the fault tolerance of massively parallel computers, since faults may isolate a large fraction of the machine containing many healthy nodes. In this paper, we present a methodology to design fully adaptive fault-tolerant routing algorithms for direct interconnection networks that can be applied to different regular topologies. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, from this node, they are adaptively forwarded to their destination. This methodology requires only one additional virtual channel, even for tori. Evaluation results show that the methodology is 7-fault tolerant, and for up to 14 faults, more than 99% of the combinations are tolerated, also without significantly degrading performance in the presence of faults",
    	title = "{A} new adaptive fault-tolerant routing methodology for direct networks",
  79. T Skeie, O Lysne, Jose Flich, Pedro Lopez, Antonio Robles and Jose Duato. LASH-TOR: a generic transition-oriented routing algorithm. In Parallel and Distributed Systems, 2004. ICPADS 2004. Proceedings. Tenth International Conference on. 2004, 595 - 604. URL, DOI BibTeX

    	author = "T. Skeie and O. Lysne and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose",
    	abstract = "Cluster networks are seen as the future access networks for multimedia streaming, e-commerce, network storage, etc. For these applications, performance and high availability are particularly crucial. Regular topologies are preferred when performance is the primary concern. However, due to spatial constraints or fault-related issues, the network structure may become irregular, which makes it more difficult to find deadlock-free minimal paths. Over the recent years, several solutions have been proposed. One of them is the LASH routing, which enables minimal routing by assigning paths to different virtual layers. In this paper, we propose an extension of LASH in order to reduce the number of required virtual layers by allowing transitions between virtual layers. Evaluation results show that the new routing scheme (LASH-TOR) is able to obtain full minimal routing with a reduced number of virtual channels. For torus and mesh networks, with only two virtual channels, LASH throughput is increased by an average factor of improvement of 3.30 for large networks. For regular networks with some unconnected (faulty) links, equal performance improvements are achieved. Even for highly irregular networks of size up to 128 switches the new routing scheme only needs three virtual channels for guaranteeing minimal routing. Besides, LASH-TOR performs well compared to dimension order routing for mesh and torus networks.",
    	title = "{LASH}-{TOR}: a generic transition-oriented routing algorithm",
  80. N A Nordbotten, Maria E Gomez, Jose Flich, Pedro Lopez, Antonio Robles, T Skeie, O Lysne and Jose Duato. A fully adaptive fault-tolerant routing methodology based on intermediate nodes. 2004, 341 - 56. BibTeX

    	author = "N.A. Nordbotten and Gomez, Maria E. and Flich, Jose and Lopez, Pedro and Robles, Antonio and T. Skeie and O. Lysne and Duato, Jose",
    	abstract = "Massively parallel computing systems are being built with thousands of nodes. Because of the high number of components, it is critical to keep these systems running even in the presence of failures. Interconnection networks play a key role in these systems, and this paper proposes a fault-tolerant routing methodology for use in such networks. The methodology supports any minimal routing function (including fully adaptive routing), does not degrade performance in the absence of faults, does not disable any healthy node, and is easy to implement both in meshes and tori. In order to avoid network failures, the methodology uses a simple mechanism: for some source-destination pairs, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network). The methodology is shown to tolerate a large number of faults (e.g., five/nine faults when using two/three intermediate nodes in a 3D torus). Furthermore, the methodology offers a graceful performance degradation: in an 8 × 8 × 8 torus network with 14 faults the throughput is only decreased by 6.49%",
    	title = "{A} fully adaptive fault-tolerant routing methodology based on intermediate nodes",
  81. Jose Flich, Pedro Lopez, M P Malumbres, Jose Duato and T Rokicki. Applying in-transit buffers to boost the performance of networks with source routing. Computers, IEEE Transactions on 52(9):1134 - 1153, 2003. DOI BibTeX

    	author = "Flich, Jose and Lopez, Pedro and M.P. Malumbres and Duato, Jose and T. Rokicki",
    	abstract = "In this paper, we analyze in depth the effect of using ITB in the network, showing that they not only serve for guaranteeing minimal routing, but also that they are a powerful mechanism able to balance network traffic and reduce network contention. To demonstrate these capabilities, we apply the ITB mechanism to improved routing schemes, such as DFS and smart-routing. These routing algorithms (without ITB) are able to improve the performance of up*/down* by 30 percent and 90 percent, respectively, for a 32-switch network. The evaluation results show that, when ITB are used together with these improved routing algorithms, network throughput achieved by DFS and smart-routing can still be improved by 56 percent and 23 percent, respectively. However, smart-routing requires a time to compute the routing tables that rapidly grows with network size, it being impossible in practice to build networks with more than 32 switches. This high computational cost is mainly motivated by the need of obtaining deadlock-free routing tables. However, when ITB are used, one can decouple the stages of computing routing tables and breaking cycles. Moreover, as stated above, ITB can be used to reduce network contention. In this way, in this paper, we also propose a completely new routing algorithm that tries to balance network traffic by using a simple and low time consuming strategy. The proposed algorithm guarantees deadlock freedom and reduces network contention with the use of ITB. The evaluation results show that our algorithm obtains unprecedented throughputs in 32-switch networks, tripling the original up*/down* and almost doubling smart-routing.",
    	title = "{A}pplying in-transit buffers to boost the performance of networks with source routing",
  82. Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Supporting fully adaptive routing in InfiniBand networks. In Parallel and Distributed Processing Symposium, 2003. Proceedings. International. April 2003, 10 pp.. URL, DOI BibTeX

    	author = "Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "InfiniBand is a new standard for communication between processing nodes and I/O devices as well as for interprocessor communication. The InfiniBand Architecture (IBA) supports distributed routing. However, routing in IBA is deterministic because forwarding tables store a single output port per destination ID. This prevents packets from using alternative paths when the requested output port is busy. Despite the fact that alternative paths could be selected at the source node to reach the same destination node, this is not effective enough to improve network performance. However, using adaptive routing could help to circumvent the congested areas in the network, leading to an increment in performance. In this paper, we propose a simple strategy to implement forwarding tables for IBA switches that support adaptive routing while still maintaining compatibility with the IBA specs. Adaptive routing can be enabled or disabled individually for each packet at the source node. Also, the proposed strategy enables the use in IBA of fully adaptive routing algorithms without using additional network resources to improve network performance. Evaluation results show that extending IBA switch capabilities with fully adaptive routing noticeably increases network performance. In particular, network throughput increases up to an average factor of 3.9.",
    	title = "{S}upporting fully adaptive routing in {I}nfini{B}and networks",
  83. Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Supporting adaptive routing in InfiniBand networks. In Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on. 2003, 165 - 172. URL, DOI BibTeX

    	author = "Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "InfiniBand is a new standard for communication between processing nodes and I/O devices as well as for interprocessor communication. The InfiniBand Architecture (IBA) supports distributed deterministic routing because forwarding tables store a single output port per destination ID. This prevents packets from using alternative paths when the requested output port is busy. Despite the fact that alternative paths could be selected at the source node to reach the same destination node, this is not effective enough to improve network performance. However using adaptive routing could help to circumvent the congested areas in the network, leading to an increment in performance. In this paper we propose a simple strategy to implement forwarding tables for IBA switches that supports adaptive routing while still maintaining compatibility with the IBA specifications. Adaptive routing can be individually enabled or disabled for each packet at the source node. The proposed strategy enables the use in IBA of any adaptive routing algorithm with an acyclic channel dependence graph. In this paper, we have taken advantage of the partial adaptivity provided by the well-known up*/down* routing algorithm. Evaluation results show that extending IBA switch capabilities with adaptive routing may noticeably increase network performance. In particular network throughput improvement can be, on average, as high as 46%.",
    	title = "{S}upporting adaptive routing in {I}nfini{B}and networks",
  84. Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Supporting fully adaptive routing in InfiniBand networks. 2003, 10 pp.. URL BibTeX

    	author = "Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "InfiniBand is a new standard for communication between processing nodes and I/O devices as well as for interprocessor communication. The InfiniBand Architecture (IBA) supports distributed routing. However, routing in IBA is deterministic because forwarding tables store a single output port per destination ID. This prevents packets from using alternative paths when the requested output port is busy. Despite the fact that alternative paths could be selected at the source node to reach the same destination node, this is not effective enough to improve network performance. However, using adaptive routing could help to circumvent the congested areas in the network, leading to an increment in performance. In this paper, we propose a simple strategy to implement forwarding tables for IBA switches that support adaptive routing while still maintaining compatibility with the IBA specs. Adaptive routing can be enabled or disabled individually for each packet at the source node. Also, the proposed strategy enables the use in IBA of fully adaptive routing algorithms without using additional network resources to improve network performance. Evaluation results show that extending IBA switch capabilities with fully adaptive routing noticeably increases network performance. In particular, network throughput increases up to an average factor of 3.9",
    	title = "{S}upporting fully adaptive routing in {I}nfini{B}and networks",
  85. Maria E Gomez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. VOQSW: a methodology to reduce HOL blocking in InfiniBand networks. In Parallel and Distributed Processing Symposium, 2003. Proceedings. International. 2003, 10 pp.. DOI BibTeX

    	author = "Gomez, Maria E. and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "InfiniBand is a new switch-based standard interconnect for communication between processor nodes and I/O devices as well as for interprocessor communication. InfiniBand architecture allows switches to support up to 15 virtual lanes per port for data traffic. To route packets through a given virtual lane (VL), packets are labeled with a certain service level (SL) at injection time, and SLtoVL mapping tables are used at each switch to determine the VL to be used. Many previous works in the literature have shown that separate virtual lanes are able to reduce the influence of the well-known head-of-line (HOL) blocking effect on network performance. However, using virtual lanes to form separate virtual networks is not enough to eliminate the HOL blocking problem. Alternative solutions such as Virtual Output Queuing (VOQ) are able to eliminate it at the expense of modifying the switch buffer organization. In this paper, we propose an effective strategy to implement the VOQ scheme in IBA switches by using virtual lanes. This strategy does not require to modify the switch architecture, simply SL to VL tables must be properly filled. Evaluation results show that our proposed VOQ scheme is able to outperform the results obtained with the virtual network approach using the same number of resources. Moreover, the methodology proposed to implement the VOQ scheme in IBA only requires a small number of resources in order to significantly improve network throughput.",
    	title = "{VOQSW}: a methodology to reduce {HOL} blocking in {I}nfini{B}and networks",
  86. JC Sancho, Antonio Robles, Pedro Lopez, Jose Flich and Jose Duato. Routing in InfiniBand (TM) torus network topologies. In P Sadayappan and CS Yang (eds.). 2003 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS. 2003, 509-518. BibTeX

    	author = "JC Sancho and Robles, Antonio and Lopez, Pedro and Flich, Jose and Duato, Jose",
    	abstract = "InfiniBand is an interconnect standard for communication between processing nodes and I/O devices as well as for interprocessor communication (NOWs). The InfiniBand Architecture (IBA) defines a switch-based network with point-to-point links whose topology can be established by the customer. When the performance is the primary concern regular topologies are preferred. Low-dimensional tori (2D and 3D) are some of the regular topologies most widely used in commercial parallel computers. Routing in torus requires the use of virtual channels. Although InfiniBand provides support for deterministic routing and virtual channels, they are selected at each switch by service level (SL) identifiers associated to packets and do not depend on packet destination. This makes routing algorithm implementation more complex. In particular, a large number of SLs may be required, which is a scarce resource. In this paper we analyze the way several routing strategies can be applied in tori InfiniBand networks, also evaluating their resource requirements. In particular, we analyze and compare the well-known e-cube and up{*}/down{*} routing algorithms and the Flexible routing algorithm recently proposed.",
    	title = "{R}outing in {I}nfini{B}and ({TM}) torus network topologies",
  87. Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Supporting adaptive routing in IBA switches. 2003, 441 - 456. URL BibTeX

    	author = "Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "InfiniBand is a new standard for communication between processing nodes and I/O devices as well as for interprocessor communication. The InfiniBand Architecture (IBA) supports distributed deterministic routing because forwarding tables store a single output port per destination ID. This prevents packets from using alternative paths when the requested output port is busy. Despite the fact that alternative paths could be selected at the source node to reach the same destination node, this is not effective enough to improve network performance. However, using adaptive routing could help to circumvent the congested areas in the network, leading to an increment in performance. In this paper, we propose a simple strategy to implement forwarding tables for IBA switches that supports adaptive routing while still maintaining compatibility with the IBA specs. Adaptive routing can be individually enabled or disabled for each packet at the source node. The proposed strategy enables the use in IBA of any adaptive routing algorithm with an acyclic channel dependence graph. In this paper, we have taken advantage of the partial adaptivity provided by the well-known up*/down* routing algorithm. Evaluation results show that extending IBA switch capabilities with adaptive routing may noticeably increase network performance. In particular, network throughput improvement can be, on average, as high as 66%. © 2003 Elsevier B.V. All rights reserved.",
    	title = "{S}upporting adaptive routing in {IBA} switches",
  88. J C Sancho, Juan Carlos Martinez, Antonio Robles, Pedro Lopez, Jose Flich and Jose Duato. Performance evaluation of COWS under real parallel applications. In Parallel and Distributed Processing Symposium, 2003. Proceedings. International. 2003, 10 pp.. DOI BibTeX

    	author = "J.C. Sancho and Martinez, Juan Carlos and Robles, Antonio and Lopez, Pedro and Flich, Jose and Duato, Jose",
    	abstract = "Clusters of workstations (COWS) are often arranged as a switch-based network with irregular topology. Usually, the evaluation of interconnection networks for COWS has been carried out by simulation using synthetic traffic and by traces from real parallel applications. Although both types of traffics are used as a first approximation of the behavior of the system, a more accurate behavior can be obtained by using real parallel applications. In this paper, a new simulation framework has been developed in order to evaluate interconnection networks under real parallel applications by using an execution-driven simulator. Moreover, the new simulator can be used to evaluate the impact on the performance of the whole system of several design parameters in addition to the interconnection network. Evaluation results show that the execution time of real parallel applications can be reduced by using an effective routing algorithm. Moreover, in some cases, the achieved improvements are higher than the ones achieved by improving other design issues, such as the processor instruction issue rate, the cache size or the network bandwidth.",
    	title = "{P}erformance evaluation of {COWS} under real parallel applications",
  89. Jose Flich, Pedro Lopez, M P Malumbres and Jose Duato. Boosting the performance of Myrinet networks. Parallel and Distributed Systems, IEEE Transactions on 13(11):1166 - 1182, November 2002. URL, DOI BibTeX

    	author = "Flich, Jose and Lopez, Pedro and M.P. Malumbres and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. These networks allow the customer to connect processors using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. Some of these networks use source routing and wormhole switching. In particular, we are interested in Myrinet networks because it is a well-known commercial product and its behavior can be controlled by the software running in network interfaces (Myrinet Control Program, MCP). Usually, the Myrinet network uses up*/down* routing for computing the paths for every source-destination pair. We propose the In-Transit Buffer (ITB) mechanism to improve network performance. We apply the ITB mechanism to NOWs with up*/down* source routing, like Myrinet, analyzing its behavior on both networks with regular and irregular topologies. The proposed scheme can be implemented on Myrinet networks by only modifying the MCP, without changing the network hardware. We evaluate by simulation several networks with different traffic patterns using timing parameters taken from the Myrinet network. Results show that the current routing schemes used in Myrinet networks can be strongly improved by applying the ITB mechanism. In general, our proposed scheme is able to double the network throughput on medium and large NOWs. Finally, we present a first implementation of the ITB mechanism on a Myrinet network.",
    	title = "{B}oosting the performance of {M}yrinet networks",
  90. Jose Flich, Pedro Lopez, M P Malumbres and Jose Duato. Boosting the performance of Myrinet networks. Parallel and Distributed Systems, IEEE Transactions on 13(7):693 - 709, July 2002. URL, DOI BibTeX

    	author = "Flich, Jose and Lopez, Pedro and M.P. Malumbres and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. These networks allow the customer to connect processors using irregular topologies, providing the wiring flexibility, scalability and incremental expansion capability required in this environment. Some of these networks use source routing and wormhole switching. In particular, we are interested in Myrinet networks because they are a well-known commercial product and their behavior can be controlled by the software running on the network interfaces (the Myrinet Control Program, MCP). Usually, the Myrinet network uses up*/down* routing for computing the paths for every source-destination pair. In this paper, we propose an in-transit buffer (ITB) mechanism to improve the network performance. We apply the ITB mechanism to NOWs with up*/down* source routing, like the Myrinet, analyzing its behavior on networks with both regular and irregular topologies. The proposed scheme can be implemented on Myrinet networks by simply modifying the MCP, without changing the network hardware. We evaluate by simulation several networks with different traffic patterns using timing parameters taken from the Myrinet network. The results show that the current routing schemes used in Myrinet networks can be strongly improved by applying the ITB mechanism. In general, our proposed scheme is able to double the network throughput on medium and large NOWs. Finally, we present a first implementation of the ITB mechanism on a Myrinet network",
    	title = "{B}oosting the performance of {M}yrinet networks",
  91. Maria E Gomez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Evaluation of routing algorithms for InfiniBand networks. 2002, 775 - 80. BibTeX

    	author = "Gomez, Maria E. and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Storage area networks (SAN) provide the scalability required by the IT servers. The InfiniBand (IBA) interconnect is very likely to become the de facto standard for SAN as well as for NOW. The routing algorithm is a key design issue in irregular networks. Moreover, as several virtual lanes can be used and different network issues can be considered, the performance of the routing algorithms may be affected. In this paper we evaluate three existing routing algorithms (up*/down*, DFS, and smart-routing) suitable for being applied to IBA. Evaluation has been performed by simulation under different synthetic traffic patterns and I/O traces. Simulation results show that the smart-routing algorithm achieves the highest performance",
    	title = "{E}valuation of routing algorithms for {I}nfini{B}and networks",
  92. Jose Flich, M P Malumbres, Pedro Lopez and Jose Duato. Removing the latency overhead of the ITB mechanism in COWs with source routing. 2002, 463 - 70. URL BibTeX

    	author = "Flich, Jose and M.P. Malumbres and Lopez, Pedro and Duato, Jose",
    	abstract = "Clusters of workstations (COWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. The in-transit buffer (ITB) mechanism can improve network performance when applied to COWs with irregular topology and source routing. This mechanism considerably improves the performance of this kind of network when compared to current source routing algorithms; however, it introduces a latency penalty. An implementation of this mechanism was performed, showing that the latency overhead of the mechanism may be noticeable, especially for short messages and at low network loads. In this paper, we analyze in detail the latency overhead of ITBs, proposing several mechanisms to reduce, hide and remove it. Firstly, we show, by simulation, the effect of an ITB implementation that is much slower than the one implemented. Then we propose three mechanisms that try to overcome the latency penalty. All the mechanisms are simple and can be easily implemented; also, they are out of the critical path of the ITB packet-processing procedure. The results show very good behaviour of the proposed mechanisms, considerably reducing or even completely removing the latency overhead",
    	title = "{R}emoving the latency overhead of the {ITB} mechanism in {COW}s with source routing",
  93. Jose Flich, Pedro Lopez, J C Sancho, Antonio Robles and Jose Duato. Improving InfiniBand routing through multiple virtual networks. 2002, 49 - 63. BibTeX

    	author = "Flich, Jose and Lopez, Pedro and J.C. Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "InfiniBand is very likely to become the de facto standard for communication between nodes and I/O devices as well as for interprocessor communication. Often, the interconnection pattern is irregular. Up*/down* is the most popular routing scheme currently used in NOWs with irregular topologies. However, the main drawbacks of up*/down* routing are the unbalanced channel utilization and the difficulty of routing most packets through minimal paths, which negatively affects network performance. Using additional virtual lanes can improve up*/down* routing performance by reducing the head-of-line blocking effect, but it does not remove these main drawbacks. We propose a methodology that uses a reduced number of virtual lanes in an efficient way to achieve a better traffic balance and a higher number of minimal paths. This methodology is based on routing packets simultaneously through several properly selected up*/down* trees. To guarantee deadlock freedom, each up*/down* tree is built over a different virtual network. Simulation results show that the proposed methodology increases throughput up to an average factor ranging from 1.18 to 2.18 for 8, 16, and 32-switch networks by using only two virtual lanes. For larger networks with an additional virtual lane, network throughput is tripled, on average",
    	title = "{I}mproving {I}nfini{B}and routing through multiple virtual networks",
  94. P J Garcia, M D Mora, F J Alfaro, J L Sanchez and Jose Flich. Evaluation of alternative arbitration policies for Myrinet switches. In Parallel and Distributed Processing Symposium., Proceedings International, IPDPS 2002, Abstracts and CD-ROM. 2002, 162 -169. BibTeX

    	author = "P.J. Garcia and M.D. Mora and F.J. Alfaro and J.L. Sanchez and Flich, Jose",
    	title = "{E}valuation of alternative arbitration policies for {M}yrinet switches",
  95. J C Sancho, Antonio Robles, Jose Flich, Pedro Lopez and Jose Duato. Effective methodology for deadlock-free minimal routing in InfiniBand networks. In Parallel Processing, 2002. Proceedings. International Conference on. 2002, 409 - 418. DOI BibTeX

    	author = "J.C. Sancho and Robles, Antonio and Flich, Jose and Lopez, Pedro and Duato, Jose",
    	abstract = "The InfiniBand Architecture (IBA) defines a switch-based network with point-to-point links whose topology is arbitrarily established by the customer. We propose a simple and effective methodology for designing deadlock-free routing strategies that are able to route packets through minimal paths in InfiniBand networks. This methodology can meet the trade-off between network performance and the number of resources dedicated to deadlock avoidance. Evaluation results show that the resulting routing strategies significantly outperform up*/down* routing. In particular, throughput improvement ranges, on average, from 1.33 for small networks to 4.05 for large networks. Also, it is shown that just two virtual lanes and three service levels are enough to achieve more than 80% of the throughput improvement achieved by the best proposed routing strategy (the one that always provides minimal paths without limiting the number of resources).",
    	title = "{E}ffective methodology for deadlock-free minimal routing in {I}nfini{B}and networks",
  96. Jose Flich, Pedro Lopez, M Perez Malumbres and Jose Duato. Boosting the performance of Myrinet networks. IEEE Transactions on Parallel and Distributed Systems 13(7):693 - 709, 2002. URL BibTeX

    	author = "Flich, Jose and Lopez, Pedro and M. Perez Malumbres and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. These networks allow the customer to connect processors using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. Some of these networks use source routing and wormhole switching. In particular, we are interested in Myrinet networks because they are a well-known commercial product and their behavior can be controlled by the software running in network interfaces (Myrinet Control Program, MCP). Usually, the Myrinet network uses up*/down* routing for computing the paths for every source-destination pair. In this paper, we propose the In-Transit Buffer (ITB) mechanism to improve network performance. We apply the ITB mechanism to NOWs with up*/down* source routing, like Myrinet, analyzing its behavior on networks with both regular and irregular topologies. The proposed scheme can be implemented on Myrinet networks by only modifying the MCP, without changing the network hardware. We evaluate by simulation several networks with different traffic patterns using timing parameters taken from the Myrinet network. Results show that the current routing schemes used in Myrinet networks can be strongly improved by applying the ITB mechanism. In general, our proposed scheme is able to double the network throughput on medium and large NOWs. Finally, we present a first implementation of the ITB mechanism on a Myrinet network.",
    	title = "{B}oosting the performance of {M}yrinet networks",
  97. J C Sancho, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Analyzing the influence of virtual lanes on the performance of InfiniBand networks. In Parallel and Distributed Processing Symposium., Proceedings International, IPDPS 2002, Abstracts and CD-ROM. 2002, 166 -175. BibTeX

    	author = "J.C. Sancho and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	title = "{A}nalyzing the influence of virtual lanes on the performance of {I}nfini{B}and networks",
  98. Salvador Coll, Jose Flich, M P Malumbres, Pedro Lopez, Jose Duato and F J Mora. A first implementation of in-transit buffers on Myrinet GM software. In Parallel and Distributed Processing Symposium., Proceedings 15th International. April 2001, 1640 -1647. URL, DOI BibTeX

    	author = "Coll, Salvador and Flich, Jose and M.P. Malumbres and Lopez, Pedro and Duato, Jose and F.J. Mora",
    	abstract = "Clusters of workstations (COWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. In these systems, the interconnection network connects hosts using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. Myrinet is the most popular network used to build COWs. It uses source routing with the up*/down* routing algorithm. In previous papers we proposed the In-Transit Buffer (ITB) mechanism that improves network performance by allowing minimal routing, balancing network traffic, and reducing network contention. The mechanism is based on ejecting packets at some intermediate hosts and later re-injecting them into the network. Moreover, the ITB mechanism does not require additional hardware as it can be implemented on the software running at Myrinet network adapters. In this paper, we present a first implementation of the ITB mechanism on Myrinet GM software. We show the changes required in packet format and the modifications performed in the Myrinet Control Program (MCP). In addition, both the overhead introduced by the new code and the cost of extracting and re-injecting packets are measured. Results show that, even for this simple implementation, code overhead is only about 125 ns per packet and the message latency increase for messages that use the ITB mechanism is around 1.3 µs per ITB. This is the first attempt to implement this mechanism, showing that a real implementation of ITBs is feasible on Myrinet COWs, and the associated overhead does not restrict the potential benefits of this mechanism.",
    	title = "{A} first implementation of in-transit buffers on {M}yrinet {GM} software",
  99. Jose Flich, Pedro Lopez, M P Malumbres, Jose Duato and T Rokicki. Improving network performance by reducing network contention in source-based COWS with a low path-computation overhead. In Parallel and Distributed Processing Symposium., Proceedings 15th International. April 2001, 8 pp.. DOI BibTeX

    	author = "Flich, Jose and Lopez, Pedro and M.P. Malumbres and Duato, Jose and T. Rokicki",
    	abstract = "In previous papers, we have proposed the in-transit buffer mechanism (ITB) to improve network performance in COWs with irregular topology and source routing. This mechanism allows the use of minimal paths among all hosts, breaking cyclic dependences between channels by storing and later re-injecting packets at some intermediate hosts. However, it also has two additional features that can improve network performance even more. First, the ITB mechanism reduces network contention because some messages are ejected from the network, freeing network links. Second, the ITB mechanism allows the use of any path between each source-destination pair, improving traffic balance. In this paper we present a new routing algorithm that takes advantage of ITB by exploiting both issues: traffic balance and network contention reduction. The evaluation results show that network throughput can be considerably improved. On average, network throughput increases with respect to up*/down* by factors of 2.51 and 3.77 in 32 and 64-switch networks, respectively",
    	title = "{I}mproving network performance by reducing network contention in source-based {COWS} with a low path-computation overhead",
  100. Pedro Lopez, Jose Flich and Jose Duato. Deadlock-free routing in InfiniBand™ through destination renaming. In Parallel Processing, International Conference on, 2001. 2001, 427 - 434. DOI BibTeX

    	author = "Lopez, Pedro and Flich, Jose and Duato, Jose",
    	abstract = "The InfiniBand Architecture (IBA) defines a switch-based network with point-to-point links that supports any topology defined by the user including irregular ones, in order to provide flexibility and incremental expansion capability. Routing in IBA is distributed, based on forwarding tables, and only considers the packet destination ID for routing within subnets in order to drastically reduce forwarding table size. Unfortunately, the forwarding tables for most of the previously proposed routing algorithms for irregular topologies consider both the destination ID and the input channel. Therefore, these popular routing algorithms for irregular topologies may not be usable in InfiniBand networks because they do not conform to the IBA specifications. In this paper we propose an easy-to-implement strategy to adapt the forwarding tables already computed following any routing algorithm that considers the destination ID and the input channel into the required IBA forwarding table format. The resulting routing algorithm is deadlock-free on IBA. Indeed, the originally computed paths are not modified at all. Hence, the proposed strategy does not degrade performance with respect to the original routing scheme.",
    	title = "{D}eadlock-free routing in {I}nfini{B}and{TM} through destination renaming",
  101. Jose Flich, Pedro Lopez, M P Malumbres and Jose Duato. Improving routing performance in Myrinet networks. In Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International. 2000, 27 -32. URL, DOI BibTeX

    	author = "Flich, Jose and Lopez, Pedro and M.P. Malumbres and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. Typically, these networks connect processors using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. In some of these networks, packets are delivered using source routing. Due to the irregular topology, the routing scheme is often non-minimal. In this paper we analyze the routing scheme used in Myrinet networks in order to improve its performance. We propose new routing algorithms that balance the utilization of the available routes and always use minimal paths. We show through simulation that the current routing schemes used in Myrinet networks can be improved by modifying only the routing software without increasing the software overhead significantly. The overall throughput can be doubled without modifying the network hardware",
    	title = "{I}mproving routing performance in {M}yrinet networks",
  102. Jose Flich, Pedro Lopez, M P Malumbres and Jose Duato. Improving the performance of regular networks with source routing. In Parallel Processing, 2000. Proceedings. 2000 International Conference on. 2000, 353 -361. URL, DOI BibTeX

    	author = "Flich, Jose and Lopez, Pedro and M.P. Malumbres and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. In these machines, the network connects processors using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. Also, when performance is the primary concern, these network products are being used to build large commodity clusters with regular topologies. In previous papers, we have proposed the in-transit buffer mechanism to improve network performance, applying it to NOWs with irregular topology and source routing. This mechanism allows the use of minimal paths among all hosts, breaking cyclic dependencies between channels by storing and later re-injecting packets at some intermediate hosts. In this paper we apply the in-transit buffer mechanism to regular networks with source routing in order to improve their performance. Also, two path selection policies are evaluated. The first one will always choose the same minimal path from source to destination, whereas the second one will choose from different alternative minimal paths in a round-robin fashion. The evaluation results show that the overall network throughput can be doubled for large networks",
    	title = "{I}mproving the performance of regular networks with source routing",
  103. Jose Flich, M P Malumbres, Pedro Lopez and Jose Duato. Performance evaluation of a new routing strategy for irregular networks with source routing. 2000, 34 - 43. URL BibTeX

    	author = "Flich, Jose and M.P. Malumbres and Lopez, Pedro and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. Typically, these networks connect processors using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. In some of these networks, messages are delivered using the up*/down* routing algorithm. However, the up*/down* routing scheme is often non-minimal. Also, some of these networks use source routing. With this technique, the entire path to destination is generated at the source host before the message is sent. In this paper we develop a new mechanism in order to improve the performance of irregular networks with source routing, increasing overall throughput. With this mechanism, messages always use minimal paths. To avoid possible deadlocks, when necessary, routes between a pair of hosts are divided into sub-routes, and a special kind of virtual cut-through is performed at some intermediate hosts. We evaluate the new mechanism by simulation using parameters taken from the Myrinet network. We show that the current routing schemes used in Myrinet can be improved by modifying only the routing software without increasing its overhead significantly and, most importantly, without modifying the network hardware. The benefits of using the new routing scheme are noticeable for networks with 16 or more switches, and increase with network size. For 32 and 64-switch networks, throughput is increased on average by a factor ranging from 1.3 to 3.3",
    	title = "{P}erformance evaluation of a new routing strategy for irregular networks with source routing",
  104. Jose Flich, Pedro Lopez, M P Malumbres and Jose Duato. Improving the performance of regular networks with source routing. 2000, 353 - 61. URL BibTeX

    	author = "Flich, Jose and Lopez, Pedro and M.P. Malumbres and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. In these machines, the network connects processors using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. Also, when performance is the primary concern, these network products are being used to build large commodity clusters with regular topologies. In previous papers, we have proposed the in-transit buffer mechanism to improve network performance, applying it to NOWs with irregular topology and source routing. This mechanism allows the use of minimal paths among all hosts, breaking cyclic dependencies between channels by storing and later re-injecting packets at some intermediate hosts. In this paper we apply the in-transit buffer mechanism to regular networks with source routing in order to improve their performance. Also, two path selection policies are evaluated. The first one will always choose the same minimal path from source to destination, whereas the second one will choose from different alternative minimal paths in a round-robin fashion. The evaluation results show that the overall network throughput can be doubled for large networks",
    	title = "{I}mproving the performance of regular networks with source routing",
  105. Jose Flich, M P Malumbres, Pedro Lopez and Jose Duato. Improving routing performance in Myrinet networks. 2000, 27 - 32. URL BibTeX

    	author = "Flich, Jose and M.P. Malumbres and Lopez, Pedro and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. Typically, these networks connect processors using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. In some of these networks, packets are delivered using source routing. Due to the irregular topology, the routing scheme is often non-minimal. In this paper we analyze the routing scheme used in Myrinet networks in order to improve its performance. We propose new routing algorithms that balance the utilization of the available routes and always use minimal paths. We show through simulation that the current routing schemes used in Myrinet networks can be improved by modifying only the routing software without increasing the software overhead significantly. The overall throughput can be doubled without modifying the network hardware",
    	title = "{I}mproving routing performance in {M}yrinet networks",
  106. Jose Flich, Pedro Lopez, M P Malumbres, Jose Duato and T Rokicki. Combining in-transit buffers with optimized routing schemes to boost the performance of networks with source routing. 2000, 300 - 9. BibTeX

    	author = "Flich, Jose and Lopez, Pedro and M.P. Malumbres and Duato, Jose and T. Rokicki",
    	abstract = "In previous papers we proposed the ITB mechanism to improve the performance of up*/down* routing in irregular networks with source routing. With this mechanism, both minimal routing and a better use of network links are guaranteed, resulting in an overall network performance improvement. In this paper, we show that the ITB mechanism can be used with any source routing scheme in the NOW environment. In particular, we apply ITB to DFS and Smart routing algorithms, which provide better routes than up*/down* routing. Results show that ITB strongly improves DFS (by 63%, for 64-switch networks) and Smart throughput (23%, for 32-switch networks)",
    	title = "{C}ombining in-transit buffers with optimized routing schemes to boost the performance of networks with source routing",
  107. Jose Flich, M P Malumbres, Pedro Lopez and Jose Duato. Performance evaluation of networks of workstations with hardware shared memory model using execution-driven simulation. In Parallel Processing, 1999. Proceedings. 1999 International Conference on. 1999, 146 -153. DOI BibTeX

    	author = "Flich, Jose and M.P. Malumbres and Lopez, Pedro and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. Typically, these networks connect processors using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. Similar to the evolution of parallel computers, NOWs are also evolving from distributed memory to shared memory programming model. However, physical distances between processors are longer in NOWs than in tightly-coupled distributed shared-memory multiprocessors (DSMs), leading to higher message latency and lower network bandwidth. Therefore, the network may be a bottleneck when executing some parallel applications in a NOW supporting a shared-memory programming paradigm. In this paper we analyze whether the interconnection network is able to efficiently handle the traffic generated in a NOW with the shared memory model. In particular, we are interested in analyzing the influence of the routing mechanism in the performance of the system. We evaluate the behavior of a NOW with irregular topology by means of an execution-driven simulator using SPLASH-2 applications as the input load. The results show that the routing algorithm can considerably reduce the total execution time of applications. In particular routing adaptivity can reduce the total execution time by 58% in some applications. These results confirm the behavior observed in previous works using synthetic traffic loads",
    	title = "{P}erformance evaluation of networks of workstations with hardware shared memory model using execution-driven simulation",
  108. Jose Flich, Pedro Lopez, M P Malumbres and Jose Duato. Edinet: an execution driven interconnection network simulator for DSM systems. 1998, 336 - 9. BibTeX

    	author = "Flich, Jose and Lopez, Pedro and M.P. Malumbres and Duato, Jose",
    	abstract = "Evaluation studies on interconnection networks for distributed memory multiprocessors usually assume synthetic or trace-driven workloads. However, when the final design choices must be made, a more precise evaluation study should be performed. In this paper, we describe a new execution-driven simulation tool to evaluate interconnection networks for distributed memory multiprocessors using real application workloads. As an example, we have developed an NCC-NUMA memory model and obtained some simulation results from the SPLASH-2 suite, using different network routing algorithms",
    	title = "{E}dinet: an execution driven interconnection network simulator for {DSM} systems",
Head-of-Line Blocking Reduction in Power-Efficient Networks-on-Chip. Jose Flich (Network-On-Chip)

Improving Network-on-Chip Performance in Multi-Core Systems. Jose Flich (Network-On-Chip)

Cost Effective Routing Implementations for On-chip Networks. Jose Flich (Network-On-Chip)

High Performance and Power Efficient On-Chip Network Designs through Multiple Injection Ports. Jose Flich (Network-On-Chip)

Floorplan-Aware High Performance NoC Design. Jose Flich, Federico Silla (Network-On-Chip)

Smart Memory and Network-On-Chip Design for High-Performance Shared-Memory Chip Multiprocessors. Jose Flich (Network-On-Chip)

High-performance arch. for high-radix switches. Jose Flich, Jose Duato (Switch Architectures)

Design and Implementation of Efficient Topology Agnostic Routing Algorithms for Interconnection Networks. Jose Flich (Routing Algorithms)

Efficient mechanisms to provide fault tolerance in interconnection networks for PC Clusters. Jose Flich, Antonio Robles (Fault Tolerance)