

Recent research has demonstrated significant power benefits of an automatic conversion algorithm from FF-based to 3-phase latch-based designs. This paper proposes a new retiming algorithm for 3-phase latch designs that further reduces the number of required latches while also considering the impact of clock gating. Post place-and-route results demonstrate that our new 3-phase retiming algorithm yields 23.5% and 23.9% average power reductions compared to more traditional FF-based and master-slave-based alternatives across a broad range of benchmarks, with no degradation in performance and, on average, less area.

In digital circuit designs, sequential components such as flip-flops are used to synchronize signal propagation. Logic computations are aligned at, and thus isolated by, flip-flop stages. Although this fully synchronous style can significantly reduce design effort, it may degrade circuit performance, because sequential components can only introduce delays into signal propagation and never accelerate it. In this paper, we propose a new timing model, VirtualSync+, in which signals, especially those along critical paths, are allowed to propagate through several sequential stages without flip-flops. Timing constraints are still satisfied at the boundary of the optimized circuit to maintain a consistent interface with existing designs. By removing the clock-to-q delays and setup time requirements of flip-flops on critical paths, circuit performance can be pushed even beyond the limit of traditional sequential designs. In addition, we further enhance the optimization with VirtualSync+ by fine-tuning with commercial design tools, e.g., Design Compiler from Synopsys, to achieve more accurate results. Experimental results demonstrate that circuit performance can be improved by up to 4% (1.5% on average) compared with circuits after extreme retiming and sizing, while the area increase remains negligible.
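
The performance argument for VirtualSync+ rests on simple clock-period arithmetic. The Python sketch below is a deliberately simplified, long-path-only model and not the VirtualSync+ timing model itself: with a flip-flop between every pair of stages, each stage pays the clock-to-q delay t_cq and the setup time t_su, whereas if only the boundary flip-flops remain, a signal crossing k stages gets k clock periods and pays t_cq and t_su once. The function names and picosecond values are hypothetical, and the model ignores hold (short-path) constraints and clock skew, which a real design must also satisfy.

```python
from typing import Sequence


def min_period_with_flip_flops(stage_delays_ps: Sequence[int], t_cq_ps: int, t_su_ps: int) -> int:
    # Every stage is bounded by flip-flops, so each one pays t_cq + t_su
    # on top of its combinational delay; the slowest stage sets the period.
    return max(t_cq_ps + d + t_su_ps for d in stage_delays_ps)


def min_period_boundary_flip_flops_only(stage_delays_ps: Sequence[int], t_cq_ps: int, t_su_ps: int) -> float:
    # Idealized, long-path-only model (an assumption, not VirtualSync+ itself):
    # flip-flops remain only at the block boundary, so a signal has
    # len(stage_delays_ps) clock periods to cross all stages and pays
    # t_cq + t_su only once. Hold (short-path) constraints and skew are ignored.
    k = len(stage_delays_ps)
    return (t_cq_ps + sum(stage_delays_ps) + t_su_ps) / k


# Two stages that are already balanced (e.g., by retiming), 800 ps each.
stages = [800, 800]
print(min_period_with_flip_flops(stages, t_cq_ps=100, t_su_ps=50))           # 950
print(min_period_boundary_flip_flops_only(stages, t_cq_ps=100, t_su_ps=50))  # 875.0
```

Because both stages are already balanced, the 75 ps difference comes entirely from the removed internal flip-flop overhead, (t_cq + t_su)/k, which in this model retiming alone cannot remove because every retimed stage still pays t_cq + t_su.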
Latches have the advantages of time borrowing, smaller cell area, lower input capacitance, and lower power compared to flip-flops (FFs). This paper presents a CAD flow that converts any arbitrarily complex single-clock-domain FF-based RTL design into an efficient 3-phase latch-based design. The flow includes a novel 3-phase aware retiming algorithm for power and area optimization.
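
Time borrowing, one of the latch advantages listed above, can be illustrated with a small Python model of a chain of level-sensitive latches. This is a generic sketch, not the CAD flow or its 3-phase aware retiming algorithm: the `Latch` class, the `propagate` function, and the non-overlapping 300 ps transparency windows are hypothetical, the latch D-to-Q delay defaults to zero, and hold constraints are ignored. A transparent latch passes data straight through, so a stage that runs past the opening of the next latch simply borrows time from the following stage, provided the data still arrives before that latch closes (minus its setup time).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Latch:
    open_ps: int       # time at which the latch becomes transparent
    close_ps: int      # time at which the latch becomes opaque and holds its value
    setup_ps: int = 0  # data must arrive at least this long before close_ps


def propagate(latches: Sequence[Latch], comb_delays_ps: Sequence[int],
              d_to_q_ps: int = 0) -> Tuple[List[int], List[int]]:
    """Return (departure time, borrowed time) per latch; raise on a setup violation."""
    if len(comb_delays_ps) != len(latches) - 1:
        raise ValueError("need one combinational delay between each pair of latches")
    departures: List[int] = []
    borrowed: List[int] = []
    arrival = latches[0].open_ps  # assume data is ready when the first latch opens
    for i, latch in enumerate(latches):
        if arrival > latch.close_ps - latch.setup_ps:
            raise ValueError(f"setup violation at latch {i}")
        # Transparent latch: data leaves at its arrival time, or at the
        # opening edge if it arrived early and had to wait.
        depart = max(arrival, latch.open_ps) + d_to_q_ps
        # Time the previous stage consumed after this latch opened.
        borrowed.append(max(0, arrival - latch.open_ps))
        departures.append(depart)
        if i < len(comb_delays_ps):
            arrival = depart + comb_delays_ps[i]
    return departures, borrowed


# Hypothetical non-overlapping windows, 300 ps each.
chain = [Latch(0, 300), Latch(300, 600), Latch(600, 900)]
print(propagate(chain, [400, 150]))  # ([0, 400, 600], [0, 100, 0])
```

The 400 ps first stage would not fit in a 300 ps edge-triggered stage, but here it borrows 100 ps from the second stage, whose 150 ps of logic still delivers the data (at 550 ps) before the third latch opens.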
