[OpenPOWER-HDL-Cores] GPR-to-FPR and FPR-to-GPR move operations
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sat May 29 09:04:58 UTC 2021
Lauri is kindly investigating MP3 in SVP64 assembler and it's turning out to
be a good test of what opcodes are needed. in the bi-weekly meeting last
week, Paul, we mentioned briefly the need for GPR-to-FPR and FPR-to-GPR
mv operations (straight bit-wise) given that VSX/SIMD will not be added to
Libre-SOC as a GPU / VPU.
Jeff Bush's Nyuzi paper makes it clear that transferring workloads
through L1/L2 cache is hugely expensive, and describes the efforts
he went to in order to reduce power consumption.
additionally, Lauri points out that just getting zero into an FPR is also
costly: it requires a LD operation, which takes up data segment space
and unnecessarily activates memory as well as the L2 and L1 data
cache paths, when compared to a MV-from-GPR operation.
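
to illustrate what "straight bit-wise" means here, below is a minimal sketch in Python that models the two moves as a reinterpretation of the same 64 bits, with no int-to-float conversion. the function names gpr_to_fpr and fpr_to_gpr are hypothetical, chosen for this illustration only: they are not proposed opcode names.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def gpr_to_fpr(bits: int) -> float:
    """Reinterpret a 64-bit integer pattern as an IEEE 754 double.

    Hypothetical helper: models a bit-copy, not a numeric conversion.
    """
    return struct.unpack("<d", struct.pack("<Q", bits & MASK64))[0]

def fpr_to_gpr(value: float) -> int:
    """Reinterpret an IEEE 754 double as its 64-bit integer pattern."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

# Integer zero copied bit-for-bit is floating-point +0.0:
# this is the "get zero into an FPR" case, with no load involved.
zero = gpr_to_fpr(0)

# A bit-copy is not a conversion: integer 1 becomes the smallest
# subnormal double, not 1.0.
tiny = gpr_to_fpr(1)

# The bit pattern of 1.0, as it would appear in a GPR after the move.
one_bits = fpr_to_gpr(1.0)
```

the zero case is the one Lauri raised: an all-zeros GPR value is already the bit pattern of +0.0, so no constant needs to live in the data segment.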
in addition to that, in an Out-of-Order system the cycle latency of the
path through L1 cache will be much higher than a straight MV operation
(which in some micro-architectures may be a macro-op-fused operation).
* this in turn requires a larger number of "in-flight" operations
* this in turn increases the number of Reservation Stations
* this in turn increases the size of the Dependency Matrices, which grows as O(N^2)
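
the chain above can be sketched with some rough arithmetic. the model is an assumption made for illustration: the number of in-flight operations needed to keep issuing is taken as issue width times latency, and the Dependency Matrix is taken as a plain N x N grid. the latency figures (1 cycle for a MV, 4 cycles for a load through L1) and the issue width of 2 are made-up example numbers, not measurements of any design.

```python
def in_flight_needed(issue_per_cycle: int, latency_cycles: int) -> int:
    """Operations that must be in flight to keep issuing every cycle
    while waiting for a result (rough model: width * latency)."""
    return issue_per_cycle * latency_cycles

def matrix_cells(n: int) -> int:
    """Cell count of an N x N dependency matrix (assumed square)."""
    return n * n

ISSUE_WIDTH = 2  # assumed, for illustration only

# Assumed example latencies in cycles, not real figures.
for name, latency in (("MV", 1), ("LD via L1", 4)):
    n = in_flight_needed(ISSUE_WIDTH, latency)
    print(f"{name}: {n} in-flight ops, {matrix_cells(n)} matrix cells")
```

under these assumed numbers a 4x increase in latency gives a 16x increase in matrix cells, which is the quadratic growth the last bullet refers to.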
using the LD-ST path is therefore extremely costly, all of which
points to a straight bit-copy between GPR and FPR being needed.
in some micro-architectures the MV may end up being a macro-op
fused operation: it may end up actually being removed entirely from
the pipelines, instead being used to mark the source or destination
of INT or FP operations as targeting the *other* regfile:
    fmv2int fp5, r3
    addi r3, 0x5
    addi fp5, 0x5
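
as a rough sketch of the idea, the Python below models the fusion as a pass over a list of instruction tuples. it assumes one reading of the example above: that the first two lines are what the program contains and the third is what is effectively executed. the function name fuse_moves and the tuple encoding are hypothetical, and the check is deliberately naive: a real implementation would also have to confirm that the GPR is not needed afterwards.

```python
def fuse_moves(instrs):
    """Drop an fmv2int whose GPR operand is used by the very next
    instruction, and retarget that instruction at the FPR instead.

    Hypothetical sketch: instructions are (opcode, operand, ...) tuples,
    and liveness of the GPR after the pair is NOT checked.
    """
    out = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if cur[0] == "fmv2int" and nxt is not None and cur[2] in nxt[1:]:
            fpr, gpr = cur[1], cur[2]
            # Rewrite the consumer's operands, leaving its opcode alone.
            operands = tuple(fpr if a == gpr else a for a in nxt[1:])
            out.append((nxt[0],) + operands)
            i += 2  # the move itself never enters the pipeline
        else:
            out.append(cur)
            i += 1
    return out

program = [("fmv2int", "fp5", "r3"), ("addi", "r3", "0x5")]
fused = fuse_moves(program)
```

in this model the move costs nothing at execution time: it only changes which regfile the following operation reads from.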
it should be clear that when adding bitmanip operations as well, the
possibilities expand to be able to perform bitmanipulation on FPRs.