[OpenPOWER-HDL-Cores] Compressed (16-bit) OpenPOWER instruction extension
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Tue Nov 17 03:43:17 UTC 2020
similar to VLE, we are developing a 16-bit instruction encoding in
order to (significantly) reduce power consumption and executable size.
many ISA studies show that reductions in I-cache size / usage result
in a quadratic (square law) power reduction. both RISC-V and ARM
benefit from compressed opcodes.
whilst originally intended to operate alongside Simple-V (which brings
predication, element width overrides such as FP16, and Vectorisation
to the *scalar* v3.0B OpenPOWER ISA), the C encoding is orthogonal and
SV is *not* a prerequisite for C.
like VLE, the encoding fits "on top" of the existing OpenPOWER ISA.
also, just like VLE, it requires "viewing" the instruction data stream
in BE format, due to OpenPOWER using MSB0 and the Major Opcode being
in the 4th byte of a LE-formatted stream.
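to illustrate why the BE view is needed: a 32-bit v3.0B instruction keeps
its 6-bit Major Opcode in MSB0 bits 0-5, i.e. the top six bits of the
word. stored little-endian, those bits land in the 4th byte, whereas a
16-bit decoder wants the opcode in the *first* halfword. a minimal Python
sketch (the function name is illustrative only, not taken from any
LibreSOC code):

```python
import struct

def major_opcode(word: int) -> int:
    """Return the 6-bit Major Opcode (MSB0 bits 0-5) of a 32-bit instruction."""
    # MSB0 numbering: bit 0 is the most significant bit of the word
    return (word >> 26) & 0x3F

insn = 0x38630001                    # "addi r3,r3,1": Major Opcode 14
le_bytes = struct.pack("<I", insn)   # byte order in memory on a LE system
be_bytes = struct.pack(">I", insn)   # the same word viewed as BE

# LE stream: the opcode sits in the top 6 bits of the 4th byte (offset 3)
print(le_bytes.hex(), le_bytes[3] >> 2)          # 01006338 14
# BE view: the opcode is at the very start, in the first halfword
first_halfword = struct.unpack(">H", be_bytes[:2])[0]
print(be_bytes.hex(), first_halfword >> 10)      # 38630001 14
```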
unlike VLE, we are not recommending that new 32-bit encodings be added.
if, for example, there is not enough 16-bit space to fit an immediate,
then rather than extending the 16-bit encoding with an extra 16 bits of
immediate (which would effectively create a duplicate of an existing
32-bit v3.0B instruction), the existing 32-bit v3.0B instruction is
simply expected to be used.
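the fallback rule can be sketched from the assembler's side: emit a 16-bit
form when the operands fit, otherwise emit the ordinary 32-bit v3.0B
instruction, never a new "16 + 16" C variant. the compressed `addi` with a
5-bit signed immediate below is a hypothetical assumption for illustration
(the real C field widths are not given here); the 32-bit D-form `addi`
encoding (Major Opcode 14, RT, RA, SI) is the standard v3.0B one.

```python
def encode_addi_v3(rt: int, ra: int, si: int) -> int:
    """Encode the standard 32-bit v3.0B D-form addi: OPCD=14, RT, RA, SI."""
    if not (0 <= rt < 32 and 0 <= ra < 32 and -0x8000 <= si <= 0x7FFF):
        raise ValueError("operand out of range for addi")
    return (14 << 26) | (rt << 21) | (ra << 16) | (si & 0xFFFF)

# Hypothetical compressed form "c.addi rt, imm" meaning addi rt,rt,imm
# with a 5-bit signed immediate; NOT a real field layout.
C_IMM_MIN, C_IMM_MAX = -16, 15

def expand_c_addi(rt: int, imm: int) -> int:
    """A C instruction is only shorthand: it expands to an existing v3.0B one."""
    if not C_IMM_MIN <= imm <= C_IMM_MAX:
        raise ValueError("immediate does not fit the hypothetical C form")
    return encode_addi_v3(rt, rt, imm)

def addi_size(rt: int, ra: int, si: int) -> int:
    """Bytes an assembler would emit for addi rt,ra,si under this scheme."""
    if rt == ra and C_IMM_MIN <= si <= C_IMM_MAX:
        return 2   # fits the (hypothetical) 16-bit C form
    # no extended C form exists: just use the 32-bit v3.0B addi
    return 4

print(hex(expand_c_addi(3, 1)))   # 0x38630001, i.e. addi r3,r3,1
print(addi_size(3, 3, 1), addi_size(3, 3, 100), addi_size(3, 4, 1))  # 2 4 4
```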
to increase the compression ratio significantly, three factors are at work:
1) two Major v3.0B opcodes are required (one odd, one even, because the
LSB of the Major Opcode is used as the MSB of the C encoding). this gives
11 bits of initial C space rather than a pressurised 10-bit one.
2) "Bank" Selection is a planned feature. Entire swathes of the C
encoding (or even all of it except the Bank Select instruction, illegal
and nop) may be replaced with alternatives, targeted at domain-specific
contexts (Video, DSP, 3D, Audio).
3) from the first C instruction in v3.0B Mode a decision can be made
to either enter a much "richer" Compressed encoding mode (using all 16
bits) or to continue in v3.0B mode, after the execution of the 10bit C
this latter has further refinement, where a single v3.0B instruction
may be executed before returning to C Mode, or v3.0B Mode return may
[alternatives (use an entire 16 bit instruction to enter/exit 16 bit
mode) result in fixed overhead that can be as high as 100%, making the
entire C exercise fairly pointless].
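the mode-switching idea in point 3 can be sketched as a small state
machine that also tallies code size. the mode names, the per-instruction
"switch" request and the `(kind, switch)` tuple representation are all
hypothetical; which bits actually carry the decision is not specified here.

```python
from enum import Enum, auto

class Mode(Enum):
    V3 = auto()       # plain 32-bit v3.0B decoding
    C16 = auto()      # "richer" Compressed mode using all 16 bits
    ONE_V3 = auto()   # run exactly one v3.0B instruction, then return to C16

def step(mode, kind, switch=None):
    """Return (size_in_bytes, next_mode) for one instruction.

    kind   -- "c" for a 16-bit instruction, "v3" for a 32-bit one
    switch -- hypothetical request carried by a C instruction:
              None (stay put), "c16", "v3" or "one_v3"
    """
    if kind == "v3":
        if mode is Mode.C16:
            raise ValueError("32-bit instruction not allowed in C16 mode")
        # after a one-shot v3.0B instruction we drop back into C16
        return 4, (Mode.C16 if mode is Mode.ONE_V3 else Mode.V3)
    if mode is Mode.ONE_V3:
        raise ValueError("expected exactly one v3.0B instruction")
    if mode is Mode.V3:
        # 10-bit C issued from v3.0B mode: either enter C16 or stay in V3
        return 2, (Mode.C16 if switch == "c16" else Mode.V3)
    # C16 mode: stay, exit to V3, or allow a single v3.0B instruction
    return 2, {"v3": Mode.V3, "one_v3": Mode.ONE_V3}.get(switch, Mode.C16)

def run(stream, mode=Mode.V3):
    """Walk a stream of (kind, switch) tuples, returning (bytes, final mode)."""
    total = 0
    for kind, switch in stream:
        size, mode = step(mode, kind, switch)
        total += size
    return total, mode

stream = [("v3", None), ("c", "c16"), ("c", None), ("c", "one_v3"),
          ("v3", None), ("c", "v3"), ("v3", None)]
total, final = run(stream)
print(total, final.name)   # 20 V3  (the all-32-bit version would be 28)
```

for comparison with the bracketed alternative: if entering 16-bit mode
cost a dedicated 16-bit instruction, a lone 16-bit instruction would take
16 + 16 = 32 bits, i.e. 100% overhead on its 16-bit payload and no smaller
than the v3.0B instruction it replaces.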
three primary design objectives are at play:
1) there shall be no new instructions added to C that cannot be mapped
to 32 bit v3.0B equivalents; similarly, no v3.0B instruction behaviour
shall be modified *by* the C encoding or its use.
2) the entirety of C shall be dynamically disableable and enableable at
any time, as well as dynamically switchable from any mode to any other
mode, so as to allow programs to remain primarily OpenPOWER ISA
compatible yet also take advantage of the best of both encodings (this
is best controlled by the PCR).
this is particularly relevant given that C may clash with v3.1B
Prefixing (which uses the EXT001 Major Opcode). alternative Major
Opcode pairs include EXT000/009 and EXT006/005.
3) the driving factor in what goes into C is Huffman-esque encoding:
the more popular the instruction, the higher its priority.
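objective 2, together with the 11-bit figure from point 1 earlier, can be
sketched as a decoder front-end that only treats the chosen pair of Major
Opcodes as C while C is enabled. the pair EXT000/EXT009 is taken from the
alternatives listed above; `c_enabled` stands in for a hypothetical
PCR-controlled bit, and the internal layout of the 11 bits is not shown.
two opcodes with 10 remaining bits each give 2 x 2^10 = 2^11 encodings.

```python
C_OPCODES = (0, 9)   # hypothetical pair: EXT000 (even) and EXT009 (odd)
PREFIX_OPCODE = 1    # v3.1 prefixed instructions use EXT001

# choosing a pair that avoids EXT001 keeps C from clashing with prefixing
assert PREFIX_OPCODE not in C_OPCODES

def classify(halfword: int, c_enabled: bool):
    """Classify the first halfword (BE view) of the next instruction.

    Returns ("c", field) with an 11-bit field, ("prefix", None)
    or ("v3", None).
    """
    opcode = (halfword >> 10) & 0x3F
    if c_enabled and opcode in C_OPCODES:
        # the Major Opcode's LSB becomes the MSB of the 11-bit C field
        return "c", ((opcode & 1) << 10) | (halfword & 0x3FF)
    if opcode == PREFIX_OPCODE:
        return "prefix", None
    # C disabled, or any other opcode: decode as ordinary v3.0B
    # (which, for these opcodes, may well be an illegal instruction)
    return "v3", None

print(classify(0x2555, True))    # ('c', 1365), i.e. field 0x555
print(classify(0x2555, False))   # ('v3', None)
```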
with (3), however, it is critically important to appreciate the
difference between reducing dynamic runtime I-cache usage and reducing
static executable size (a priority for the embedded market where
saving on-board embedded NAND and RAM is a critical business
this latter means placing a higher priority on instructions used in
commonly-called inner loops. given that different domains will have
completely different execution profiles, the resulting differences in
priority, and the bun-fights over encoding that could follow, are
precisely why CBank Switching was added.
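the static vs dynamic distinction can be made concrete with a toy profile:
count each mnemonic once per appearance (static size), or weight it by how
often its basic block runs (dynamic I-cache usage), then greedily fill the
11-bit C space in priority order. the profile, the per-instruction operand
widths and the greedy fill are hypothetical simplifications; a real
allocation would also have to reserve Bank Select, illegal and nop.

```python
from collections import Counter

def rank(blocks, dynamic):
    """Rank mnemonics by static count, or by count weighted by block execution."""
    counts = Counter()
    for exec_count, mnemonics in blocks:
        weight = exec_count if dynamic else 1
        for m in mnemonics:
            counts[m] += weight
    # most popular first; ties broken alphabetically for reproducibility
    return sorted(counts, key=lambda m: (-counts[m], m))

def fill_c_space(ranking, cost_bits, space_bits=11):
    """Greedily give C encodings to popular instructions until space runs out.

    cost_bits[m] is the (hypothetical) number of operand bits m needs,
    so it consumes 2**cost_bits[m] of the 2**space_bits encodings.
    """
    budget = 1 << space_bits
    chosen = []
    for m in ranking:
        cost = 1 << cost_bits[m]
        if cost <= budget:
            chosen.append(m)
            budget -= cost
    return chosen

# hypothetical profile: (execution count, instructions in the basic block)
blocks = [
    (1, ["mflr", "stw", "stw", "bl", "lwz", "lwz", "mtlr", "blr"]),  # runs once
    (1000, ["lwz", "add", "addi", "cmpw", "bne"]),                  # hot loop
]
cost_bits = {m: 10 for _, ms in blocks for m in ms}   # assume 10 operand bits each

print(fill_c_space(rank(blocks, dynamic=False), cost_bits))  # ['lwz', 'stw']
print(fill_c_space(rank(blocks, dynamic=True), cost_bits))   # ['lwz', 'add']
```

the two profiles pick different instructions for the same encoding space,
which is exactly the kind of conflict CBank Switching is meant to resolve.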
this is quite a lot of work to evaluate, and to ensure that it is useful
as a proposal for extending the OpenPOWER ISA. assistance and review are
greatly appreciated and welcome, using either this list (until a
better one is available) or the libresoc bugtracker issue, which may
be found via the link right at the top of this message.