[OpenPOWER-HDL-Cores] microwatt / libresoc dcache
paulus at ozlabs.org
Fri May 7 02:48:01 UTC 2021
On Fri, May 07, 2021 at 12:27:38AM +0100, Luke Kenneth Casson Leighton wrote:
> On Thursday, May 6, 2021, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
> > a normal SRAM you would expect a 1 clock cycle delay, all good. except
> > here, an *extra* cycle of delay is added. after assertion of the read it
> > is *two* cycles before the data appears on the read data output.
> > i have no idea why, and i'm not skilled enough at VHDL to work out how to
> > remove it.
> illustration, control_writeback:
> the logic there is reading all its decisions from r1. dout.valid etc, the
> data itself, all comes from r1.
> r1 was set up with a single clock delay from r0, r0 was set up
> the question i have is: is control_writeback making its decisions from the
> *current* r1 or is it making its decisions from the *future* r1?
That block is combinatorial, since it's process(all) and has no if
rising_edge(clk) then ... statement. So it's the current r1.
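A minimal Python sketch of the distinction, assuming a simple cycle-based model rather than a real VHDL simulator: the clocked process only changes r1 at the rising edge, while the process(all) block recomputes dout.valid from whatever r1 currently holds. The `R1` record, the `ack` input and the function names are hypothetical simplifications, not Microwatt's actual code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class R1:
    # Hypothetical subset of the dcache's r1 record; only ls_valid is modelled.
    ls_valid: bool = False


def next_r1(r1: R1, ack: bool) -> R1:
    """Clocked process: the value r1 will take at the NEXT rising edge."""
    return replace(r1, ls_valid=ack)


def control_writeback(r1: R1) -> bool:
    """Combinatorial process(all): a pure function of the CURRENT r1."""
    return r1.ls_valid  # dout.valid <= r1.ls_valid


def simulate(acks):
    r1 = R1()
    trace = []
    for cycle, ack in enumerate(acks):
        # During the cycle, combinatorial logic sees the register's current value.
        trace.append((cycle, r1.ls_valid, control_writeback(r1)))
        # Rising edge at the end of the cycle: the register loads its next value.
        r1 = next_r1(r1, ack)
    return trace


trace = simulate([True, False, False])
```

In the trace, dout.valid equals r1.ls_valid in every cycle, so the combinatorial block adds no cycle of its own; any latency comes from the clocked registers feeding it.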
> i have seen VHDL set values that get used in the same cycle, like variables
> in normal software programs. very odd to have that in an HDL.
A combinatorial block like that is re-evaluated whenever any signal it
reads changes, repeatedly if necessary until things settle, without
waiting for any clock edges.
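As a rough sketch of that settling behaviour (simplified from how VHDL simulators handle delta cycles): each pass evaluates every combinatorial process against the same snapshot of signal values, applies all the updates together, and repeats until nothing changes. The signal names a, b, c and en are hypothetical. A real simulator only re-runs processes whose inputs changed; re-running all of them here is simpler but slower.

```python
def settle(signals, processes, max_deltas=100):
    """Evaluate combinatorial processes in delta cycles until signals stop changing.

    No clock edge is involved: all of this happens within one simulation time step.
    Returns the settled signal values and the number of deltas that changed something.
    """
    signals = dict(signals)
    for delta in range(max_deltas):
        # Every process reads the same snapshot of current signal values...
        pending = {}
        for proc in processes:
            pending.update(proc(signals))
        # ...and all resulting signal updates take effect together.
        changed = {k: v for k, v in pending.items() if signals.get(k) != v}
        if not changed:
            return signals, delta
        signals.update(changed)
    raise RuntimeError("combinatorial logic did not settle (loop?)")


# Hypothetical logic: b <= a; c <= b and en.
# c depends on b, so c only picks up the new b one delta after b changes.
processes = [
    lambda s: {"c": s["b"] and s["en"]},
    lambda s: {"b": s["a"]},
]
final, deltas = settle({"a": True, "en": True, "b": False, "c": False}, processes)
```

Here c ends up True after two delta cycles, with no clock edge in between: the logic simply runs until it settles.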
> how does dout.valid get set? it is equal to r1.ls_valid. for LOADs, how
> does r1.ls_valid get set? in NC_LOAD_ACK and RELOAD_WAIT_ACK.
> all good... except... does that setting of r1.ls_valid get picked up *in
> the same cycle* by control_writeback?
Yes. It's a combinatorial process.
> or does control_writeback react to
> the values in r1 on the *next* cycle?
> this is an aspect of VHDL that has me confused. also, if it is the *next*
> cycle it would explain why we are seeing a 2 cycle delay on LD cache reads.
If you mean you're seeing valid data two cycles after presenting the
address, that's how it's meant to work. The constraint is really the
TLB and cache tag matching and consequent hit/miss detection and cache
way determination, which takes up essentially the whole of cycle 1.
That is:
  cycle 0: address is presented
  cycle 1: TLB & cache tag matching
  cycle 2: way multiplexing and data out
That means the data is valid early enough in cycle 2 that we can do
other useful work like data formatting in loadstore1 during cycle 2.
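A cycle-by-cycle sketch of that timeline, assuming a simple two-register pipeline. The register names `addr_reg` and `match_reg` and the `hit_way` function are hypothetical stand-ins: a real dcache does the TLB lookup and tag compare against RAM contents, and these are not Microwatt's actual r0/r1 records. The sketch shows both the two-cycle load latency and that a new load can still be accepted every cycle.

```python
from typing import Optional


def hit_way(addr: int, n_ways: int = 2) -> int:
    # Hypothetical stand-in for TLB lookup + tag compare: derive a way from the address.
    return (addr >> 6) % n_ways


def simulate_load(addr_at_cycle: dict[int, int], n_cycles: int) -> dict[int, tuple[int, int]]:
    addr_reg: Optional[int] = None               # address latched at end of cycle 0
    match_reg: Optional[tuple[int, int]] = None  # (addr, way) latched at end of cycle 1
    data_out: dict[int, tuple[int, int]] = {}
    for cycle in range(n_cycles):
        # "Cycle 2" work: way multiplexing from match_reg drives data out combinatorially.
        if match_reg is not None:
            data_out[cycle] = match_reg
        # "Cycle 1" work: TLB & tag matching on the latched address.
        match_next = (addr_reg, hit_way(addr_reg)) if addr_reg is not None else None
        # Rising edge: every register loads its next value at the same time.
        match_reg = match_next
        addr_reg = addr_at_cycle.get(cycle)
    return data_out
```

Presenting 0x1040 in cycle 0 gives data out in cycle 2, and back-to-back loads presented in cycles 0 and 1 come out in cycles 2 and 3: the extra cycle costs latency, not throughput.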
Of course, you can always cram more into a cycle if you're willing to
increase the cycle time, but in fact I want to take Microwatt in the
other direction, towards more pipelining and shorter cycle times,
given that registers are close to free on FPGAs.