[OpenPOWER-HDL-Cores] [Libre-soc-dev] microwatt / libresoc dcache
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sat May 8 10:39:42 UTC 2021
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
On Fri, May 7, 2021 at 6:47 AM Paul Mackerras <paulus at ozlabs.org> wrote:
> The other point, which you don't seem to have taken in yet, is that
> this is NOT the critical path. There is no point getting the data out
> substantially before the hit_way is known, and for the sake of timing,
> that has a register (r1.hit_way) in the path. So r1.hit_way is not
> valid until cycle 2 (counting cycle 0 as the one where the address is
> presented to the dcache).
so my first instincts were:
* i am advocating setting up everything that's "input" to writeback_control
as a separate variable (combinatorially written to)
* all of dcache_request which calcs req_hit_way which goes in r1.hit_way
is combinatorial, agreed.
* r1.hit_way is used to index cache_out, therefore it would be bad to make
this combinatorial as well:
    data_out := cache_out(r1.hit_way);
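a rough behavioural sketch of that timing, in python rather than VHDL, may help here. it is not the microwatt code: the class, the single-set tag list, the r0_addr register and the tag-equals-address compare are assumptions made up for illustration. only req_hit_way, r1.hit_way and cache_out come from the discussion above.

```python
class DCacheTimingModel:
    """Toy model of one cache set (hypothetical, not the microwatt VHDL)."""

    def __init__(self, tags, cache_out):
        self.tags = tags            # tags[way]: tag held by each way (assumed layout)
        self.cache_out = cache_out  # cache_out[way]: data read from each way
        self.r0_addr = None         # register: address latched at the end of cycle 0
        self.r1_hit_way = None      # register: hit way latched at the end of cycle 1

    def req_hit_way(self):
        """Combinatorial: a pure function of the current register contents."""
        if self.r0_addr is None:
            return None
        for way, tag in enumerate(self.tags):
            # simplification: the whole address is compared against the tag
            if tag == self.r0_addr:
                return way
        return None

    def data_out(self):
        """data_out := cache_out(r1.hit_way), indexed by the registered way."""
        if self.r1_hit_way is None:
            return None
        return self.cache_out[self.r1_hit_way]

    def clock(self, addr_in):
        """Rising edge: sample the combinatorial value, then update registers."""
        next_hit_way = self.req_hit_way()
        self.r1_hit_way = next_hit_way
        self.r0_addr = addr_in


def trace(tags, cache_out, addr, cycles=3):
    """Return (cycle, req_hit_way, r1_hit_way, data_out) as seen in each cycle."""
    model = DCacheTimingModel(tags, cache_out)
    seen = []
    for cycle in range(cycles):
        seen.append((cycle, model.req_hit_way(), model.r1_hit_way, model.data_out()))
        # the address is presented during cycle 0 only
        model.clock(addr if cycle == 0 else None)
    return seen
```

with the address presented in cycle 0, trace() shows req_hit_way appearing during cycle 1, while r1_hit_way (and therefore data_out) only becomes valid in cycle 2, matching the cycle numbering quoted above.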
but then i noticed that in dcache_fast_hit r1.hit_way is set up in a
rising_edge() block,
so the capture of req_hit_way at cycle 2 (using the definition above, where
cycle 0 is the address) would still be in that rising_edge() block in
dcache_fast_hit.
except... what i am effectively saying is that req_hit_way would
propagate through to writeback_control (the two paths now being connected
through a proposed alternative data structure), and that would be bad.
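to put rough numbers on why that is bad, here is a small python sketch. the delay figures are invented units, not measurements of microwatt, and critical_path() is a hypothetical helper: it only shows that a register cuts a combinatorial chain in two, whereas removing it makes the delays add up.

```python
def critical_path(stages):
    """Longest run of combinatorial delay between registers.

    stages: list of (name, delay, registered) tuples in signal order,
    where registered=True means a clocked register follows that stage.
    """
    worst = 0
    current = 0
    for _name, delay, registered in stages:
        current += delay
        worst = max(worst, current)
        if registered:
            # a register ends the combinatorial path here
            current = 0
    return worst


# invented delays, arbitrary units
with_r1_register = [
    ("tag compare -> req_hit_way", 3, True),   # captured into r1.hit_way
    ("r1.hit_way -> writeback_control", 2, False),
]
fully_combinatorial = [
    ("tag compare -> req_hit_way", 3, False),  # no register in the path
    ("req_hit_way -> writeback_control", 2, False),
]
```

with these made-up figures, critical_path(with_r1_register) is 3 and critical_path(fully_combinatorial) is 5: connecting the two paths directly makes the longest path the sum of both.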
yep, agree with your assessment, paul, i'm all caught up now.
solutions that i have seen to this, used by intel, have been to make multi-level
PTE caches: an 8-entry single-cycle level, followed by a (guessing) 256-entry
two-cycle level, followed by a (guessing) 4k-entry three-cycle level.
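a minimal python sketch of such a multi-level lookup follows. the sizes and latencies are the (partly guessed) figures above. everything else is an assumption for illustration, not a description of any intel design: the Level class, LRU replacement, copying a hit up into the faster levels, and charging only the latency of the level that hits.

```python
from collections import OrderedDict


class Level:
    """One cache level with a fixed entry count and lookup latency (hypothetical)."""

    def __init__(self, entries, latency):
        self.entries = entries
        self.latency = latency
        self.store = OrderedDict()  # oldest entry first, most recent last

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        return None

    def put(self, key, value):
        self.store[key] = value
        self.store.move_to_end(key)
        if len(self.store) > self.entries:
            self.store.popitem(last=False)  # evict the least recently used


def lookup(levels, key):
    """Search fastest level first; return (value, latency) or (None, None)."""
    for i, level in enumerate(levels):
        value = level.get(key)
        if value is not None:
            # assumed policy: copy the entry into every faster level
            for upper in levels[:i]:
                upper.put(key, value)
            return value, level.latency
    return None, None


levels = [Level(8, 1), Level(256, 2), Level(4096, 3)]
```

an entry present only in the slowest level costs three cycles on first use and one cycle on the next, until it is pushed out of the 8-entry level again.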