--- Log opened Tue Jun 24 00:00:48 2014 | ||
juliusb | stekern: cool thanks, it's something I'd like to document | 10:10 |
---|---|---|
juliusb | how to run the mor1kx verification env under FuseSoC but you should have given me enough to get started, thanks | 10:10 |
stekern | wallento: looks like it's your "keep in WRITE state on snoop_hit" that causes troubles | 10:22 |
stekern | + did you consider all possible cases when you removed IDLE? | 10:22 |
stekern | I know I said it might be redundant, but I was only thinking about the WRITE->READ case then | 10:23 |
stekern | it takes two cycles for a written value to be valid on the read port unless you have bypass registers in the RAM | 10:23 |
stekern | I think it might fail on Xilinx devices now | 10:24 |
stekern | but I might be wrong too, if you have considered that possibility, then it's probably fine | 10:25 |
stekern | ...no, actaully, I remodeled the RAMs so they should even fail in verilator if it's an issue | 10:28 |
stekern | but back to the actual problem, I don't understand the purpose of staying WRITE on snoop_hit? | 10:37 |
wallento | stekern: yes, thats exactly the issue i identified | 10:38 |
wallento | maybe you can help me | 10:38 |
wallento | problem is the adresses changes too fast | 10:39 |
wallento | its because the read->write transition is delayed | 10:39 |
wallento | i am not sure whether this can be handled by the LSU | 10:39 |
wallento | or i need to store it in the cache's state machine, what could be done when detecting we_i and snoop_read_tagmem | 10:39 |
stekern | which address changes too fast? | 10:40 |
wallento | dc_adr | 10:41 |
wallento | it already has the next read if we have a write->read sequence | 10:41 |
wallento | i think that may be a general issue in the lsu if the store buffer is full | 10:41 |
wallento | hence i think it may be better adressed there | 10:42 |
wallento | i will send you an elf file plus timestamp via mail | 10:42 |
stekern | hmm, it has nothing to do with the storebuffer being full, but what I don't understand is the purpose of staying in write state when the snoop_hit is asserted? | 10:43 |
wallento | i think it may also occur when the store buffer is full | 10:43 |
wallento | its if the write is not executed immediately | 10:43 |
wallento | the processor can change the address too fast | 10:43 |
stekern | but the write to cache is always a 1-cycle operation | 10:44 |
wallento | not if there is a snoop | 10:44 |
wallento | and if the store buffer is full its also delayed, correct? | 10:44 |
stekern | yeah... but I mean in the cache statemachine | 10:45 |
wallento | i implemented it to not have an extra snoop state | 10:45 |
wallento | otherwise we need some extra registers to store the context of an operation or so | 10:46 |
stekern | when you are in WRITE state, you've got to get out of there within one cycle (unless the next instruction upcoming instruction is a store) | 10:46 |
wallento | this is impossible if there is a snoop | 10:46 |
wallento | snoop needs to be handled instantl | 10:46 |
wallento | y | 10:46 |
wallento | otherwise we need backpressure | 10:46 |
wallento | and i think backpressure to the cpu is better than to the bus | 10:47 |
stekern | ...but you're not doing anything with the snoop in the WRITE state? | 10:47 |
wallento | yes, i am | 10:47 |
stekern | (I'm looking for a bit of an explanation of how things work) | 10:48 |
wallento | i need to check the tagmem if no snoop tagmem is there | 10:48 |
wallento | and i might need to access the tagmem on a hit | 10:48 |
wallento | thats the reason the version with extra tag memory for snooping "works" | 10:49 |
wallento | once you start reading the tagmem the probability to delay a write cycle increases | 10:49 |
stekern | ...but isn't the whole snoop mechanism just to invalidate the address that was "snoop written" | 10:49 |
wallento | yes | 10:49 |
wallento | but it needs two cycles | 10:49 |
wallento | let me explain: | 10:49 |
wallento | - you need to read the tag memory to check whether the cache has a copy. if OPTION_DCACHE_SNOOP_TAGMEM == "NONE" this is the normal tag memory, otherwise it reads from the duplicate snoopmem | 10:51 |
wallento | - if the tag matches you need to write. the write is shared among both memories if present | 10:51 |
wallento | - the first cycle is the snoop_read, the second is the snoop_check | 10:52 |
wallento | - if the tag memory is involved (i.e., there is only one tag memory), the snoop_read_tagmem and snoop_check_tagmem signals are up to obstruct the other cache operation | 10:53 |
wallento | - you cannot read from tag memory if snoop_read_tagmem matches | 10:53 |
wallento | - the hit result is not valid for normal operation if snoop_check_tagmem is up | 10:53 |
wallento | - REFILL is not affected at all, because once you are in there and get acks you cannot get snoops (as you hold the bus). this is an assumption for the moment, that snoops only come from the same bus you access | 10:55 |
wallento | - READ is also safe, as you pointed out it is the actual IDLE state.. It is only necessary to check that !snoop_check_tagmem when !cpu_hit | 10:56 |
wallento | - also the WRITE transitions are safe, we stay in WRITE as long as a snoop forbids to access the tagmem | 10:57 |
wallento | (there is a further fix i will commit soon) | 10:57 |
wallento | but you need to allow that WRITE is longer then one cycle | 10:57 |
wallento | the LSU can check this via the snoop_*_tagmem and snoop_hit signals | 10:58 |
wallento | but i am a little confused about where to do this properly, i think storebuffer_write is the correct place | 10:58 |
wallento | alternatively i can store the address in the cache, but I think the same could happen if store buffer is full, i will try to stress this situation and verify my statement ;) | 11:00 |
stekern | wouldn't it be easier to just stall the cpu on a snoop_hit and then invalidate the address via the INVALIDATE state? | 11:04 |
stekern | because, if you want to wait in WRITE state you will need to stall the cpu while doing so, that's why it's not working now | 11:05 |
wallento | if you do so, the performance will die | 11:07 |
wallento | its already bad if you don't have the extra snoop memory | 11:08 |
stekern | yeah, I think you can remove the non-extra snoop memory version, that's not going to ever work well | 11:08 |
wallento | when snoop and write occur concurrently, you have to backpressure one | 11:08 |
wallento | there is no way to avoid this | 11:08 |
stekern | I don't know what "backpressure" means in this context ;) | 11:09 |
wallento | that you can delay the upstream operation | 11:09 |
wallento | which means either tell the cpu to stall | 11:09 |
wallento | or the bus | 11:09 |
wallento | maybe once I narrow down the same can happen when the store buffer is full, this would be needed nevertheless | 11:10 |
wallento | can't we use the lsu_valid_o maybe? | 11:10 |
wallento | i am sorry my knowledge ends with the interface from LSU to the pipeline | 11:10 |
wallento | maybe you know | 11:10 |
stekern | I'll go for a walk and think about it a bit, bbl ;) | 11:13 |
wallento | enjoy your walk | 11:13 |
wallento | i will also rethink the whole thing, but i am quite sure that this is the last bit of it to work | 11:14 |
stekern | wallento: yes, negating lsu_valid_o will stall the cpu | 12:13 |
wallento | stekern: I think the problem comes from exec vs. ctrl, let me shortly describe the situation | 16:15 |
wallento | - there are two consecutive stores (or store followed by load) | 16:16 |
wallento | - exec_lsu_adr_i is A and exec_op_lsu_store_i is high during the first cycle | 16:17 |
wallento | - there is a snoop and therefore store_buffer_write stays low (i think the same applies if store buffer is full), lsu_valid_o is also low | 16:18 |
wallento | - dc_adr is A due to this situation | 16:19 |
wallento | - pipeline advances, next cycle: | 16:19 |
wallento | - ctrl_lsu_adr_i is A now and ctrl_op_store_i is high, the next write sets exec_lsu_adr_i to B and exec_op_store_i is high | 16:20 |
wallento | - no snoop, so that dc_we is one now, store_buffer_write is high | 16:20 |
wallento | - dc_adr is B | 16:21 |
wallento | - the write to A got lost | 16:21 |
wallento | which is the stack in my case, and the program crashes | 16:22 |
wallento | i will commit and send you an elf | 16:22 |
wallento | plus put the waveform somewhere | 16:22 |
wallento | i am not sure how to best solve this, maybe pipeline should not advance from exec to ctrl if lsu_valid_o is not set? Otherwise the LSU needs to track whether the exec operation is effective or not | 16:23 |
wallento | mmh, lsu_valid_o goes to wb_mux and not to ctrl | 16:27 |
wallento | https://github.com/wallento/mor1kx/commit/d6f2820c569e4fd0201cead505c8410f4a2ee808 | 16:27 |
wallento | http://lis.ei.tum.de/pub-download/hello.elf | 16:30 |
wallento | 195.26 us | 16:30 |
wallento | ah, its execute_ctrl :) | 16:33 |
wallento | sorry, i had a wrong one, its 195.63 | 16:55 |
stekern | wallento left again... | 20:47 |
stekern | I'll put something in the log to paste when he get back | 20:47 |
stekern | I probably need to read the snoop code more in depth, because to me it's not clear he invalidation works at all | 20:48 |
stekern | to me, the natural implementation would be as follows: | 20:49 |
stekern | 1) create a duplicate of the tag mem, to do snoop lookups from | 20:50 |
stekern | 2) when a snoop lookup is a hit, the pipeline is stalled and the cache line is invalidated | 20:50 |
stekern | as far as I can tell, 1 is now implemented | 20:51 |
stekern | but to me it seems that 2 doesn't happen, instead the snoop invalidation just hijacks the write port of the tag mem and discards the current operation | 20:57 |
stekern | I don't understand how that's supposed to work | 20:57 |
stekern | because, the current operation is most likely completely unrelated to the snoop hit | 20:58 |
stekern | oh, and I think we could just use a true_dpram for the "duplicate" tag mem | 21:14 |
stekern | actually, I don't think we'd need to stall at all then | 21:20 |
stekern | just use the other port to invalidate the snooped cahceline | 21:21 |
stekern | ...it'll need some changes to the state machine, since the true_dpram doesn't have the bypass logic though | 21:21 |
stekern | ah.. no, that's of course not going to work | 21:42 |
stekern | we'll still need to stall on the write | 21:43 |
--- Log closed Wed Jun 25 00:00:49 2014 |
Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!