IRC logs for #openrisc Tuesday, 2014-06-24

--- Log opened Tue Jun 24 00:00:48 2014
juliusb	stekern: cool thanks, it's something I'd like to document	10:10
juliusb	how to run the mor1kx verification env under FuseSoC but you should have given me enough to get started, thanks	10:10
stekern	wallento: looks like it's your "keep in WRITE state on snoop_hit" that causes troubles	10:22
stekern	+ did you consider all possible cases when you removed IDLE?	10:22
stekern	I know I said it might be redundant, but I was only thinking about the WRITE->READ case then	10:23
stekern	it takes two cycles for a written value to be valid on the read port unless you have bypass registers in the RAM	10:23
stekern	I think it might fail on Xilinx devices now	10:24
stekern	but I might be wrong too, if you have considered that possibility, then it's probably fine	10:25
stekern	...no, actaully, I remodeled the RAMs so they should even fail in verilator if it's an issue	10:28
stekern	but back to the actual problem, I don't understand the purpose of staying WRITE on snoop_hit?	10:37
wallento	stekern: yes, thats exactly the issue i identified	10:38
wallento	maybe you can help me	10:38
wallento	problem is the adresses changes too fast	10:39
wallento	its because the read->write transition is delayed	10:39
wallento	i am not sure whether this can be handled by the LSU	10:39
wallento	or i need to store it in the cache's state machine, what could be done when detecting we_i and snoop_read_tagmem	10:39
stekern	which address changes too fast?	10:40
wallento	dc_adr	10:41
wallento	it already has the next read if we have a write->read sequence	10:41
wallento	i think that may be a general issue in the lsu if the store buffer is full	10:41
wallento	hence i think it may be better adressed there	10:42
wallento	i will send you an elf file plus timestamp via mail	10:42
stekern	hmm, it has nothing to do with the storebuffer being full, but what I don't understand is the purpose of staying in write state when the snoop_hit is asserted?	10:43
wallento	i think it may also occur when the store buffer is full	10:43
wallento	its if the write is not executed immediately	10:43
wallento	the processor can change the address too fast	10:43
stekern	but the write to cache is always a 1-cycle operation	10:44
wallento	not if there is a snoop	10:44
wallento	and if the store buffer is full its also delayed, correct?	10:44
stekern	yeah... but I mean in the cache statemachine	10:45
wallento	i implemented it to not have an extra snoop state	10:45
wallento	otherwise we need some extra registers to store the context of an operation or so	10:46
stekern	when you are in WRITE state, you've got to get out of there within one cycle (unless the next instruction upcoming instruction is a store)	10:46
wallento	this is impossible if there is a snoop	10:46
wallento	snoop needs to be handled instantl	10:46
wallento	y	10:46
wallento	otherwise we need backpressure	10:46
wallento	and i think backpressure to the cpu is better than to the bus	10:47
stekern	...but you're not doing anything with the snoop in the WRITE state?	10:47
wallento	yes, i am	10:47
stekern	(I'm looking for a bit of an explanation of how things work)	10:48
wallento	i need to check the tagmem if no snoop tagmem is there	10:48
wallento	and i might need to access the tagmem on a hit	10:48
wallento	thats the reason the version with extra tag memory for snooping "works"	10:49
wallento	once you start reading the tagmem the probability to delay a write cycle increases	10:49
stekern	...but isn't the whole snoop mechanism just to invalidate the address that was "snoop written"	10:49
wallento	yes	10:49
wallento	but it needs two cycles	10:49
wallento	let me explain:	10:49
wallento	- you need to read the tag memory to check whether the cache has a copy. if OPTION_DCACHE_SNOOP_TAGMEM == "NONE" this is the normal tag memory, otherwise it reads from the duplicate snoopmem	10:51
wallento	- if the tag matches you need to write. the write is shared among both memories if present	10:51
wallento	- the first cycle is the snoop_read, the second is the snoop_check	10:52
wallento	- if the tag memory is involved (i.e., there is only one tag memory), the snoop_read_tagmem and snoop_check_tagmem signals are up to obstruct the other cache operation	10:53
wallento	- you cannot read from tag memory if snoop_read_tagmem matches	10:53
wallento	- the hit result is not valid for normal operation if snoop_check_tagmem is up	10:53
wallento	- REFILL is not affected at all, because once you are in there and get acks you cannot get snoops (as you hold the bus). this is an assumption for the moment, that snoops only come from the same bus you access	10:55
wallento	- READ is also safe, as you pointed out it is the actual IDLE state.. It is only necessary to check that !snoop_check_tagmem when !cpu_hit	10:56
wallento	- also the WRITE transitions are safe, we stay in WRITE as long as a snoop forbids to access the tagmem	10:57
wallento	(there is a further fix i will commit soon)	10:57
wallento	but you need to allow that WRITE is longer then one cycle	10:57
wallento	the LSU can check this via the snoop_*_tagmem and snoop_hit signals	10:58
wallento	but i am a little confused about where to do this properly, i think storebuffer_write is the correct place	10:58
wallento	alternatively i can store the address in the cache, but I think the same could happen if store buffer is full, i will try to stress this situation and verify my statement ;)	11:00
stekern	wouldn't it be easier to just stall the cpu on a snoop_hit and then invalidate the address via the INVALIDATE state?	11:04
stekern	because, if you want to wait in WRITE state you will need to stall the cpu while doing so, that's why it's not working now	11:05
wallento	if you do so, the performance will die	11:07
wallento	its already bad if you don't have the extra snoop memory	11:08
stekern	yeah, I think you can remove the non-extra snoop memory version, that's not going to ever work well	11:08
wallento	when snoop and write occur concurrently, you have to backpressure one	11:08
wallento	there is no way to avoid this	11:08
stekern	I don't know what "backpressure" means in this context ;)	11:09
wallento	that you can delay the upstream operation	11:09
wallento	which means either tell the cpu to stall	11:09
wallento	or the bus	11:09
wallento	maybe once I narrow down the same can happen when the store buffer is full, this would be needed nevertheless	11:10
wallento	can't we use the lsu_valid_o maybe?	11:10
wallento	i am sorry my knowledge ends with the interface from LSU to the pipeline	11:10
wallento	maybe you know	11:10
stekern	I'll go for a walk and think about it a bit, bbl ;)	11:13
wallento	enjoy your walk	11:13
wallento	i will also rethink the whole thing, but i am quite sure that this is the last bit of it to work	11:14
stekern	wallento: yes, negating lsu_valid_o will stall the cpu	12:13
wallento	stekern: I think the problem comes from exec vs. ctrl, let me shortly describe the situation	16:15
wallento	- there are two consecutive stores (or store followed by load)	16:16
wallento	- exec_lsu_adr_i is A and exec_op_lsu_store_i is high during the first cycle	16:17
wallento	- there is a snoop and therefore store_buffer_write stays low (i think the same applies if store buffer is full), lsu_valid_o is also low	16:18
wallento	- dc_adr is A due to this situation	16:19
wallento	- pipeline advances, next cycle:	16:19
wallento	- ctrl_lsu_adr_i is A now and ctrl_op_store_i is high, the next write sets exec_lsu_adr_i to B and exec_op_store_i is high	16:20
wallento	- no snoop, so that dc_we is one now, store_buffer_write is high	16:20
wallento	- dc_adr is B	16:21
wallento	- the write to A got lost	16:21
wallento	which is the stack in my case, and the program crashes	16:22
wallento	i will commit and send you an elf	16:22
wallento	plus put the waveform somewhere	16:22
wallento	i am not sure how to best solve this, maybe pipeline should not advance from exec to ctrl if lsu_valid_o is not set? Otherwise the LSU needs to track whether the exec operation is effective or not	16:23
wallento	mmh, lsu_valid_o goes to wb_mux and not to ctrl	16:27
wallento	https://github.com/wallento/mor1kx/commit/d6f2820c569e4fd0201cead505c8410f4a2ee808	16:27
wallento	http://lis.ei.tum.de/pub-download/hello.elf	16:30
wallento	195.26 us	16:30
wallento	ah, its execute_ctrl :)	16:33
wallento	sorry, i had a wrong one, its 195.63	16:55
stekern	wallento left again...	20:47
stekern	I'll put something in the log to paste when he get back	20:47
stekern	I probably need to read the snoop code more in depth, because to me it's not clear he invalidation works at all	20:48
stekern	to me, the natural implementation would be as follows:	20:49
stekern	1) create a duplicate of the tag mem, to do snoop lookups from	20:50
stekern	2) when a snoop lookup is a hit, the pipeline is stalled and the cache line is invalidated	20:50
stekern	as far as I can tell, 1 is now implemented	20:51
stekern	but to me it seems that 2 doesn't happen, instead the snoop invalidation just hijacks the write port of the tag mem and discards the current operation	20:57
stekern	I don't understand how that's supposed to work	20:57
stekern	because, the current operation is most likely completely unrelated to the snoop hit	20:58
stekern	oh, and I think we could just use a true_dpram for the "duplicate" tag mem	21:14
stekern	actually, I don't think we'd need to stall at all then	21:20
stekern	just use the other port to invalidate the snooped cahceline	21:21
stekern	...it'll need some changes to the state machine, since the true_dpram doesn't have the bypass logic though	21:21
stekern	ah.. no, that's of course not going to work	21:42
stekern	we'll still need to stall on the write	21:43
--- Log closed Wed Jun 25 00:00:49 2014

Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!