- Aug 29, 2020
- 487
- 2,560
Well, that's more of a supplement than a refutation of what I said. I just want to correct your point about cache and CPU rendering.
The point about Blender is incorrect: for CPU rendering the tile size should be small, for GPU rendering it should be large. This is due to the amount of memory available to the device (CPU or GPU; I'm talking about device memory here, not system RAM).
The CPU has L1/L2/L3 caches and the GPU has L1/L2 caches, and if a "task" can fit in them, that produces a performance benefit. Even the slowest cache, the CPU's L3, is an order of magnitude faster than RAM. That memory speed is where the performance comes from.
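To make the "fits in cache" idea concrete, here is a rough back-of-the-envelope sketch. The cache sizes and the per-pixel cost are assumed, typical round numbers for illustration, not figures for any particular CPU or renderer:

```python
# Rough, illustrative estimate of a render tile's pixel-buffer footprint
# versus typical per-core cache capacities. All sizes below are
# assumptions for the sake of the example, not measured values.

BYTES_PER_PIXEL = 4 * 4  # assumed: RGBA, 32-bit float per channel

def tile_footprint(tile_size: int) -> int:
    """Bytes needed just for the pixel buffer of a square tile."""
    return tile_size * tile_size * BYTES_PER_PIXEL

# Assumed typical cache sizes.
L1 = 32 * 1024         # 32 KiB per core
L2 = 512 * 1024        # 512 KiB per core
L3 = 32 * 1024 * 1024  # 32 MiB shared

for size in (16, 32, 64, 256):
    fp = tile_footprint(size)
    level = "L1" if fp <= L1 else "L2" if fp <= L2 else "L3" if fp <= L3 else "RAM"
    print(f"{size:>3}x{size:<3} tile = {fp // 1024:>5} KiB -> fits in {level}")
```

A 16x16 tile lands comfortably in L1, while a 256x256 one already spills past L2 — which is one way to see why small tiles suit CPUs and large tiles suit GPUs, where the "cache" budget per tile is a massive pool of VRAM instead.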
Of course, the layout of the cores also matters. For example, a Zen CCX has its own memory pool that can be accessed only by that CCX; as you can see, if that memory could be accessed by other CCXes, we wouldn't need to duplicate the data for a given task (a "tile" while rendering) — all cores could share it, and we'd get a performance boost that way. On the GPU side, more "cores" (compute units) have access to that shared data, and there are simply more of them, so a GPU performs better than a CPU here. There are exceptions to that rule, of course: a Threadripper, depending on which GPU you compare it to, can come out ahead.
Now, with a GPU the problem is the amount of VRAM and the so-called "out of core" mode, which carries a huge performance penalty.
Note that by "task" I mean the set of instructions and data required to perform that task. There are multiple task types in a rendering pipeline, and rendering a single "tile" consists of many such tasks. It has nothing to do with, say, texture size, because that is "storage" memory for data not needed by the current task. If the current (or next) task requires data that is not in cache, that data is fetched from a slower, larger memory — usually L3 or VRAM — and if it doesn't fit there, or simply wasn't loaded earlier, it is fetched from RAM (for a GPU, assuming out-of-core support). So it's slow.
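The fetch path described above can be sketched as a toy cost model. The latency numbers are made-up round figures (in "cycles") chosen only to illustrate the ordering of the hierarchy, not real hardware timings:

```python
# Toy model of the fetch path: a task's data is looked up in
# progressively larger, slower memories, paying the latency of every
# level it probes. All latencies are illustrative assumptions.

LATENCY = {"L1": 4, "L2": 12, "L3": 40, "VRAM": 400, "RAM": 2000}

def fetch_cost(data_location: str,
               hierarchy=("L1", "L2", "L3", "VRAM", "RAM")) -> int:
    """Sum the latencies of every level probed until the data is found."""
    cost = 0
    for level in hierarchy:
        cost += LATENCY[level]
        if level == data_location:
            return cost
    raise KeyError(data_location)

print(fetch_cost("L1"))   # → 4    (data already in the fastest cache)
print(fetch_cost("RAM"))  # → 2456 (every level missed on the way down)
```

The three-orders-of-magnitude gap between the two calls is the whole argument for keeping a tile's working set inside cache.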
As for the value in iRay (Daz) — dunno, I didn't research that, but most likely it doesn't matter for your renders (i.e. doesn't influence render time) because other limitations most likely kick in first, e.g. the total number of iterations.
Indeed, the optimal "tile" size depends on many parameters. (My first Blender renders, on my ancient laptop with a first-generation i5, were done with 16x16 tiles — oh my god, it took me sooooooo long!)
But the fact is that Blender's "tiled" mode, where you don't have to pull data back and forth between RAM and cache, gives me a nice performance boost (I've measured it too: almost a 40% saving in time). In DAZ/Iray, as I said, there's no such mode: the renderer jumps around the whole picture at once, so there's not much to compare it with.
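The locality effect being measured can be felt with a toy traversal sketch: walk the same "image" once tile-by-tile (good locality) and once column-by-column across the whole frame (poor locality). Note this is a hypothetical illustration — absolute timings vary by machine, and in pure Python the interpreter overhead hides much of the cache effect that compiled renderers exploit:

```python
# Toy comparison of a cache-friendly tiled walk vs a column-major walk
# over the same data. Illustrative only; both walks touch identical
# pixels, so they must produce the same sum.

import time

W = H = 1024
image = [[(x * 31 + y) % 255 for x in range(W)] for y in range(H)]

def sum_tiled(tile: int = 32) -> int:
    """Visit the image tile by tile, staying inside one region at a time."""
    total = 0
    for ty in range(0, H, tile):
        for tx in range(0, W, tile):
            for y in range(ty, ty + tile):
                row = image[y]
                for x in range(tx, tx + tile):
                    total += row[x]
    return total

def sum_columns() -> int:
    """Visit the image column by column, jumping a full row stride each step."""
    total = 0
    for x in range(W):
        for y in range(H):
            total += image[y][x]
    return total

t0 = time.perf_counter(); a = sum_tiled();   t1 = time.perf_counter()
b = sum_columns();                           t2 = time.perf_counter()
print(f"tiled: {t1 - t0:.3f}s  column-major: {t2 - t1:.3f}s")
assert a == b  # same pixels visited, only the order differs
```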
That leaves me to compare only by eye: does the render-quality parameter give any savings? For me it does: the renderer can stop at a smaller number of iterations at approximately the same quality, while the time for a single iteration increases, though not by much. So the overall gain is evident. But, as I said, this is a subjective view, because in this variant we deliberately discard the converged-ratio parameter.