
Wii U eDRAM vs x360 eDRAM


43 replies to this topic

#21 Chronos21

Chronos21

    Green Koopa Troopa

  • Members
  • 48 posts

Posted 13 March 2015 - 10:58 AM

I don't know where the problem is, Megafenix. The eDRAM in the Wii U is rumored to be between 35 and 70 GB/s, as far as I've heard. That's enough. You don't need 500 GB/s; that would be overkill. And if you remember correctly, Shin'en said in the interview with HD WARRIORS that bandwidth is not the bottleneck of today's GPUs. Latency is the real problem, and the Wii U is great at latency.

#22 megafenix

megafenix

    Blooper

  • Members
  • 169 posts

Posted 13 March 2015 - 11:01 AM

That wouldn't do, since the g-buffer for deferred rendering requires lots of bandwidth. And while Shin'en commented that bandwidth is not a problem on modern hardware, they were referring to the RAM, not the eDRAM.

https://dromble.word...he-wii-u-power/

"

When testing our first code on Wii U we were amazed how much we could throw at it without any slowdowns, at that time we even had zero optimizations. The performance problem of hardware nowadays is not clock speed but ram latency. Fortunately Nintendo took great efforts to ensure developers can really work around that typical bottleneck on Wii U. They put a lot of thought on how CPU, GPU, caches and memory controllers work together to amplify your code speed. For instance, with only some tiny changes we were able to optimize certain heavy load parts of the rendering pipeline to 6x of the original speed, and that was even without using any of the extra cores.

 

"

 

The comment is not referring to the GPU memory. As you know, every GPU has its own memory, called VRAM, and the Wii U eDRAM is basically that; system RAM is for other stuff. The bandwidth requirements depend on which techniques you use. If you use forward rendering you don't need that much memory bandwidth, but it costs you lots of processing power. With deferred rendering (confirmed by Shin'en for Fast Racing Neo) you save a lot of processing power but need much more memory bandwidth. It's a trade-off.

here

http://jcgt.org/publ...02/04/paper.pdf

"

The size of the surface attribute buffer—the g-buffer—is typically 16 to 32 bytes per visibility sample in optimized high-quality real-time systems. The DRAM bandwidth consumed in writing this buffer, then reading it for each light pass is significant, even with only a single light pass. For example, a screen with a four-megapixel display, using four 24-byte samples per pixel at 60 Hz, would consume 46 GB/s of bandwidth, assuming only one lighting pass, just for the uncompressed g-buffer write and subsequent read. Thus, in practice, either anti-aliasing or pixel resolution (or both!) is often sacrificed to maintain high frame rates on economical hardware. This is perhaps the most serious issue with the technique, as low visibility sampling rates confound simple solutions to efficiently rendering partially transparent surfaces, edge anti-aliasing, and higher-dimensional rasterization.

 

"

That's just an example with one light pass. Even the Xbox One, with 200 GB/s of eSRAM bandwidth, has trouble handling the framebuffer plus a g-buffer (Ryse: Son of Rome, for example, runs at 900p). 70 GB/s of eDRAM bandwidth is very little for the Wii U to handle triple-buffered 720p + g-buffer + intermediate buffers in games like Fast Racing Neo, and other games as well.
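For anyone who wants to check the 46 GB/s figure from the paper quoted above, here is a minimal sketch of the arithmetic. It counts only one uncompressed g-buffer write plus one read per light pass, as the quote does. The function name and the 720p case (one 24-byte sample per pixel) are assumptions added for illustration; they are not figures from Shin'en or from the paper, and real traffic also includes overdraw, textures and intermediate buffers.

```python
def gbuffer_bandwidth_gb_per_s(pixels, samples_per_pixel, bytes_per_sample,
                               fps, light_passes=1):
    """GB/s (10^9 bytes) for one g-buffer write plus one read per light pass."""
    bytes_per_frame = pixels * samples_per_pixel * bytes_per_sample
    # One write of the whole g-buffer, then one full read for every light pass.
    traffic_per_frame = bytes_per_frame * (1 + light_passes)
    return traffic_per_frame * fps / 1e9


# The paper's case: 4 megapixels, four 24-byte samples per pixel, 60 Hz.
paper_case = gbuffer_bandwidth_gb_per_s(4_000_000, 4, 24, 60)
print(round(paper_case, 1))  # 46.1

# Hypothetical 720p case: one 24-byte sample per pixel, 60 Hz, one light pass.
hd_case = gbuffer_bandwidth_gb_per_s(1280 * 720, 1, 24, 60)
print(round(hd_case, 2))  # 2.65
```

Raising `light_passes` shows how the read side grows with every extra full-screen light pass, which is the cost the paper is warning about.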

 

As for 500 GB/s being overkill: no, it wouldn't be. GPUs from the AMD HD 4000 series up to current ones can actually handle terabytes per second of bandwidth, and I have the proof right in my pocket.


Edited by megafenix, 13 March 2015 - 12:31 PM.


#23 grahamf

grahamf

    The Happiness Fairy

  • Members
  • 2,532 posts

Posted 13 March 2015 - 11:48 AM

As long as this thread stays civil, I suppose


Edited by grahamf, 14 March 2015 - 04:01 PM.



#24 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 14 March 2015 - 08:06 AM

First off, I never said bandwidth was not important. Stop trying to put words in my mouth to make your broken arguments look better. I said there are 3 important factors in RAM performance (bandwidth, latency and capacity), and you generally only get to focus on 2 in the real world. A design that performs well on two of those factors will always outperform a design with only one high-performing factor, like the bandwidth you are obsessing over. The 360 eDRAM was like that, and it sucks because of its crappy latency and low capacity.

Fast Racing Neo, and every engine Nintendo uses on Wii U, is a deferred rendering engine, and Nintendo has been heavy on multipass since the GameCube's TEV, which was 8-pass; the Wii's was 16-pass. Your quote is about a forward rendering engine. Your quote also relies heavily on bandwidth because it's about a system design where you have to travel across a high-latency bus to get the data. The lower the latency, the more often you can send data instead of having to wait. Most engines are designed around PC GPUs, where the latency is horrible: they have to wait many cycles every time, so they need to send as much data as possible at once to catch up when they get an opening. I probably shouldn't have made this explanation, though, as it will only confuse you, because you still don't understand the difference between operational bandwidth and transportation bandwidth.

The Wii U eDRAM, like the GameCube's and Wii's before it, is designed for minimal to no waiting on latency. When you can constantly send data whenever you want without having to wait, you don't desperately need super-high bandwidth to play catch-up.
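One standard way to put numbers on the latency/bandwidth relationship described above is Little's law: the amount of data that has to be in flight equals bandwidth multiplied by latency. The sketch below is only an illustration of that scaling. The function name and the 70 GB/s, 100 ns and 10 ns figures are hypothetical and are not measurements of any console.

```python
def bytes_in_flight(bandwidth_gb_per_s, latency_ns):
    """Little's law: outstanding bytes needed to sustain a bandwidth at a latency.

    GB/s multiplied by ns gives bytes, because 1e9 * 1e-9 == 1.
    """
    return bandwidth_gb_per_s * latency_ns


# Hypothetical figures, chosen only to show the scaling.
high_latency = bytes_in_flight(70, 100)  # 70 GB/s behind a 100 ns memory
low_latency = bytes_in_flight(70, 10)    # 70 GB/s behind a 10 ns memory
print(high_latency, low_latency)  # 7000 700
```

Under these assumptions a memory with a tenth of the latency needs a tenth of the outstanding requests to stay busy. It does not by itself say how much bandwidth a given renderer needs.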

You are still confusing operational bandwidth with bandwidth for transporting/holding data. The RAM attached to the logic on the Wii U GPU has very, very high operational bandwidth as well. It can't be used to transport data across and between the system. It's operational bandwidth. It's for operations. That is the RAM and the bandwidth your quote is talking about; the 32 MB eDRAM is NOT THAT. The Wii U's eDRAM doesn't render the image or rasterize it, and it does NOT do calculations per pixel. That's the SIMD engines' job, and they have their OWN MEMORY attached to the logic for that purpose. The eDRAM pool just holds the finished product, a 3.6 MB image. At 60 fps that image needs 216 MB/s; times 3, that's 648 MB/s of bandwidth to move that data, roughly half a GB/s. The high-operational-bandwidth operations talked about in your quote are handled by the ROPs and SIMD engines, which have their own memory attached directly to the logic. Then they SEND the FINISHED product to the eDRAM, which any part of the system can access from there. The eDRAM is a bucket, a scratch pad; it does NOT do render operations. Your quote is talking about the bandwidth required to do render operations, like per-pixel lighting calculations. That has NOTHING TO DO WITH THE EDRAM.
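The framebuffer arithmetic in the post above can be reproduced with a short sketch. It assumes a 1280x720 image at 4 bytes per pixel (32-bit colour), which is an assumption made here and not something stated in the post, and it follows the post's own model in which each finished frame is moved exactly once. The function name is hypothetical.

```python
def framebuffer_traffic_mb_per_s(width, height, bytes_per_pixel, fps, buffers=1):
    """MB/s (10^6 bytes) needed to move `buffers` finished frames per refresh."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * fps * buffers / 1e6


frame_mb = 1280 * 720 * 4 / 1e6
print(frame_mb)  # 3.6864

one_buffer = framebuffer_traffic_mb_per_s(1280, 720, 4, 60)
three_buffers = framebuffer_traffic_mb_per_s(1280, 720, 4, 60, buffers=3)
print(round(one_buffer, 1), round(three_buffers, 1))  # 221.2 663.6
```

The post's 216 MB/s and 648 MB/s correspond to rounding the frame down to 3.6 MB first. This counts only finished colour buffers; it says nothing about g-buffer or intermediate-buffer traffic, which is the point the two posters disagree on.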



#25 megafenix

megafenix

    Blooper

  • Members
  • 169 posts

Posted 14 March 2015 - 11:10 AM

 


And when did I say the eDRAM was used for rendering or rasterizing?

I was merely talking about memory bandwidth and latency as separate topics; I never made a direct comparison between the two. I mentioned deferred rendering because the technique requires a g-buffer, and g-buffers are well known to be very hungry for memory bandwidth.

here

https://hacks.mozill...ferred-shading/

"

Deferred Shading

Deferred shading takes a different approach than forward shading by dividing rendering into two passes: the g-buffer pass, which transforms geometry and writes positions, normals, and material properties to textures called the g-buffer, and the light accumulation pass, which performs lighting as a series of screen-space post-processing effects.

// g-buffer pass
foreach visible mesh {
    write material properties to g-buffer;
}

// light accumulation pass
foreach light {
    compute light by reading g-buffer;
    accumulate in framebuffer;
}

This decouples lighting from scene complexity (number of triangles) and only requires one shader per material and per light type. Since lighting takes place in screen-space, fragments failing the z-test are not shaded, essentially bringing the depth complexity down to one. There are also downsides such as its high memory bandwidth usage and making translucency and anti-aliasing difficult.

"

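The two passes in the quoted pseudocode can be made concrete with a toy CPU version. This is only a sketch under simplifying assumptions: a tiny 4x2 "screen", meshes given directly as lists of `(x, y, depth)` fragments, a single scalar albedo, and directional lights with plain Lambert shading. All names and data structures are hypothetical and are not taken from the Mozilla article.

```python
WIDTH, HEIGHT = 4, 2


def gbuffer_pass(meshes):
    """Store, per pixel, the attributes of the nearest fragment (the z-test)."""
    gbuffer = [[None] * WIDTH for _ in range(HEIGHT)]
    for mesh in meshes:
        for x, y, depth in mesh["fragments"]:
            current = gbuffer[y][x]
            if current is None or depth < current["depth"]:
                gbuffer[y][x] = {"depth": depth,
                                 "normal": mesh["normal"],
                                 "albedo": mesh["albedo"]}
    return gbuffer


def light_pass(gbuffer, lights):
    """Accumulate every light in screen space; count g-buffer reads."""
    framebuffer = [[0.0] * WIDTH for _ in range(HEIGHT)]
    reads = 0
    for light in lights:
        for y in range(HEIGHT):
            for x in range(WIDTH):
                texel = gbuffer[y][x]
                reads += 1  # each light re-reads the whole g-buffer
                if texel is None:
                    continue
                n_dot_l = max(0.0, sum(n * l for n, l in
                                       zip(texel["normal"], light["direction"])))
                framebuffer[y][x] += texel["albedo"] * light["intensity"] * n_dot_l
    return framebuffer, reads


meshes = [
    {"normal": (0, 0, 1), "albedo": 1.0, "fragments": [(0, 0, 5.0)]},  # far
    {"normal": (0, 0, 1), "albedo": 0.5, "fragments": [(0, 0, 1.0)]},  # near
]
lights = [
    {"direction": (0, 0, 1), "intensity": 1.0},
    {"direction": (0, 0, 1), "intensity": 0.5},
]
framebuffer, reads = light_pass(gbuffer_pass(meshes), lights)
print(framebuffer[0][0], reads)  # 0.75 16
```

Only the near surface is shaded at pixel (0, 0), which is the "depth complexity of one" the quote mentions. The read counter grows with the number of lights, which is where the bandwidth cost comes from.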
 

Crytek also mentions the memory bandwidth problems of deferred rendering (the technique requires a g-buffer):

http://wccftech.com/...andwidth-gains/

"

Crytek Shares a Secret Method for Utilizing Xbox One eSRAM’s Full Potential – Resulted In High Bandwidth Gains
Recently, GamingBolt published a snippet of their interview with Crytek’s US Engine Business Development Manager Sean Tracy. Talking about utilization of CryEngine with tiled textures, Tracy talked about the role of Xbox One eSRAM in saving ‘big’ bandwidths, and shared a secret method that the Ryse development used to unlock Xbox One eSRAM’s full potential. He said:

 

This technique helped the developer a lot in optimizing Ryse: Son of Rome on Xbox One as it resulted into high bandwidth gains and allowed the development team to use just a single compute shader for lighting and culling.

“CryEngine has a unique and novel solution for this and was shipped with Ryse. One of the problems when using Deferred Shading is that it’s very heavy on bandwidth usage/memory traffic. This gets exponentially worse as overlapping lights cause considerable amounts of redundant read and write operations. In Ryse our graphics engineers created a system called tiled shading to take advantage of the Xbox One.”

“This splits the screen into tiles and generates a list of all the lights effective each title using a compute shader. It then cull’s light by min/max extents of the tile. We then loop over the light list for each tile and apply shading.”

 

“In practice this made for the biggest bandwidth save we could have hoped for, as just reading the Gbuffer once and writing shading results once at the end for each pixel. Only a single compute shader was used in Ryse for light culling and executing entire lighting and shading pipelines (with some small exceptions for complex surfaces like skin and hair).”

"

 

So, if even the Xbox One eSRAM, with its high memory bandwidth of 200 GB/s, can run into trouble with the bandwidth requirements of deferred rendering, to the point that Crytek had to come up with additional solutions, then how in the world could the Wii U eDRAM handle the bandwidth requirements of triple 720p buffering + g-buffer (for the deferred rendering) + intermediate buffers, all at 60 fps, with less memory bandwidth than the Xbox One eSRAM?
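The tiled shading idea in the Crytek quote above can be sketched roughly as follows. This is a simplified illustration, not CryEngine's implementation: lights are treated as screen-space circles and culled against each tile's 2D rectangle only (the quote also mentions min/max extents of the tile, which this sketch leaves out), and the traffic functions are a crude byte count. All names, the 16-pixel tile size and the byte sizes are assumptions.

```python
TILE = 16  # hypothetical tile size in pixels


def build_tile_light_lists(width, height, lights):
    """lights: list of (cx, cy, radius) circles. Returns {(tx, ty): [indices]}."""
    tiles = {}
    for y0 in range(0, height, TILE):
        for x0 in range(0, width, TILE):
            x1, y1 = min(x0 + TILE, width), min(y0 + TILE, height)
            hits = []
            for i, (cx, cy, radius) in enumerate(lights):
                # Closest point of the tile rectangle to the light centre.
                nx = min(max(cx, x0), x1)
                ny = min(max(cy, y0), y1)
                if (nx - cx) ** 2 + (ny - cy) ** 2 <= radius * radius:
                    hits.append(i)
            tiles[(x0 // TILE, y0 // TILE)] = hits
    return tiles


def per_light_traffic(pixels_per_light, gbuffer_bytes, color_bytes):
    """Classic deferred: every light reads the g-buffer and blends (read + write)."""
    return sum(p * (gbuffer_bytes + 2 * color_bytes) for p in pixels_per_light)


def tiled_traffic(total_pixels, gbuffer_bytes, color_bytes):
    """Tiled: read the g-buffer once per pixel, write the result once."""
    return total_pixels * (gbuffer_bytes + color_bytes)


tiles = build_tile_light_lists(32, 32, [(8, 8, 4), (16, 16, 2)])
print(tiles[(0, 0)], tiles[(1, 0)])  # [0, 1] [1]

# Three lights that each cover the same 1000 pixels, 24-byte g-buffer, 4-byte colour.
print(per_light_traffic([1000, 1000, 1000], 24, 4), tiled_traffic(1000, 24, 4))
# 96000 28000
```

In this toy count the tiled path moves the same bytes no matter how many lights overlap a pixel, which matches the quote's "reading the Gbuffer once and writing shading results once".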

 

And yeah, I already know that GPUs have their own tiny memories like local data shares, texture caches and such. That's precisely why I told Shy Guy that 500 GB/s of memory bandwidth is not overkill for the GPU at all.

here

http://developer.amd...nsform-part-ii/

"

Why to use Local Memory?

Local memory or Local Data Share (LDS) is a high-bandwidth memory used for data-sharing among work-items within a work-group. ATI Radeon™ HD 5000 series GPUs have 32 KB of local memory on each compute unit. Figure 1 shows the OpenCL™ memory hierarchy for GPUs [1].

Figure 1: Memory hierarchy of AMD GPUs

Local memory offers a bandwidth of more than 2 TB/s which is approximately 14x higher than the global memory [2]. Another advantage of LDS is that local memory does not require coalescing; once the data is loaded into local memory, it can be accessed in any pattern without performance degradation. However, LDS only allows sharing data within a work-group and not across the borders (among different work-groups). Furthermore, in order to fully utilize the immense potential of LDS we have to have a flexible control over the data access pattern to avoid bank conflicts. In our case, we used LDS to reduce accesses to global memory by storing the output of 8-point FFT in local memory and then performing next three stages without returning to global memory. Hence, we now return to global memory after 6 stages instead of 3 in the previous case. In the next section we elaborate on the use of local memory and the required data access pattern.

 

"

 

So, if each local data share on an AMD HD 5000 GPU can have as much as 2 TB/s of memory bandwidth, why would an eDRAM with 500 GB/s of memory bandwidth be overkill? (The Wii U GPU is based on something between the HD 4000 and HD 6000 series, since there is also a rumor about the E6760, not to mention that HD 4000 through HD 6000 are all based on the RV770 architecture.)

That's precisely why I told Shy Guy that 500 GB/s of eDRAM memory bandwidth wouldn't be a problem for the GPU to handle.


Edited by megafenix, 14 March 2015 - 11:43 AM.


#26 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 14 March 2015 - 11:20 AM


You are making the mistake of thinking bandwidth is the only solution, when your quote itself says 'bandwidth usage/memory traffic'.

Nintendo uses low latency for its deferred rendering. Nintendo doesn't need high bandwidth to send over the massive piles of data that build up during latency waits, because there is no waiting on latency. The Wii U is also not as powerful as those systems, and doesn't need to move even a fraction of the traffic GCN does.

Also, that bandwidth is operational bandwidth; it is used specifically FOR calculating each pixel. Yes, you are still confusing operational bandwidth with what the eDRAM does, which is store and transport. The very fact that you keep posting quotes ABOUT OPERATIONAL BANDWIDTH and then applying them to a storage pool is proof of that.



#27 megafenix

megafenix

    Blooper

  • Members
  • 169 posts

Posted 14 March 2015 - 11:23 AM


Nope. As I said before, I consider both latency and bandwidth important factors. Nevertheless, I am focusing more on memory bandwidth, since it's the thing being most underestimated about the Wii U eDRAM (take Shy Guy's quote, for example). That's why I brought up topics like deferred rendering and the g-buffer, the quotes from Shin'en about triple 720p buffering + g-buffer (for deferred rendering) + intermediate buffers, and of course Crytek's quote about the memory bandwidth requirements of deferred rendering being a burden even for the Xbox One's 200 GB/s eSRAM.


Edited by megafenix, 14 March 2015 - 11:39 AM.


#28 NintendoReport

NintendoReport

    NintendoChitChat

  • Moderators
  • 5,906 posts
  • NNID:eddyray
  • Fandom:
    Nintendo Directs and Video Presentations

Posted 14 March 2015 - 03:49 PM

Just a quick note to the other posters.. while you may not like the conversation or are tired of the dead horse being beaten over and over again, there really isn't anything wrong with the current conversation. 



#29 Segata

Segata

    wall crusher

  • Members
  • 4,738 posts
  • NNID:Ryudo9
  • Fandom:
    Dreamcast,Retro SEGA

Posted 14 March 2015 - 04:57 PM

Actually there is. They take over topics. No one else can say anything without those two posting MASSIVE quotes and pictures, and it never leads anywhere. The topics just end up being locked, so yes, taking the topic hostage is very much what's wrong with the conversation when they could take it to PM.

There is literally no point in a "convo" when in 500 words they are just saying "YOU'RE WRONG" "NO, YOU ARE". That is not a conversation. It's taking the thread hostage.




#30 GAMER1984

GAMER1984

    Lakitu

  • Members
  • 2,036 posts
  • NNID:gamer1984
  • Fandom:
    Nintendo

Posted 14 March 2015 - 05:25 PM

Well, as the person who started this thread I feel partly responsible. I was just curious why developers, mainly the third parties that are still supporting Wii U (Warner Bros.), are not taking advantage of the eDRAM if it could give them a boost in graphics and performance. I feel bad because it seems like Shin'en might be one of the few that will show what Wii U can really do. I believe FRN will be the game to shut many up. I think we will have graphics on par with Driveclub, etc. They said that was their mission with FRN on Wii U: to bring high-quality PC-type graphics to it. It still bugs me that outside of Nintendo and, I guess, Criterion (if you want to include them), Shin'en will be the only devs that take advantage of what the hardware can do.



#31 Segata

Segata

    wall crusher

  • Members
  • 4,738 posts
  • NNID:Ryudo9
  • Fandom:
    Dreamcast,Retro SEGA

Posted 14 March 2015 - 05:47 PM

Nah, look at many topics in the Wii U specs forum and you'll find these two spamming them up with that tripe until the topic is lost. It's their fault, ESPECIALLY a MODERATOR who is supposed to set an example.




#32 GAMER1984

GAMER1984

    Lakitu

  • Members
  • 2,036 posts
  • NNID:gamer1984
  • Fandom:
    Nintendo

Posted 14 March 2015 - 06:04 PM


What I meant is that I guess some asked why I made another thread like this. I was just curious, because it kind of pisses me off that the hardware is there and no one is taking advantage of it.



#33 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 14 March 2015 - 08:12 PM

Nah, look at many topics in the Wii U specs forum and you'll find these two spamming them up with that tripe until the topic is lost. It's their fault, especially a MODERATOR who is supposed to set an example.


This IS the topic, Ryudo. If you don't like it, stop spamming off-topic posts and go to another topic.

 


#34 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 14 March 2015 - 08:42 PM

3Dude, on 14 Mar 2015 - 1:20 PM, said:

Nope, as I said before, I consider both latency and bandwidth important factors. Nevertheless, I am giving more focus to memory bandwidth, since it's the thing that is being most underestimated about the Wii U eDRAM (take, for example, Shy Guy's quote). That's why I brought up topics like deferred rendering, G-buffer quotes from Shin'en like the triple 720p buffering + G-buffer (for deferred rendering) + intermediate buffers, and of course Crytek's quote about the memory bandwidth requirements of deferred rendering being a burden even for the Xbox One's 200 GB/s eSRAM.

The reason it's a burden on the Xbone is that there are more render targets and far more pixels to calculate, and the eSRAM doesn't have the capacity it needs to do its job. Because it's SRAM instead of DRAM, they could only fit a fraction of the capacity in that footprint. SRAM's higher bandwidth is of pretty much no use at all in that situation, but MS had no choice: their eDRAM deals fell through, and the only people they could get contracts with could only do it on a separate die that had to be bussed to the system, which would make it pretty worthless. So they had to go with a fraction of the capacity, with eSRAM.

You are constantly confusing bandwidth used for operations, which is every single thing from every quote you have used, with bandwidth used for moving data. You are also confusing capacity with bandwidth.

The memory attached to the Wii U GPU's logic has much higher bandwidth than the eDRAM, because it is needed to perform the operations in the quotes you keep bringing up. That's not what the eDRAM is for. Once again, I've already shown exactly how much bandwidth is required for Shin'en's three 720p framebuffers: it's about half a GB a second. You are confusing bandwidth with capacity... and don't seem to really have any idea what latency is.

The reason the eDRAM can save so much bandwidth isn't that it has super high bandwidth, but that it has high capacity. The things in the eDRAM don't need super high bandwidth. Again, three 720p framebuffers are simply 3.6 MB images. They've already been processed and rasterized; all they are now is 3.6 MB images. At 60 fps and with three of them, that's only about half a GB a second of bandwidth that's needed.
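
For anyone who wants to check the framebuffer arithmetic, here is a rough sketch in Python. It assumes a 1280x720 target at 4 bytes per pixel and that each of the three buffers is written once per frame; none of those details are stated in the post, so treat them as assumptions. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the framebuffer traffic discussed above.
# Assumptions (not stated in the post): 1280x720 target, 4 bytes per pixel,
# and every buffer written exactly once per frame.

WIDTH, HEIGHT = 1280, 720
BYTES_PER_PIXEL = 4   # assumed 32-bit colour
NUM_BUFFERS = 3
FPS = 60


def framebuffer_bytes(width, height, bytes_per_pixel):
    """Size of one rasterized image in bytes."""
    return width * height * bytes_per_pixel


def write_traffic_bytes_per_second(buffer_bytes, num_buffers, fps):
    """Bytes moved per second if each buffer is written once per frame."""
    return buffer_bytes * num_buffers * fps


buffer_bytes = framebuffer_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
traffic = write_traffic_bytes_per_second(buffer_bytes, NUM_BUFFERS, FPS)

print(f"one 720p buffer: {buffer_bytes / 1e6:.2f} MB")
print(f"three buffers at 60 fps: {traffic / 1e9:.2f} GB/s")
```

Under those assumptions one buffer is a bit under 3.7 MB and the total comes to roughly two thirds of a GB per second, which is the same ballpark as the half a GB quoted above. Reading the buffers back or using wider pixel formats would raise the figure, but not by orders of magnitude.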

The problem is, you don't get to pick and choose how much bandwidth you use, or pick and choose which threads from a wavefront to send in along with something else all at once. These things work in groups called wavefronts; that's why it's single instruction, multiple data. You either send the whole wavefront, or you send none of it in order to send something else. You only get one opening at a time, and then you have to wait for another shot to send the data (this is latency). If you have 500 GB a second of bandwidth and you use it to send half a GB of rasterized image instead of crunching a wavefront, you just wasted all the rest of your bandwidth during that access. If your logic memory is full, it can't be used at all; if it's waiting on a result it needs from another calculation, it can't be used at all. You want finished data to get the hell out of your pipeline as soon as possible so something else can take its place. For that, you need something with high capacity and low latency, not so much high bandwidth, to send it to.

We have the part number of the eDRAM in the Wii U: Renesas / NEC D813301. Its range goes from 32 GB a second to 70 GB a second, and that is more than enough.
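
To put "more than enough" in numbers, here is a small Python sketch comparing the framebuffer traffic with the bandwidth figures mentioned in this thread. The required figure assumes three 1280x720 buffers at 4 bytes per pixel written once per frame at 60 fps (an assumption, as are the labels); the 32 and 70 GB/s values are the range claimed above, and 500 GB/s is the figure being argued about.

```python
# Share of the available bandwidth that the framebuffer writes would occupy.
# REQUIRED_GBPS assumes 3 buffers x 1280x720 x 4 bytes x 60 fps (assumption).

REQUIRED_GBPS = 3 * 1280 * 720 * 4 * 60 / 1e9

# Bandwidth figures as claimed in the thread, not measured values.
CANDIDATES_GBPS = {
    "eDRAM, low end of claimed range": 32.0,
    "eDRAM, high end of claimed range": 70.0,
    "hypothetical 500 GB/s pool": 500.0,
}


def utilization(required_gbps, available_gbps):
    """Fraction of the available bandwidth consumed by the required traffic."""
    return required_gbps / available_gbps


for label, available in CANDIDATES_GBPS.items():
    share = utilization(REQUIRED_GBPS, available)
    print(f"{label}: {share:.2%} of bandwidth used")
```

Even at the low end of the claimed range, the finished framebuffers take up only a few percent of the pool's bandwidth, so the extra headroom of a 500 GB/s pool would go almost entirely unused for this job.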

The reason you got kicked out of the Beyond3D thread is that you were blatantly wrong and weren't listening to anyone.

Again, the edram is Renesas D813301, and no version of it gets anywhere near the bandwidth you are talking about.

http://www.chipworks...303-801_TOC.pdf

Latency is important, I've known that since the GameCube era, but bandwidth is an important factor too. You can read documentation about the importance of bandwidth for deferred rendering using the G-buffer, which is very demanding. Do not ignore that fact either.

Fenix, the G-buffer is a buffer. The OPERATIONS to MAKE the product you SEND to the G-buffer (which can be any pool of RAM large enough to store it) are very bandwidth intensive. What is sent to the eDRAM, which they are USING as a buffer, is the finished product of those operations, which is no longer bandwidth intensive, because it's finished and being stored ahead of time in a buffer. You just need the capacity to store it, and the latency to access it when it needs to be accessed.

Buffers are for storing things, like rasterized images, not for high-bandwidth SIMD operations.

The high-bandwidth memory that is required to process that information, the memory you are talking about, is not the eDRAM. It's the RAM attached to the thousands of parallel arithmetic logic units that do the calculations for each and every pixel. That's why its bandwidth is so freaking high: each cell is attached to its own ALU. It's like a factory assembly line.


I am going to try to explain this very clearly for you, using your own misinterpreted copypastas:

"And yeah, I already know that GPUs have their own tiny memories like local data shares, texture caches and such. That's precisely why I told Shy Guy that 500 GB/s of memory bandwidth is not overkill for the GPU at all."

The problem with this is that you are freely exchanging any bandwidth from anywhere, on any GPU, with the Wii U's eDRAM. It doesn't work like that. There IS memory on the Wii U's GPU that has high bandwidth like that. It is NOT the eDRAM. Just because there is some memory on the die that gets bandwidth similar to that, it does NOT mean the eDRAM does, or even should, have that kind of bandwidth. It doesn't, and it shouldn't.

"Why to use Local Memory?
Local memory or Local Data Share (LDS) is a high-bandwidth memory used for data-sharing among work-items within a work-group. ATI Radeon™ HD 5000 series GPUs have 32 KB of local memory on each compute unit. Figure 1 shows the OpenCL™ memory hierarchy for GPUs [1]. Local memory offers a bandwidth of more than 2 TB/s which is approximately 14x higher than the global memory [2]."

This RAM has high bandwidth and low latency (lower is better for latency), but a very, very tiny capacity: only 32 KB. This is in line with the rule of thumb that you can only realistically choose two out of the three important performance factors for RAM. It has good bandwidth, good latency, but 'BAD' capacity (it doesn't need high capacity; that would be a waste). This is a good RAM configuration for performing OPERATIONS. It's operational memory.
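
As a quick sanity check on the quoted AMD figures, the sketch below works out what "more than 2 TB/s" and "approximately 14x higher than the global memory" imply for global memory bandwidth. Note that these numbers describe the Radeon HD 5000 series from the quote, not the Wii U, and treating 2 TB/s as an exact value rather than a floor is a simplification made here.

```python
# What the quoted AMD figures imply about global memory bandwidth.
# These are Radeon HD 5000 series numbers from the quote, NOT Wii U specs.

LDS_BANDWIDTH_GBPS = 2000.0   # "more than 2 TB/s", treated as exactly 2 TB/s
LDS_TO_GLOBAL_RATIO = 14.0    # "approximately 14x higher than the global memory"


def implied_global_bandwidth_gbps(lds_gbps, ratio):
    """Global memory bandwidth implied by the local bandwidth and the ratio."""
    return lds_gbps / ratio


global_gbps = implied_global_bandwidth_gbps(LDS_BANDWIDTH_GBPS, LDS_TO_GLOBAL_RATIO)
print(f"implied global memory bandwidth: about {global_gbps:.0f} GB/s")
```

So the terabytes-per-second figure belongs to the tiny per-compute-unit scratch memory, while the big pool on the same card sits more than an order of magnitude lower, which is the split between operational memory and storage described here.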

You are under the false impression that the eDRAM's performance should mirror this RAM's performance. It should not; that would cripple the system and turn it into a worthless pile of junk. The eDRAM should COMPLEMENT the operational RAM, and do well what this RAM DOES NOT do well.

The Wii U already has RAM like this. Again, it is the small-capacity RAM that is attached to each ALU/logic/compute unit (sound familiar?) in the Wii U GPU. This is the highest-bandwidth RAM in the GPU, no contest. It is operational RAM. This RAM has crappy capacity; it can't store anything it works on. It works on a small piece of a whole and sends it somewhere else to make room for the next piece.

That is where the eDRAM comes in. The local RAM sends its finished products to the eDRAM. IT has HIGHER capacity so it can store it all. In fact, the capacity of the Wii U's eDRAM (32 MB) is roughly 1,000 times higher than the capacity of the operational RAM in your very example (32 KB). The eDRAM is NOT local RAM like the RAM in your example; it's an intermediary between local and global RAM, which is REALLY important.
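
The capacity side of the argument can be sketched the same way. The Python below assumes the commonly reported 32 MB for the Wii U eDRAM, the 32 KB local data share from the AMD quote, and a 1280x720 buffer at 4 bytes per pixel; all three are assumptions for illustration rather than figures confirmed in this thread.

```python
# Capacity comparison between a small operational scratch memory and the eDRAM.
# Assumed sizes: 32 MB eDRAM (commonly reported for Wii U), 32 KB local data
# share (from the AMD quote), 1280x720 framebuffer at 4 bytes per pixel.

EDRAM_BYTES = 32 * 1024 * 1024
LDS_BYTES = 32 * 1024
FRAMEBUFFER_BYTES = 1280 * 720 * 4

# How many times larger the eDRAM is than one local data share.
capacity_ratio = EDRAM_BYTES // LDS_BYTES

# Whole 720p buffers that fit in the eDRAM at once.
buffers_in_edram = EDRAM_BYTES // FRAMEBUFFER_BYTES

# Number of 32 KB chunks needed to cover one buffer (ceiling division).
chunks_per_buffer = -(-FRAMEBUFFER_BYTES // LDS_BYTES)

print(f"eDRAM is {capacity_ratio}x the size of one local data share")
print(f"720p buffers that fit in eDRAM: {buffers_in_edram}")
print(f"32 KB chunks per 720p buffer: {chunks_per_buffer}")
```

With these assumed sizes, a single 720p image is more than a hundred times too big for one local data share, while the eDRAM can hold several of them at once. That is the sense in which the eDRAM is for storing finished results rather than for doing the work.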

 


#35 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 14 March 2015 - 09:20 PM

Kids, this is a thread about technical specifications. If you don't want to talk about technical specifications, or read about them, go to another thread. Do not spam this one with off-topic rainbowposting.

 


#36 Chronos21

Chronos21

    Green Koopa Troopa

  • Members
  • 48 posts

Posted 15 March 2015 - 12:16 AM

I agree with 3Dude. It's not a bad discussion here, actually really interesting, and I don't see the point of closing this thread.

#37 Segata

Segata

    wall crusher

  • Members
  • 4,738 posts
  • NNID:Ryudo9
  • Fandom:
    Dreamcast,Retro SEGA

Posted 15 March 2015 - 12:34 AM

I agree with 3Dude. It's not a bad discussion here, actually really interesting, and I don't see the point of closing this thread.

Then get used to walls of those posts a lot in these topics.

what I meant is that I guess some asked why I made another thread like this. I was just curious because it kind of pisses me off that hardware is there and no one is taking advantage of it.

Shin'en is, Nintendo of course will, PG did. Published games will, and many indies. Big AAA games likely won't, as they abandoned Wii U, but most of them (not all, but most) are not worth it anyway. It kinda sucks that the smaller publishers that make some great hidden gems aren't on Wii U, though. XCX and Zelda U make Wii U groan like an all-night orgy, baby.


Edited by Ryudo, 15 March 2015 - 01:33 AM.

 


#38 megafenix

megafenix

    Blooper

  • Members
  • 169 posts

Posted 21 March 2015 - 08:49 PM

Then get used to walls of those posts a lot in these topics.

Shin'en is, Nintendo of course will, PG did. Published games will, and many indies. Big AAA games likely won't, as they abandoned Wii U, but most of them (not all, but most) are not worth it anyway. It kinda sucks that the smaller publishers that make some great hidden gems aren't on Wii U, though. XCX and Zelda U make Wii U groan like an all-night orgy, baby.

Besides Shin'en, I would say that those behind Fatal Frame for the Wii U also did well. We also have to wait and see whether Devil's Third's graphics have vastly improved, as Itagaki claims. I am also hoping that the Wii U version of Shadow of the Eternals is still in the works, but right now we only know that Shadow of the Eternals development has started again at Quantum Entanglement Entertainment; that was confirmed last year on October 31.

http://www.polygon.c...t-entertainment


Edited by megafenix, 21 March 2015 - 08:49 PM.


#39 GAMER1984

GAMER1984

    Lakitu

  • Members
  • 2,036 posts
  • NNID:gamer1984
  • Fandom:
    Nintendo

Posted 21 March 2015 - 10:27 PM

We will see. Looking forward to Devil's Third. I really hope the next showing of the game can back up his words. Fast Racing Neo, Zelda, and XCX, though, are going to be the showstoppers.



#40 dorinne45

dorinne45

    Goomba

  • Members
  • 2 posts

Posted 30 March 2015 - 11:29 PM

waiting for his good news !

 

 

 

 


Edited by dorinne45, 31 March 2015 - 06:30 PM.




