
Wii U's Memory Bandwidth, GPU More Powerful Than We Thought?


  • This topic is locked
79 replies to this topic

#41 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 27 February 2014 - 03:27 PM

Seriously?

I got another article about how developers wanted to use the eDRAM of the 360 for particle effects and other stuff.

But hell, have you even been reading and paying attention?

It's been more than a decade, and using embedded memory or eDRAM for textures has been out there.

Learn:

[Image: Flipper datapath diagram]

Seriously, it seems all you do is make random comments.

 

The GameCube has a fixed-function GPU, you idiot. It doesn't have any SIMD, nor the local data share that accompanies it, because it's not a programmable shader architecture.

 

And the eDRAM is a COMPLETELY different thing from the memory the SIMD uses for its calculations. Which, in case you forgot, or more likely are just too dumb to understand, is why I am already laughing at you.



 


#42 megafenix

megafenix

    Blooper

  • Members
  • 169 posts

Posted 27 February 2014 - 03:52 PM

The GameCube has a fixed-function GPU, you idiot. It doesn't have any SIMD, nor the local data share that accompanies it, because it's not a programmable shader architecture.

 

And the eDRAM is a COMPLETELY different thing from the memory the SIMD uses for its calculations. Which, in case you forgot, or more likely are just too dumb to understand, is why I am already laughing at you.

 

So, genius, if neither the SIMD nor the texture units are gonna use the eDRAM, then what's gonna use it?

 

I told you, I got this info about using the 360 eDRAM for a particle system, yes, and they were going to use vertex texture fetch data.

Is the 360 GPU fixed-function?

Again, I am not joking; I've got this in my pocket too.


Edited by megafenix, 27 February 2014 - 04:03 PM.


#43 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 27 February 2014 - 04:12 PM

So, genius, if neither the SIMD nor the texture units are gonna use the eDRAM, then what's gonna use it?
 
I told you, I got this info about using the 360 eDRAM for a particle system, yes, and they were going to use vertex texture fetch data.
Is the 360 GPU fixed-function?
Again, I am not joking; I've got this in my pocket too.


It's a memory hierarchy. The main RAM takes assets off disc, and is a fallback for any missed operations from higher tiers, or for directly fetched assets, but it is slow. The main RAM feeds predicted assets to the eDRAM, which is much faster; some things, like textures, are cached and virtualized for efficiency. The eDRAM can be used as memory for operations (your 360 particle system you are blabbering about whilst completely clueless, though this would be an idiotic waste on the Wii U when you could have geometry shaders do the general-purpose calculations), or a scratchpad, or to hold frequently used assets like textures, and the frame buffer.

The eDRAM could send pieces of virtualized textures in the form of data arrays to the SIMD's local data share to perform calculations on each pixel multiple times, changing the color value of the texture according to data also sent, like local lighting, point lights, lights set to infinite, bump maps, and transparencies, to draw in the particles that the eDRAM holds in memory as a coordinate array. The high bandwidth is required because of the sheer volume of accesses the thousands of ALUs make each time they do something. It's meant for storing a single solution to an arithmetic operation (small capacity) but from thousands of arithmetic logic units (wide bandwidth pipeline). It can't do a damn thing for moving data off disc or out of main memory, like you are desperately shouting from the rooftops... And it SURE AS HELL wouldn't be used as the number touted for the memory bandwidth.
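For reference, a minimal sketch of the peak-bandwidth arithmetic behind these figures (bus width × clock / 8). The 4096-bit width is not a spec quoted anywhere in this thread; it is simply what a 256 GB/s internal figure implies at the 360's 500 MHz eDRAM clock:

```python
def peak_bandwidth_gbs(bus_width_bits: int, clock_hz: float) -> float:
    """Peak transfer rate in GB/s: bytes per cycle times cycles per second."""
    return bus_width_bits / 8 * clock_hz / 1e9

# The 360's 256 GB/s ROP<->eDRAM figure implies an effective 4096-bit
# internal path at 500 MHz:
print(peak_bandwidth_gbs(4096, 500e6))  # 256.0

# The bus OFF the daughter die back to the parent GPU is far narrower
# (32 GB/s), which is the number that matters for moving data out.
```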


 


#44 megafenix

megafenix

    Blooper

  • Members
  • 169 posts

Posted 27 February 2014 - 04:21 PM

It's a memory hierarchy. The main RAM takes assets off disc, and is a fallback for any missed operations from higher tiers, or for directly fetched assets, but it is slow. The main RAM feeds predicted assets to the eDRAM, which is much faster; some things, like textures, are cached and virtualized for efficiency. The eDRAM can be used as memory for operations (your 360 particle system you are blabbering about whilst completely clueless), or a scratchpad, or to hold frequently used assets like textures, and the frame buffer.

The eDRAM could send pieces of virtualized textures in the form of data arrays to the SIMD's local data share to perform calculations on each pixel multiple times, changing the color value of the texture according to data also sent, like local lighting, point lights, lights set to infinite, bump maps, and transparencies, to draw in the particles that the eDRAM holds in memory as a coordinate array. The high bandwidth is required because of the sheer volume of accesses the thousands of ALUs make each time they do something. It can't do a damn thing for moving data off disc or out of main memory, like you are desperately shouting from the rooftops... And it SURE AS HELL wouldn't be used as the number touted for the memory bandwidth.

Do you feel smart using those kinds of random explanations and comments?

It's actually the opposite. What does anything you said have to do with bandwidth anyway?

 

If GPU components from the HD 4000 series and above have no trouble with terabytes of bandwidth in their caches, like the local data shares and texture caches, why would they have trouble with a big cache at 563.2 GB/s, whether it's eDRAM or SRAM or whatever?

 

Not to mention that we are not even talking about the same amount of bandwidth:

internal caches in the GPU are terabytes;

the eDRAM cache is only half a terabyte.


Edited by megafenix, 27 February 2014 - 04:23 PM.


#45 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 27 February 2014 - 04:39 PM

Because those caches only send and access fragment data, you idiot. They have 'high bandwidth' usage because they constantly access the memory for calculations, or to send small fragments of a texture. It's from WORKING ON the assets, or feeding small pieces, not MOVING THEM en masse, in their entirety, which is the job of the main memory levels 1 and 2 you are talking about.

If they were actually being fed terabytes of data to work on:

1. There would need to be hundreds or thousands of SIMD arrays.
2. The operations bandwidth for the LDS would shoot up into the petabytes... as it outputs petaflops of data... like it does for GPGPU supercomputers.

And just in general:
1. Local data shares are not caches, as they are entirely under programmer control.
2. eDRAM is nowhere near half a TB in bandwidth, and you have no proof whatsoever to show that it is, only an assumption that it's bussed 1024 pins a cell.
3. You are a moron.


 


#46 Nollog

Nollog

    Chain Chomp

  • Banned
  • 776 posts
  • NNID:Nollog
  • Fandom:
    Creepy Stalker Girl

Posted 27 February 2014 - 04:42 PM

[Image]
This thread in a jpg.



#47 megafenix

megafenix

    Blooper

  • Members
  • 169 posts

Posted 27 February 2014 - 06:03 PM

 

Because those caches only send and access fragment data, you idiot. They have 'high bandwidth' usage because they constantly access the memory for calculations, or to send small fragments of a texture. It's from WORKING ON the assets, or feeding small pieces, not MOVING THEM en masse, in their entirety, which is the job of the main memory levels 1 and 2 you are talking about.

If they were actually being fed terabytes of data to work on:

1. There would need to be hundreds or thousands of SIMD arrays.
2. The operations bandwidth for the LDS would shoot up into the petabytes... as it outputs petaflops of data... like it does for GPGPU supercomputers.

And just in general:
1. Local data shares are not caches, as they are entirely under programmer control.
2. eDRAM is nowhere near half a TB in bandwidth, and you have no proof whatsoever to show that it is, only an assumption that it's bussed 1024 pins a cell.
3. You are a moron.

 

 

Dude, I already knew that the local data share is under programmer control and isn't a cache. I just said 'caches' so that you would understand I was referring to all the internal memories on the GPU.

 

So what? Does that change the fact that each SIMD core can handle 2 TB/s with the local data share?

Does that change the fact that the texture caches have 480 GB/s of bandwidth or more per texture unit in the HD 4000 GPUs?

 

Seriously, you just seem to avoid the obvious. Even if I don't count the SIMD cores, I still have the texture caches at 480 GB/s, and there could be like 16 of them or more, because each texture unit has its own texture cache.

 

Seems you like to act smart, yet you fail.

 

You get it?

16 × 480 GB/s = ?

Terabytes, dude, for the caches. (See the sketch below.)
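For what it's worth, here is the multiplication as this post frames it. Whether per-unit cache bandwidths can be summed into one usable pool is exactly what is disputed in the replies, and the 16-unit count is the post's own guess, not a documented figure:

```python
# Aggregate L1 texture-cache bandwidth as claimed: 16 hypothetical
# texture units, each with its own 480 GB/s L1 texture cache.
units = 16
per_unit_gbs = 480.0
print(f"{units * per_unit_gbs / 1000:.2f} TB/s")  # 7.68 TB/s
```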

 

Who is the moron? (This happens when you try to play smart; it's not gonna work.)

 

 

The Wii U GPU, whether a derivative of an HD 4000 or HD 5000 or whatever, can obviously handle 563.2 GB/s with the eDRAM without problems. That's natural, because GPUs are more advanced now, and if a GPU from 2000 could handle 18 GB/s of bandwidth with its eDRAM or embedded memory, it's obvious that a new GPU from 10 years later can surely handle more.

 

No proof?

1. Made by NEC, the same company that made the Xbox 360 eDRAM.

2. The Xbox 360 eDRAM was 1024 bits, and the formula proves it, because you get the 256 GB/s.

3. The Wii U uses new eDRAM, seven years more updated, from the same NEC that made the Xbox 360 eDRAM.

4. Renesas confirmed that the Wii U eDRAM uses the latest technologies at that NEC plant.

5. Shin'en mentions that the Wii U has plenty of high bandwidth.

6. The Wii U requires about 7 MB for 720p while the Xbox 360 requires the whole 10 MB for 720p (confirmed by Microsoft at MSDN), which means that those 7 MB should provide bandwidth similar to what those old 10 MB of the 360 have.

7. There's that report from the respected writer Bob Peterson.

 

 

And again, I am not the one saying this; the terabytes of bandwidth with the caches or the local data shares are not an assumption, they are a fact.

Read, dude, read:

http://www.tomshardw...850,1957-5.html

"

With the RV770, the AMD engineers didn’t stop at optimizing their architecture to only slightly increase the die real-estate— they also borrowed a few good ideas from the competition. The G80 had introduced a small, 16-KB memory area per multiprocessor that’s entirely under the programmer’s control, unlike a cache. This memory area, accessible in CUDA applications, can share data among threads. AMD has introduced its version of this with the RV770. It’s called Local Data Share and is exactly the same size as its competitor’s Shared Memory. It also plays a similar role by enabling GPGPU applications to share data among several threads. The RV770 goes even further, with another memory area (also 16 KB) called Global Data Share to enable communication among SIMD arrays.

 

Texture units

While the ALUs haven’t undergone a major modification, the texture units have been completely redesigned. The goal was obvious – as with the rest of the GPU, it was to increase performance significantly while maintaining as small a die area as possible. The engineers set fairly ambitious goals, aiming for an increase of 70% in performance for an equivalent die area. To do that, they focused their efforts largely on the texture cache. The bandwidth of the L1 texture cache was increased to 480 GB/s.

 

But that’s not all; the L1 cache that was shared by all the SIMD arrays has been broken down into 10 cache memories, one per SIMD array, and each contains only data exclusive to the corresponding SIMD array. Shared data are now stored in an L2 cache, which has also been completely redesigned, now having a bandwidth 384 GB/s to the L1 cache. In order to reduce latency, this L2 cache has been positioned near the memory controllers. Let’s see what the results of these improvements are in practice:

 

 

"

 

 

What? Are you gonna say that page is fake?

So I suppose these ones are too, right?

http://www.anandtech.com/show/2556/4

 

"

AMD did also make some enhancements to their texture units as well. By doing some "stuff" that they won't tell us about, they improved the performance per mm^2 by 70%. Texture cache bandwidth has also been doubled to 480 GB/s while bandwidth between each L1 cache and L2 memory is 384 GB/s. L2 caches are aligned with memory channels of which there are four interleaved channels (resulting in 8 L2 caches).

 

Now that texture units are linked to both specific SIMD cores and individual L1 texture caches, we have an increase in total texturing ability due to the increase in SIMD cores with RV770. This gives us a 2.5x increase in the number of 8-bit per component textures we can fetch and bilinearly filter per clock, but only a 1.25x increase in the number of fp16 textures (as fp16 runs at half rate and fp32 runs at one quarter rate). It was our understanding that fp16 textures could run at full speed on R600, so the 1.25x increase in performance for half rate texturing of fp16 data makes sense.

Even though AMD wouldn't tell us L1 cache sizes, we had enough info left over from the R600 time frame to combine with some hints and extract the data. We have determined that RV770 has 10 distinct 16k caches. This is as opposed to the single shared 32k L1 cache on R600 and gives us a total of 160k of L1 cache. We know R600's L2 cache was 256k, and AMD told us RV770 has a larger L2 cache, but they wouldn't give us any hints to help out.

 

"

 

Or maybe this one is fake too:

http://techreport.co...ics-processor/5

 

"

With 10 texture units onboard, the RV770 can sample and bilinearly filter up to 40 texels per clock. That's up from 16 texels per clock on RV670, a considerable increase. One of the ways AMD managed to squeeze down the size of its texture units was taking a page from Nvidia's playbook and making the filtering of FP16 texture formats work at half the usual rate. As a result, the RV770's peak FP16 filtering rate is only slightly up from RV670. Still, Hartog described the numbers game here as less important than the reality of measured throughput.

To ensure that throughput is what it should be, the design team overhauled the RV770's caches extensively, replacing the R600's "distributed unified cache" with a true L1/L2 cache hierarchy.

 

[Image: L2 caches aligned with memory controllers]

 

 

Each L1 texture cache is associated with a SIMD/texture unit block and stores unique data for it, and each L2 cache is aligned with a memory controller. Much of this may sound familiar to you, if you've read about certain competitors to RV770. No doubt AMD has learned from its opponents.

Furthermore, Hartog said RV770 uses a new cache allocation routine that delays the allocation of space in the L1 cache until the request for that data is fulfilled. This mechanism should allow RV770 to use its texture caches more efficiently. Vertices are stored in their own separate cache. Meanwhile, the chip's internal bandwidth is twice that of the previous generation—a provision necessary, Hartog said, to keep pace with the amount of data coming in from GDDR5 memory. He claimed transfer rates of up to 480GB/s for an L1 texture fetch and up to 384GB/s for data transfers between the L1 and L2 caches.

 

"

 

That's not enough? Then how about its cousin, which is not a new GPU but rather an optimized RV770 with some modifications?

http://www.bit-tech....ure-analysis/11

 

"

ATI Radeon HD 5870 Architecture Analysis

Published on 30th September 2009 by Tim Smalley

 

The L1 texture cache has remained unchanged in terms of size and associativity - it still has effectively unlimited access per clock cycle - but the increased core count means that the number of texture caches has doubled. There are now twenty 8KB L1 texture caches, meaning a total of 160KB L1 texture cache GPU-wide. The four L2 caches, which are associated with each of the four memory controllers, have doubled in capacity as well and are now 128KB each, meaning a total of 512KB across the GPU.

Texture bandwidth has also been bolstered, with texture fetches from L1 cache happening at up to 1TB/sec (one terabyte per second) - that's more than double the L1 texture cache bandwidth available in RV770. I said so earlier, but it's worth reiterating again - that's a phenomenal amount of bandwidth. What's more, bandwidth between L1 and L2 caches has been increased to 435GB/sec from 384GB/sec on RV770 - another impressive figure.

 

"


Edited by megafenix, 27 February 2014 - 07:02 PM.


#48 grahamf

grahamf

    The Happiness Fairy

  • Members
  • 2,532 posts

Posted 27 February 2014 - 08:23 PM

I forgot how to blink again.


 


#49 NintendoReport

NintendoReport

    NintendoChitChat

  • Moderators
  • 5,907 posts
  • NNID:eddyray
  • Fandom:
    Nintendo Directs and Video Presentations

Posted 27 February 2014 - 08:28 PM

[Image: popcorn GIF]


Keep Smiling, It Makes People Wonder What You Are Up To!
PA Magician | Busiest PA Magician | Magician Reviewed | Certified Magic Professionals


#50 Raiden

Raiden

    wall crusher

  • Members
  • 4,738 posts

Posted 27 February 2014 - 08:46 PM

Every single Mega and 3Dude debate goes like this. Though we all know 3Dude is right.


Edited by Ryudo, 27 February 2014 - 08:49 PM.


#51 GAMER1984

GAMER1984

    Lakitu

  • Members
  • 2,036 posts
  • NNID:gamer1984
  • Fandom:
    Nintendo

Posted 27 February 2014 - 10:43 PM

Every single Mega and 3Dude debate goes like this. Though we all know 3Dude is right.

My thing is, it does not matter anymore. The Wii U is never going to be spoken of in the same breath as the PS4/X1. Just enjoy the beautiful games for what they are. We will never find out exactly what the Wii U is made of, and that's the way Nintendo plans to keep it. There will be visual treats, so let's look forward to MK8, X, Project Cars, Fast Racing Neo... the games that should actually push the boundaries a little and show the hardware is more capable.



#52 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 28 February 2014 - 02:32 AM

lmfao wow.

1. The Wii U doesn't use GDDR5.
2. The Wii U eDRAM is not the texture cache, nor is it the L1 cache; you can't just pretend L1 bandwidth is usable as main memory or L2.
3. The Wii U only has 2 SIMD arrays. So you can remove 8 of those from the picture you posted, along with 8 of those L1 texture caches, along with three of those L2 caches, and their memory busses, and all that bandwidth you have been sputtering about like a moron.

And finally, if you were actually able to comprehend anything you copy and paste, you would realize you just proved yourself wrong. First off, you are confusing internal and external bandwidth, thinking they can just be combined for the same purpose, which is stupid.

Second off, just because the GPU's internal operational bandwidth reaches 300-400-500 GB/s, THAT DOESN'T MEAN IT CAN HANDLE BEING FED 400-500 GB/s.

" Meanwhile, the chip's internal bandwidth is twice that of the previous generation—a provision necessary, Hartog said, to keep pace with the amount of data coming in from GDDR5 memory. He claimed transfer rates of up to 480GB/s for an L1 texture fetch and up to 384GB/s for data transfers between the L1 and L2 caches.

All that bandwidth is needed to handle processing the load being fed by the GDDR5, as specifically stated by the text you cluelessly copied and pasted. GDDR5 is 20 GB/s on a 32-bit bus, 40 on a 64-bit bus, which with most setups of two 64-bit or four 32-bit modules averages around 80 GB/s... clocked at an effective 7 GHz, which nothing in the Wii U is clocked at.

THAT'S HOW MUCH IT CAN HANDLE. Those 300-400-500 GB/s are what's required to process around 80 GB/s of incoming assets. If they were being fed 400-500 GB/s, the amount of internal operational bandwidth required would need to increase in kind.
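A rough sketch of the feed-rate arithmetic in this post, assuming the standard GDDR5 formula of per-pin rate × bus width / 8. A 5 Gbps per-pin rate is the assumption that reproduces the 20 and 80 GB/s figures above; it is not a number stated in the thread:

```python
def gddr5_gbs(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak GDDR5 feed rate in GB/s: per-pin data rate times bus width / 8."""
    return gbps_per_pin * bus_width_bits / 8

print(gddr5_gbs(5.0, 32))   # 20.0 -> the '20 GB/s on a 32-bit bus' above
print(gddr5_gbs(5.0, 128))  # 80.0 -> two 64-bit (or four 32-bit) channels
```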

So, AGAIN, now that you have stumbled your way across every single mistake on your way back to the beginning, where I started:

The processors CAN NOT make use of that much eDRAM bandwidth.


 


#53 megafenix

megafenix

    Blooper

  • Members
  • 169 posts

Posted 28 February 2014 - 07:32 AM

lmfao wow.

1. The Wii U doesn't use GDDR5.
2. The Wii U eDRAM is not the texture cache, nor is it the L1 cache; you can't just pretend L1 bandwidth is usable as main memory or L2.
3. The Wii U only has 2 SIMD arrays. So you can remove 8 of those from the picture you posted, along with 8 of those L1 texture caches, along with three of those L2 caches, and their memory busses, and all that bandwidth you have been sputtering about like a moron.

And finally, if you were actually able to comprehend anything you copy and paste, you would realize you just proved yourself wrong. First off, you are confusing internal and external bandwidth, thinking they can just be combined for the same purpose, which is stupid.

Second off, just because the GPU's internal operational bandwidth reaches 300-400-500 GB/s, THAT DOESN'T MEAN IT CAN HANDLE BEING FED 400-500 GB/s.

" Meanwhile, the chip's internal bandwidth is twice that of the previous generation—a provision necessary, Hartog said, to keep pace with the amount of data coming in from GDDR5 memory. He claimed transfer rates of up to 480GB/s for an L1 texture fetch and up to 384GB/s for data transfers between the L1 and L2 caches.

All that bandwidth is needed to handle processing the load being fed by the GDDR5, as specifically stated by the text you cluelessly copied and pasted. GDDR5 is 20 GB/s on a 32-bit bus, 40 on a 64-bit bus, which with most setups of two 64-bit or four 32-bit modules averages around 80 GB/s... clocked at an effective 7 GHz, which nothing in the Wii U is clocked at.

THAT'S HOW MUCH IT CAN HANDLE. Those 300-400-500 GB/s are what's required to process around 80 GB/s of incoming assets. If they were being fed 400-500 GB/s, the amount of internal operational bandwidth required would need to increase in kind.

So, AGAIN, now that you have stumbled your way across every single mistake on your way back to the beginning, where I started:

The processors CAN NOT make use of that much eDRAM bandwidth.

 

 

I am not claiming anything the experts in the material haven't, and it clearly says that this applies to all GPUs: low, middle, and high end. The major difference between GPUs of the same family series is the number of stream processors, TMUs, and ROPs. Or are you expecting each of the GPUs in the HD 4000 or HD 5000 lines to be that different from one another in their internal components and architecture?

 

And the HD 5000 has 1 TB/s of bandwidth with the L1 texture cache, for each one of them.

Err, caches using eDRAM have been used for years; fixed-function or whatever doesn't matter.

 

[Image: Flipper datapath diagram]



Every single Mega and 3Dude debate goes like this. Though we all know 3Dude is right.

Every single debate or whatever with 3Dude: we already know 3Dude is a big troll, much like yourself. The big advantage is that he is an administrator. Why don't you show up on the HD warriors' forums to really see what you are made of?

 

The only reason you act so cocky is because this is your corral. Why not change that and debate elsewhere?

 

What's the matter? Are you a

[Image: a chicken]

 

 

It's not my fault that experts on GPUs say they can handle terabytes of bandwidth, and the Wii U GPU is no exception, because whether it comes from the HD 4000 or HD 5000 or the E6760 or whatever, it's still an AMD GPU, just like the examples.

 

Seriously, it's been ten years since the GameCube debuted, which already had like 18 GB/s with its eDRAM, and you still can't believe that the Wii U GPU has like 31 times that?

The Xbox 360 already had 256 GB/s between the eDRAM and the ROPs; why would a more modern GPU like the Wii U have trouble with double that bandwidth?


Edited by megafenix, 28 February 2014 - 08:04 AM.


#54 Alex Atkin UK

Alex Atkin UK

    Boo

  • Members
  • 528 posts

Posted 28 February 2014 - 09:18 AM

My head hurts!


Sheffield 3DS | Steam & XBOX: Alex Atkin UK | PSN & WiiU: AlexAtkinUK

 

How to improve the Wii U download speed.


#55 Azure-Edge

Azure-Edge

    Chain Chomp

  • Members
  • 782 posts
  • NNID:Azure-X

Posted 28 February 2014 - 09:24 AM

My head hurts!

 

You were actually reading all of that? O.o




#56 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 28 February 2014 - 10:28 AM

I am not claiming anything the experts in the material haven't, and it clearly says that this applies to all GPUs: low, middle, and high end. The major difference between GPUs of the same family series is the number of stream processors, TMUs, and ROPs. Or are you expecting each of the GPUs in the HD 4000 or HD 5000 lines to be that different from one another in their internal components and architecture?

And the HD 5000 has 1 TB/s of bandwidth with the L1 texture cache, for each one of them.
Err, caches using eDRAM have been used for years; fixed-function or whatever doesn't matter.

[Image: Flipper datapath diagram]


Every single debate or whatever with 3Dude: we already know 3Dude is a big troll, much like yourself. The big advantage is that he is an administrator. Why don't you show up on the HD warriors' forums to really see what you are made of?

The only reason you act so cocky is because this is your corral. Why not change that and debate elsewhere?

What's the matter? Are you a
[Image: a chicken]


It's not my fault that experts on GPUs say they can handle terabytes of bandwidth, and the Wii U GPU is no exception, because whether it comes from the HD 4000 or HD 5000 or the E6760 or whatever, it's still an AMD GPU, just like the examples.

Seriously, it's been ten years since the GameCube debuted, which already had like 18 GB/s with its eDRAM, and you still can't believe that the Wii U GPU has like 31 times that?
The Xbox 360 already had 256 GB/s between the eDRAM and the ROPs; why would a more modern GPU like the Wii U have trouble with double that bandwidth?

Good God, you are an idiot.

1. eDRAM just means RAM that has been embedded. That doesn't mean every piece of embedded memory can be combined and stated as main memory bandwidth like you are trying to claim.

You can't combine the bandwidth of L1, L2, and the Cube's 1T-SRAM and claim the sum total can do anything you want. The RAMs have different purposes and do different jobs SIMULTANEOUSLY.

The 360's ROPs had 256 GB a second because they were embedded onto the RAM die. That bandwidth was also useless as hell once it made its way to the bridge off the die.

YOU STILL have no proof whatsoever about how the Wii U's eDRAM is bussed, and the reason why is that it's not bussed at 1K bits a cell, because that's idiotic.

And YES, YOU ARE SPECIFICALLY CLAIMING EVERYTHING those documents you are blindly copy-pasting aren't.

Those documents SPECIFICALLY STATE that that bandwidth is for WORKING ON data fed by GDDR5.

Which means the amount of incoming data they are designed to handle IS THE BANDWIDTH OF GDDR5, WHICH IS NOWHERE NEAR A TERABYTE, YOU MORON.

The REASON the internal operational bandwidth shoots up so high is because multiple calculations are performed on EACH INDIVIDUAL PIXEL of EVERY TEXTURE.

In other words, in order to process 80 GB/s of texture data, your SIMD package needs that TB of low-capacity bandwidth (not the one in the Wii U GPU, which only has 1/4 of the SIMD units in your example).

Which means, if you wanted to feed the GPU a TERABYTE, you would need around 20 TERABYTES of operations bandwidth for the SIMD operations to keep up.
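A back-of-the-envelope version of that scaling claim, assuming internal operations bandwidth scales linearly with the external feed rate at the roughly 1 TB/s per 80 GB/s ratio stated above; the post itself rounds the result up to ~20 TB:

```python
# ~1 TB/s of SIMD/LDS operations traffic per 80 GB/s of incoming assets:
ops_per_feed = 1000.0 / 80.0   # ~12.5x

feed_gbs = 1000.0              # hypothetical 1 TB/s feed rate
print(f"~{feed_gbs * ops_per_feed / 1000:.1f} TB/s of ops bandwidth")  # ~12.5
```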


 


#57 megafenix

megafenix

    Blooper

  • Members
  • 169 posts

Posted 28 February 2014 - 10:56 AM

3Dude, on 28 Feb 2014 - 10:28 AM, said:

Good God, you are an idiot.

1. eDRAM just means RAM that has been embedded. That doesn't mean every piece of embedded memory can be combined and stated as main memory bandwidth like you are trying to claim.

You can't combine the bandwidth of L1, L2, and the Cube's 1T-SRAM and claim the sum total can do anything you want. The RAMs have different purposes and do different jobs SIMULTANEOUSLY.

The 360's ROPs had 256 GB a second because they were embedded onto the RAM die. That bandwidth was also useless as hell once it made its way to the bridge off the die.

YOU STILL have no proof whatsoever about how the Wii U's eDRAM is bussed, and the reason why is that it's not bussed at 1K bits a cell, because that's idiotic.

And YES, YOU ARE SPECIFICALLY CLAIMING EVERYTHING those documents you are blindly copy-pasting aren't.

Those documents SPECIFICALLY STATE that that bandwidth is for WORKING ON data fed by GDDR5.

Which means the amount of incoming data they are designed to handle IS THE BANDWIDTH OF GDDR5, WHICH IS NOWHERE NEAR A TERABYTE, YOU MORON.

The REASON the internal operational bandwidth shoots up so high is because multiple calculations are performed on EACH INDIVIDUAL PIXEL of EVERY TEXTURE.

In other words, in order to process 80 GB/s of texture data, your SIMD package needs that TB of low-capacity bandwidth (not the one in the Wii U GPU, which only has 1/4 of the SIMD units in your example).

Which means, if you wanted to feed the GPU a TERABYTE, you would need around 20 TERABYTES of operations bandwidth for the SIMD operations to keep up.

 

God, what an idiot

 

Dude, only 7 MB of eDRAM on the Wii U are required for the framebuffer, which means those 7 MB of new eDRAM should have bandwidth similar to the 256 GB/s of the 360. I'm not saying it's 256 GB/s on 7 MB on the Wii U, because the 360 had to return the data via an external 32 GB/s bus, so those 7 MB could be like 120 GB/s back and forward. Not to mention that if you only require 7 MB for 720p with double buffering, then what are the remaining 28 MB of eDRAM (counting the extra 3 MB of faster eDRAM) going to be used for?
Anti-aliasing is even cheaper than resolution, and that has been confirmed by Microsoft on its MSDN page.
So even supposing like 20 MB of eDRAM for 720p + 4x MSAA + Z-buffer + stencil, what are the other 15 MB of eDRAM gonna be used for?
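A quick sanity check on those buffer sizes; a minimal sketch assuming 32 bits per pixel for color and 32 bits for Z/stencil, which is an assumption about the formats rather than anything confirmed in this thread:

```python
def buffer_mib(width: int, height: int, bytes_per_pixel: int) -> float:
    """Size of one render target in MiB."""
    return width * height * bytes_per_pixel / (1024 ** 2)

color = buffer_mib(1280, 720, 4)  # one 32-bit 720p color buffer
depth = buffer_mib(1280, 720, 4)  # assumed 24-bit Z + 8-bit stencil = 4 B/pixel

print(f"double-buffered color: {2 * color:.2f} MiB")  # ~7.03 MiB
print(f"color + depth:         {color + depth:.2f} MiB")  # also ~7.03 MiB
```

Either reading (two color buffers, or one color plus one depth buffer) lands at about 7 MiB, which is where the '7 MB' figure comes from.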


On the RAM die? Like this?

http://www.notenough.../wiiugpudie.jpg

[Image: Wii U GPU die shot]


Edited by megafenix, 28 February 2014 - 11:03 AM.


#58 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 28 February 2014 - 11:36 AM

God, what an idiot

Dude, only 7 MB of eDRAM on the Wii U are required for the framebuffer, which means those 7 MB of new eDRAM should have bandwidth similar to the 256 GB/s of the 360. I'm not saying it's 256 GB/s on 7 MB on the Wii U, because the 360 had to return the data via an external 32 GB/s bus, so those 7 MB could be like 120 GB/s back and forward. Not to mention that if you only require 7 MB for 720p with double buffering, then what are the remaining 28 MB of eDRAM (counting the extra 3 MB of faster eDRAM) going to be used for?
Anti-aliasing is even cheaper than resolution, and that has been confirmed by Microsoft on its MSDN page.
So even supposing like 20 MB of eDRAM for 720p + 4x MSAA + Z-buffer + stencil, what are the other 15 MB of eDRAM gonna be used for?
On the RAM die? Like this? http://www.notenough.../wiiugpudie.jpg
[Image: Wii U GPU die shot]

The 360 doesn't get to use that 256 GB/s for the frame buffer the way you think it does. That bandwidth is for OPERATIONS, NOT communications.

As soon as that data leaves the daughter die, the bandwidth drops to 32 GB/s read/write. And it still has a long way to go and more bridges at lower bandwidths to cross until it makes it out to the screen. Not to mention all the bandwidth in the world is irrelevant if you don't have the capacity to do the job.

You are incapable of understanding the difference between memory used for operations and communication memory used to store and move assets. And you don't understand the fundamental aspects of memory and what each kind is for, constantly confusing one for the other.

As for what you would do with the extra eDRAM? You would fill it with cached textures and use it as a CPU scratchpad. That's a no-brainer.

What you WOULDN'T DO is use it to perform raster operations to create the frame buffer like the 360 does. The reason why is that, unlike on the 360, the ROPs aren't on the eDRAM.

What you would do is have the small cache of memory that's attached to the ROPs use its operational bandwidth to fill the FB, and send the finished pieces of product into the L2 eDRAM until it's whole and sent out.


 


#59 meitantei_conan

meitantei_conan

    Boo

  • Members
  • 515 posts
  • NNID:qublin_triforce

Posted 28 February 2014 - 12:03 PM

:l   

 

 

 

I just....


Edited by meitantei_conan, 28 February 2014 - 12:04 PM.


#60 megafenix

megafenix

    Blooper

  • Members
  • 169 posts

Posted 02 March 2014 - 06:02 PM

The 360 doesn't get to use that 256 GB/s for the frame buffer the way you think it does. That bandwidth is for OPERATIONS, NOT communications.

As soon as that data leaves the daughter die, the bandwidth drops to 32 GB/s read/write. And it still has a long way to go and more bridges at lower bandwidths to cross until it makes it out to the screen. Not to mention all the bandwidth in the world is irrelevant if you don't have the capacity to do the job.

You are incapable of understanding the difference between memory used for operations and communication memory used to store and move assets. And you don't understand the fundamental aspects of memory and what each kind is for, constantly confusing one for the other.

As for what you would do with the extra eDRAM? You would fill it with cached textures and use it as a CPU scratchpad. That's a no-brainer.

What you WOULDN'T DO is use it to perform raster operations to create the frame buffer like the 360 does. The reason why is that, unlike on the 360, the ROPs aren't on the eDRAM.

What you would do is have the small cache of memory that's attached to the ROPs use its operational bandwidth to fill the FB, and send the finished pieces of product into the L2 eDRAM until it's whole and sent out.

 

 

Doesn't?

Here:

http://www.ign.com/a...tation-3?page=3

 

"

E3 2005: Microsoft's Xbox 360 vs. Sony's PlayStation 3

 

by Douglass C. Perry
May 20, 2005

 

Bandwidth
The PS3 has 22.4 GB/s of GDDR3 bandwidth and 25.6 GB/s of RDRAM bandwidth for a total system bandwidth of 48 GB/s.

The Xbox 360 has 22.4 GB/s of GDDR3 bandwidth and a 256 GB/s of EDRAM bandwidth for a total of 278.4 GB/s total system bandwidth.

 

Why does the Xbox 360 have such an extreme amount of bandwidth?

Even the simplest calculations show that a large amount of bandwidth is consumed by the frame buffer. For example, with simple color rendering and Z testing at 550 MHz the frame buffer alone requires 52.8 GB/s at 8 pixels per clock. The PS3's memory bandwidth is insufficient to maintain its GPU's peak rendering speed, even without texture and vertex fetches.

The PS3 uses Z and color compression to try to compensate for the lack of memory bandwidth. The problem with Z and color compression is that the compression breaks down quickly when rendering complex next-generation 3D scenes.

 

HDR, alpha-blending, and anti-aliasing require even more memory bandwidth. This is why Xbox 360 has 256 GB/s bandwidth reserved just for the frame buffer. This allows the Xbox 360 GPU to do Z testing, HDR, and alpha blended color rendering with 4X MSAA at full rate and still have the entire main bus bandwidth of 22.4 GB/s left over for textures and vertices.

 

"

 

 

Or here:

http://www.techpower...ox-360-gpu.html

 

 

How about this:

http://www.cs.wustl....M/HC17.S8T4.pdf

 

 

 

Now let's see people from Microsoft and ATI. Here:

http://meseec.ce.rit...ing2012/2-4.pdf

 

"
Console Architecture
By: Peter Hood & Adelia Wong
 
Xenos Specs

• 500 MHz parent GPU on 90nm,65nm (since 2008) or 45nm (since 2010) TSMC process of total 232 million transistors

 

      - 48 floatingpoint vector processors for shader execution, divided in three dynamically scheduled SIMD groups of 16 processors each.

 

• Unified shading architecture (each pipeline is capable of running either pixel or vertex shaders)

• 10 FP ops per vector processor per cycle (5 fused multiplyadd)

• Maximum vertex count: 6 billion vértices per second ((48 shader vector processors ×2 ops per cycle × 500 MHz) / 8 vector ops per vertex) for simple transformed and lit polygons

• Maximum polygon count: 500 million triangles per second

• Maximum shader operations: 96 billion shader operations per second (3 shader pipelines ×16 processors ×4 ALUs × 500 MHz)

• 240 GFLOPS

• MEMEXPORT shader function

 

• 500 MHz, 10 MiB daughter embedded DRAM (at 256GB/s) framebuffer on 90 nm, 80 nm (since 2008) or 65nm (since 2010).

      - NEC designed eDRAM die includes additional logic (192 parallel pixel processors) for color, alpha compositing, Z/stencil buffering, and anti aliasing called “Intelligent Memory”, giving developers 4 sample antialiasing at very Little performance cost.

       -   105 million transistors

       - 8 Render Output units

 

•  Maximum pixel fillrate: 16 gigasamples per second fillrate using 4X multisample anti aliasing (MSAA), or 32 gigasamples using Z only operation; 4 gigapixels per second without MSAA (8 ROPs × 500 MHz)

•  Maximum Z simple rate: 8 gigasamples per second (2 Z samples ×8 ROPs × 500 MHz), 32 gigasamples per second using 4X anti aliasing (2 Z samples ×8 ROPs × 4X AA × 500 MHz)

•  Maximum anti aliasing simple rate: 16 gigasamples per second (4 AA samples ×8 ROPs × 500 MHz)

"

 

 

You see? The ROPs are on the same die as the eDRAM (check the info above).

Let's have Beyond3D explain that. Here the Beyond3D article explains:

http://www.beyond3d....nt/articles/4/3

 

"

The one key area of bandwidth, that has caused a fair quantity of controversy in its inclusion of specifications, is that of bandwidth available from the ROPS to the eDRAM, which stands at 256GB/s. The eDRAM is always going to be the primary location for any of the bandwidth intensive frame buffer operations and so it is specifically designed to remove the frame buffer memory bandwidth bottleneck - additionally, Z and colour access patterns tend not to be particularly optimal for traditional DRAM controllers where there are frequent read/write penalties, so by placing all of these operations in the eDRAM daughter die, aside from the system calls, this leaves the system memory bus free for texture and vertex data fetches which are both read only and are therefore highly efficient. Of course, with 10MB of frame buffer space available this isn't sufficient to fit the entire frame buffer in with 4x FSAA enabled at High Definition resolutions and we'll cover how this is handled later in the article.

"

 

Here we have more official info:

http://fileadmin.cs....nos-doggett.pdf
"
Xenos: XBOX360 GPU
Michael Doggett Architect
October 26, 2005

 

Rendering performance
Alpha and Z logic to EDRAM interface
256GB/s
Color and Z - 32 samples
32bit color, 24bit Z, 8bit stencil
Double Z - 64 samples

24bit Z, 8bit stenci

 

"

 

 

And here:

http://www.hotchips....8/HC17.S8T4.pdf

 

"

Xbox 360 System Architecture
Jeff Andrews
Nick Baker
Xbox Semiconductor Technology
Group
 
GPU Specs
500 MHz graphics processor
48 parallel shader cores (ALUs); dynamically scheduled; 32bit IEEE
FLP
24 billion
shader
instructions per second
Superscalar design: vector, scalar and texture ops per instruction
Pixel fillrate: 4 billion pixels/sec (8 per cycle); 2x for depth /
stencil only
AA: 16 billion samples/sec; 2x for depth / stencil only
Geometry rate: 500 million triangles/sec
Texture rate: 8 billion bilinear filtered samples / sec
• 10 MB EDRAM
 256 GB/s fill
Direct3D 9.0-compatible
 
High-Level Shader Language (HLSL) 3.0+ support
Custom features
Memory export: Particle physics, Subdivision surfaces
Tiling acceleration: Full resolution Hi-Z, Predicated Primitives
XPS:
 
©2005 Microsoft Corporation.
Microsoft, Xbox 360, Xbox, XNA, Visual C++, Windows, Win32, Direct3D
, and the Xbox 360 logo and
Visual Studio logo are either registered trademarks or trademarks of Mi
crosoft Corporation in the United
States and/or other countries.
IBM, PowerPC, and VMX are trademarks of International Business Machines C
orporation in the United
States, or other countries, or both.
IEEE is a registered trademark in the United States, owned by the
Institute of Electrical and Electronics
Engineers.
OpenMP
is a trademark of the
OpenMP
Architecture Review Board.
The names of actual companies and products mentioned herein may
be the trademarks of their
respective owners

""

 

As you can see, the 256 GB/s were available to the 8 ROPs there on the eDRAM and used for the framebuffer.

So obviously, since the Wii U has too-slow bandwidth with its DDR3, the eDRAM has to provide at least 256 GB/s for 360 ports to even work on the Wii U. And since Shin'en can get a 720p framebuffer with just 7 MB of eDRAM on the Wii U while on the 360 you need the whole 10 MB, at least those 7 MB should be providing something close to the 256 GB/s.

Those 32 GB/s apply only when returning the data that the ROPs process with the eDRAM; the rest of the GPU doesn't need the 256 GB/s of bandwidth, because that is just for the framebuffer. But even so, this means that for 360 ports to work on the Wii U, the bandwidth with the eDRAM has to be 256 GB/s for the framebuffer or more, since Shin'en can use the eDRAM not just for the framebuffer but also for many other things.


Edited by megafenix, 02 March 2014 - 11:34 PM.




