Wii U GPGPU?



#41 Alex Atkin UK

Alex Atkin UK

    Boo

  • Members
  • 528 posts

Posted 26 November 2012 - 02:12 PM

Wii U does not have a unified memory architecture.

It has a good old fashioned orthodox memory hierarchy. (UMA attempts to cheap out on two separate pools by sharing one pool, while newer dynamic video memory technology attempts to cheap out all the way by using dynamically regulated high bandwidth, low latency GDDR5 main memory instead of dedicated video/CPU caches.) The CPU and GPU do not share a memory pool; that 32 MB on the GPU is all GPU. The CPU has a tiny (less than 1 MB) L1 with psychotic bandwidth and low latency, and another 'tiny' L2 (considerably larger but also under 1 MB) with nearly as high bandwidth and as low latency to buffer between the CPU and the low speed, high capacity main RAM.


Isn't this thread purely down to the fact that this ISN'T confirmed either way? I was just going on the findings from the iFixit teardown, which claimed to only find 2 GB of DDR3 memory on the board.

It is my understanding that the reason the Xbox 360 had eDRAM was for upscaling and general post-processing. That is the point of the unified memory: it still needed GPU memory to do all the main GPU processing. (I admit, I am no expert.)

I figured that the increase in eDRAM on the Wii U was mostly for GPGPU purposes and to make 1080p more practical; the Xbox 360 didn't really have enough eDRAM to do 1080p effectively.
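
As a rough back-of-the-envelope for why a 10 MB versus 32 MB framebuffer pool matters at 1080p, here is my own arithmetic in a short C++ sketch, assuming an uncompressed 32-bit colour buffer plus a 32-bit depth/stencil buffer per pixel (illustrative figures only, not official numbers):

```cpp
#include <cstdio>

// Framebuffer size for a given resolution and MSAA sample count,
// assuming 4 bytes of colour plus 4 bytes of depth/stencil per sample.
static double framebuffer_mib(int width, int height, int msaa_samples) {
    const double bytes_per_sample = 4.0 /* colour */ + 4.0 /* depth/stencil */;
    return width * height * msaa_samples * bytes_per_sample / (1024.0 * 1024.0);
}

int main() {
    std::printf("720p,  no AA: %5.1f MiB\n", framebuffer_mib(1280, 720, 1));  // ~7 MiB, fits in 10 MB
    std::printf("720p,  2x AA: %5.1f MiB\n", framebuffer_mib(1280, 720, 2));  // ~14 MiB, too big for 10 MB
    std::printf("1080p, no AA: %5.1f MiB\n", framebuffer_mib(1920, 1080, 1)); // ~16 MiB, fits in 32 MB
    return 0;
}
```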

Initially I did wonder if it was 1 GB of GDDR5 (or even just GDDR3) and 1 GB of DDR3, which would make a lot of sense. However, the teardown would seem to contradict that theory.

Sheffield 3DS | Steam & XBOX: Alex Atkin UK | PSN & WiiU: AlexAtkinUK

 

How to improve the Wii U download speed.


#42 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 26 November 2012 - 03:27 PM

Isn't this thread purely down to the fact that this ISN'T confirmed either way? I was just going on the findings from the iFixit teardown, which claimed to only find 2 GB of DDR3 memory on the board.

It is my understanding that the reason the Xbox 360 had eDRAM was for upscaling and general post-processing. That is the point of the unified memory: it still needed GPU memory to do all the main GPU processing. (I admit, I am no expert.)

I figured that the increase in eDRAM on the Wii U was mostly for GPGPU purposes and to make 1080p more practical; the Xbox 360 didn't really have enough eDRAM to do 1080p effectively.

Initially I did wonder if it was 1 GB of GDDR5 (or even just GDDR3) and 1 GB of DDR3, which would make a lot of sense. However, the teardown would seem to contradict that theory.


Try the Iwata Asks teardown; it's confirmed.

It has to do with more than just the 10 MB of eDRAM on the GPU; it's about how the system's memory works as a whole. For example, often that 10 MB wasn't enough, so the system had to pull memory out of the main pool, which, since it was unified, denied some other component that needed the RAM. Now a branch prediction can't be completed, and since the system is in-order, it can't just jump to the next operation with no dependencies; it has to sit and wait for the other pipelines to lay their instructions to rest. And since the 360 has deep pipelines to mask latency and mitigate an in-order CPU, that can take a LONG time. But oh no, that branch prediction was a miss, so now we have to flush the pipeline and start ALL OVER AGAIN calculating the correct branch, while everything else waits for FIVE HUNDRED CYCLES (360 branch misprediction penalty; deep pipelines = BIG penalties). You can see the problem.
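
To make the branch penalty point concrete, here is a minimal generic C++ sketch (my own illustration, nothing to do with actual 360 code): the branchy loop is at the mercy of the predictor on random data, while the branchless version gives the pipeline nothing to flush.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Branchy version: a data-dependent branch inside a hot loop.
// On a deep in-order pipeline, every mispredict flushes the pipeline
// and the core stalls while the correct path is refetched.
int64_t sum_branchy(const std::vector<int>& v) {
    int64_t sum = 0;
    for (int x : v) {
        if (x >= 128) {       // ~50% mispredict rate on random data
            sum += x;
        }
    }
    return sum;
}

// Branchless version: the compare becomes arithmetic (typically a
// conditional move/select), so there is nothing to mispredict or flush.
int64_t sum_branchless(const std::vector<int>& v) {
    int64_t sum = 0;
    for (int x : v) {
        sum += (x >= 128) ? x : 0;
    }
    return sum;
}

int main() {
    std::vector<int> data(1 << 20);
    for (int& x : data) x = std::rand() % 256;   // random values defeat prediction
    // Both compute the same result; only the branch behaviour differs.
    return static_cast<int>(sum_branchy(data) - sum_branchless(data));
}
```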

GDDR5 is what you use for a system using an advanced form of UMA that no longer sucks horribly in comparison to a dedicated memory hierarchy (but is still cheaper): DVMT.

Since GDDR5 has such high bandwidth and low latency (especially for external main RAM; very impressive stuff, as long as it's clocked high enough), the system can actually decide how much to use as dedicated video memory, set it aside, use it just like it was dedicated to the GPU, and, when it's done, return it to the system to be used as RAM however the system sees fit. Best of all, when you expand your RAM, the amount that can be set aside as video RAM naturally increases as well. Pretty cool stuff. However, GDDR5 does NOT function well at lower clock speeds; at that point you might as well use cheaper DDR3, since you will see the same or even worse performance from GDDR5. That innately restricts DVMT to high power draw, high thermal envelope systems. Still, GDDR5 main memory is much cheaper than separate pools of high density eDRAM.
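
A rough sketch of that DVMT idea in C++ (entirely hypothetical names and numbers, not any real driver API): the point is just that the "video memory" is a variable-sized slice carved out of one shared pool and handed back later, rather than a separate physical pool.

```cpp
#include <cstddef>

// Hypothetical illustration of dynamic video memory technology (DVMT):
// one physical pool of fast main RAM, from which a variable-sized slice
// is reserved for the GPU and later returned to the general pool.
class SharedPool {
public:
    explicit SharedPool(std::size_t bytes) : total_(bytes), reserved_for_gpu_(0) {}

    // Carve out a GPU aperture; refuse if the pool would be overcommitted.
    bool reserve_for_gpu(std::size_t bytes) {
        if (reserved_for_gpu_ + bytes > total_) return false;
        reserved_for_gpu_ += bytes;
        return true;
    }

    // When the GPU workload shrinks, the slice goes back to the system.
    void release_from_gpu(std::size_t bytes) {
        reserved_for_gpu_ -= (bytes > reserved_for_gpu_) ? reserved_for_gpu_ : bytes;
    }

    std::size_t system_bytes() const { return total_ - reserved_for_gpu_; }
    std::size_t gpu_bytes() const { return reserved_for_gpu_; }

private:
    std::size_t total_;
    std::size_t reserved_for_gpu_;
};

int main() {
    SharedPool pool(2ull * 1024 * 1024 * 1024);    // e.g. 2 GB of GDDR5
    pool.reserve_for_gpu(512ull * 1024 * 1024);    // heavy scene: grab 512 MB as "video RAM"
    pool.release_from_gpu(256ull * 1024 * 1024);   // lighter scene: hand half of it back
    return pool.gpu_bytes() == 256ull * 1024 * 1024 ? 0 : 1;
}
```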

Nintendo is using an old fashioned 'orthodox memory hierarchy' *right out of Iwata's mouth in the Iwata Asks teardown interview*. It's an oldie but a goodie: no matter what situation you find yourself in, you can rely on this straightforward design to get you high efficiency.

Main memory is split into two even groups, system and games. (This is likely decided in firmware, so it can be changed later down the road if 1 GB is more than the system really needs.)

The 1 GB for games IS shared between the GPU and CPU, but not in the volatile, high demand tug-of-war way it was in the 360, which ALSO had to share with system tasks as well. The 1 GB of main RAM is more of a lobby, a waiting pool, a bucket, where all manner of not necessarily related things can chill after being pulled from the disc.

On the GPU end, the 32 MB of eDRAM stores 'need now' data from that bucket and feeds it to the graphics processor with high bandwidth and low latency, dumping old data and picking up new data from the main pool to keep a smooth cycle of always relevant data for the graphics processor.

On the CPU end, you have very special proprietary IBM eDRAM (read IBM's official press release): two very tiny caches per core that have psychotic bandwidth and near no latency. One immediately feeds the CPU (L1), while the other, bigger one (L2) acts as a smaller bucket (that's faster to fill and dump than the big bucket) between the L1 cache and the 'big bucket' of main RAM, so it never runs out of what it needs.
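
A minimal sketch of that "bucket" pattern in generic C++ (my own illustration with made-up sizes, not anything from an actual SDK): work is done out of a small, fast staging buffer that is repeatedly dumped and refilled from the big, slow pool.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Made-up stand-in for the small, fast on-die pool ("eDRAM") described
// above; the big std::vector below stands in for main RAM.
constexpr std::size_t kStagingWords = 32 * 1024;

// Work is done entirely out of the small staging buffer.
long long process_chunk(const std::vector<int>& staging) {
    long long acc = 0;
    for (int v : staging) acc += v;
    return acc;
}

// Stream the large pool through the small buffer: fill, work, dump, refill.
long long stream_through_staging(const std::vector<int>& main_ram) {
    std::vector<int> staging;
    staging.reserve(kStagingWords);
    long long total = 0;
    for (std::size_t base = 0; base < main_ram.size(); base += kStagingWords) {
        const std::size_t end = std::min(base + kStagingWords, main_ram.size());
        staging.assign(main_ram.begin() + static_cast<std::ptrdiff_t>(base),
                       main_ram.begin() + static_cast<std::ptrdiff_t>(end));
        total += process_chunk(staging);   // "need now" data served from the fast pool
    }
    return total;
}

int main() {
    std::vector<int> main_ram(1000000, 1);   // stand-in for the big, slow pool
    return stream_through_staging(main_ram) == 1000000 ? 0 : 1;
}
```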

Edited by 3Dude, 26 November 2012 - 03:33 PM.


 


#43 Alex Atkin UK

Alex Atkin UK

    Boo

  • Members
  • 528 posts

Posted 26 November 2012 - 08:43 PM

Thanks for the explanation. I had been reading about how the Xbox 360 CPU was pretty inefficient, but I had no idea just how bad it was. It makes the old Pentium 4 look like a dream.

That is why I am puzzled about some of the developer comments, as the difference in architecture between the Xbox 360 and Wii U, CPU-wise, sounds quite similar to what happened going from the Pentium 4 to the Core architecture. In other words, the Wii U CPU has a shorter, much more efficient pipeline, which in turn means it can do more at a lower clock rate. So it might simply be down to developers having to optimise their engines for the new instruction set to shave off those excess CPU cycles.

It's certainly interesting that the Wii U is only pulling 35 W max from a 75 W PSU; it makes you wonder if the firmware is literally holding the hardware back from its full potential right now. Would Nintendo be so crazy as to forcibly under-clock the hardware so they can unlock the full potential later on with a firmware update? Perhaps they didn't want to scare everyone off with a super loud fan whirring from day one; it certainly was one of my pet hates with the Xbox 360.

Edited by Alex Atkin UK, 26 November 2012 - 08:44 PM.

Sheffield 3DS | Steam & XBOX: Alex Atkin UK | PSN & WiiU: AlexAtkinUK

 

How to improve the Wii U download speed.


#44 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 27 November 2012 - 05:32 AM

Thanks for the explanation. I had been reading about how the Xbox 360 CPU was pretty inefficient, but I had no idea just how bad it was. It makes the old Pentium 4 look like a dream.

That is why I am puzzled about some of the developer comments, as the difference in architecture between the Xbox 360 and Wii U, CPU-wise, sounds quite similar to what happened going from the Pentium 4 to the Core architecture. In other words, the Wii U CPU has a shorter, much more efficient pipeline, which in turn means it can do more at a lower clock rate. So it might simply be down to developers having to optimise their engines for the new instruction set to shave off those excess CPU cycles.

It's certainly interesting that the Wii U is only pulling 35 W max from a 75 W PSU; it makes you wonder if the firmware is literally holding the hardware back from its full potential right now. Would Nintendo be so crazy as to forcibly under-clock the hardware so they can unlock the full potential later on with a firmware update? Perhaps they didn't want to scare everyone off with a super loud fan whirring from day one; it certainly was one of my pet hates with the Xbox 360.




Yeah, the 360 CPU was BAAAAAAD.

Fortunately the debugger was godlike.

Well, I think I said this before, but I'm not sure if it was here. The system doesn't get the maximum wattage the PSU draws from the wall; some of that energy is wasted as heat. I'd say the Wii U probably gets around 60 W of that 75.

I don't think it's being held back so much as nothing is actually taxing it enough to draw more power. Iwata said a typical load while playing a typical game would be 45 watts, and a demanding game would bring the system up to 50 or near 60. It only needed 3 more watts to play Super Mario Bros. U. The system's not even breaking a sweat yet.
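
For what it's worth, the "around 60 W of that 75" point is just conversion-loss arithmetic; the ~80% PSU efficiency figure in the snippet below is my own assumption for illustration, not a published spec.

```cpp
#include <cstdio>

// A PSU rated for 75 W at the wall does not deliver 75 W to the board;
// some is lost as heat in conversion. The 80% efficiency used here is an
// assumption for illustration only.
int main() {
    const double wall_watts = 75.0;
    const double assumed_efficiency = 0.80;
    std::printf("~%.0f W available to the console\n", wall_watts * assumed_efficiency);  // ~60 W
    return 0;
}
```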

Edited by 3Dude, 27 November 2012 - 05:35 AM.


 


#45 esrever

esrever

    Paragoomba

  • Members
  • 20 posts

Posted 28 November 2012 - 02:40 PM

The 360 CPU wasn't bad. It was designed for getting decent floating-point performance, which is what you needed for gaming. The shared cache and such is where most of the problem was, but it was still the most efficient console CPU of this generation. The CPU in the 360 is about as powerful as a modern Atom for general purpose code, but that is still pretty good for what it needed to do, which was heavy floating-point. Memory optimizations in software are what is used to bypass the inefficiencies.
The GPU/CPU memory access worked well enough with that implementation, since the GPU is really the only one accessing memory a lot in that situation.

The Wii U's CPU is much worse. From what I can gather, the memory architecture is ancient, and combined with very little fast cache that makes it akin to a Pentium 3. It does have eDRAM to buffer the memory, but that is slower than the cache and can only be used as a high level buffer to memory. It doesn't make up for all the cache misses. The prefetching is also slowed down by the 12.8 GB/s main memory.

The eDRAM in the GPU would be used for frame buffers, which allows for AA. It can do higher resolutions and higher AA levels than the 360. For GPGPU, it is very unlikely that the GPU eDRAM is used, because the system RAM is where the data is stored and accessing that would still have to go through the CPU.

#46 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 28 November 2012 - 05:36 PM

The 360 CPU wasn't bad. It was designed for getting decent floating-point performance, which is what you needed for gaming. The shared cache and such is where most of the problem was, but it was still the most efficient console CPU of this generation. The CPU in the 360 is about as powerful as a modern Atom for general purpose code, but that is still pretty good for what it needed to do, which was heavy floating-point. Memory optimizations in software are what is used to bypass the inefficiencies.
The GPU/CPU memory access worked well enough with that implementation, since the GPU is really the only one accessing memory a lot in that situation.

The Wii U's CPU is much worse. From what I can gather, the memory architecture is ancient, and combined with very little fast cache that makes it akin to a Pentium 3. It does have eDRAM to buffer the memory, but that is slower than the cache and can only be used as a high level buffer to memory. It doesn't make up for all the cache misses. The prefetching is also slowed down by the 12.8 GB/s main memory.

The eDRAM in the GPU would be used for frame buffers, which allows for AA. It can do higher resolutions and higher AA levels than the 360. For GPGPU, it is very unlikely that the GPU eDRAM is used, because the system RAM is where the data is stored and accessing that would still have to go through the CPU.




The Wii was, by FAAAAAAAAAR, orders of magnitude the most efficient console of this generation. It sustains itself within 92% of its peak theoretical performance. The Xbox 360 can't even sustain 65% of its peak theoretical performance.


The Wii U is OUT OF ORDER, so none of what you said matters. Instead of waiting on the low bandwidth main memory (which actually has REALLY low latency, by the way) for dependent instructions, like the 360 and PS3 are forced to, the Wii U can calculate bajillions of other instructions in the meantime. But that's nothing compared to the hilarious mistake you just made.

An eDRAM cache of 56 KB (L1) and 256 KB (L2) turns the 1.2 billion transistor POWER7 into a processor with the equivalent performance of a 2.7 billion transistor processor.

You have no clue what you are talking about.

Edited by 3Dude, 28 November 2012 - 05:39 PM.


 




