
Wii U's RAM is slower than PS3/Xbox 360.


270 replies to this topic

#121 Goodtwin

Goodtwin

    Bullet Bill

  • Members
  • 356 posts

Posted 03 April 2013 - 11:28 AM

It definitely doesn't jibe with what we have heard from developers about memory performance.  Remember when Ubisoft said they forgot to compress the textures and everything still ran fine?  To make matters worse, it's very hard to find good info to make sense of it all for a novice like myself.

 

I am still trying to follow you on how this works, 3Dude, so bear with me.  Are you saying that even if the RAM modules are set up in a 16bit by 256MB organization, it doesn't matter because there are multiple banks of RAM in each module?  So the controller is essentially tapping into each of those banks in the module?  I'm still a little lost on how we are getting the 12.8 GB/s bandwidth to a single module; that would require a 64-bit bus at the RAM module.  This is all about principle for me at this point. Memory performance has been praised by multiple developers, but if it really was a clever workaround because of the large amount of eDRAM, you would think Ubisoft would have noticed right away when trying to pull their uncompressed textures from the main RAM.
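The 12.8 GB/s figure can be checked with a quick back-of-the-envelope sketch. This assumes DDR3-1600 (1600 million transfers per second, the speed grade mentioned later in the thread) and the usual peak formula of transfers per second times bus width in bytes; the function names are made up for illustration.

```python
def peak_bandwidth_gb_s(transfers_per_sec: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_sec * bus_width_bits / 8 / 1e9


def required_bus_width_bits(target_gb_s: float, transfers_per_sec: float) -> float:
    """Bus width (in bits) needed to reach a target peak bandwidth."""
    return target_gb_s * 1e9 * 8 / transfers_per_sec


DDR3_1600 = 1600e6  # assumed speed grade: 1600 MT/s

# One chip with a 16-bit data interface, then four of them side by side.
print(peak_bandwidth_gb_s(DDR3_1600, 16))
print(peak_bandwidth_gb_s(DDR3_1600, 4 * 16))

# Width a single module would need to deliver 12.8 GB/s on its own.
print(round(required_bus_width_bits(12.8, DDR3_1600)))
```

Under these assumptions, 12.8 GB/s corresponds to 64 bits in total, which a single 16-bit chip could not supply by itself.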



#122 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 03 April 2013 - 11:33 AM


Alex Atkin UK, on 03 Apr 2013 - 05:36, said: So what are we saying now then, potentially slightly under 4x the bandwidth of Xbox 360 (due to being clocked lower) as the Wii U might use 4 channels and likely at LEAST dual-channel?
I certainly have to agree that it would make little sense to have LESS bandwidth, and with the low-power architecture of the Wii U design overall I would be amazed if it wasn't at least dual-channel, as it's an almost free doubling of RAM bandwidth without drastically increasing the power/heat.


It DOES use 4 channels. We have pictures now, clear enough to know everything.

4 chips of RAM, 512 MB in each chip, each chip with its own bus.

Each channel has 11 lanes; DDR3 gets 2 bits per clock from each lane.

Each channel is 22 bits. 4 channels make for an 88-bit bus.
 


Edited by 3Dude, 03 April 2013 - 11:47 AM.


 


#123 Goodtwin

Goodtwin

    Bullet Bill

  • Members
  • 356 posts

Posted 03 April 2013 - 11:43 AM



Would that make it 17.6 GB/s then?  800 MHz x 2 x 88 / 8 = 17.6 GB/s?  When you say channel, is that every line that you can see going from the RAM modules to the MCM?
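The arithmetic in that question can be written out as a small sketch. It assumes the thread's formula (bus clock x 2 transfers per clock x bus width, divided by 8 bits per byte); the 88-bit width is the figure proposed in this thread, not a confirmed spec, and the function name is hypothetical.

```python
def ddr_bandwidth_mb_s(bus_clock_mhz: float, bus_width_bits: int,
                       transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in MB/s for a double-data-rate bus."""
    return bus_clock_mhz * transfers_per_clock * bus_width_bits / 8


# The 88-bit bus proposed in this thread (an assumption, not a confirmed spec).
print(ddr_bandwidth_mb_s(800, 88))  # 17600.0 MB/s, i.e. 17.6 GB/s

# A 64-bit bus at the same clock, for comparison with the 12.8 GB/s figure.
print(ddr_bandwidth_mb_s(800, 64))  # 12800.0 MB/s, i.e. 12.8 GB/s
```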



#124 routerbad

routerbad

    Lakitu

  • Section Mods
  • 2,013 posts
  • NNID:routerbad
  • Fandom:
    Zelda, Mario, Halo, Star Trek

Posted 03 April 2013 - 11:45 AM



So you're basically getting 88 bits per clock, per chip.  They all go into the same memory controller, on the GPU.  We have a 352-bit bus overall, and that entire bus is available for accessing RAM in game.  With one memory controller, and only software addressing restrictions, the memory controller can address the RAM in the most efficient way possible, which is alternating chips per address, to reduce refresh latency.  The memory controller addresses it in this fashion and creates an abstract address pool for software to access.  The games don't know how many memory controllers there are; all they see are sequential memory addresses delivered from the controller.
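As a rough illustration of "alternating chips per address", here is a minimal sketch of how a controller could map one flat address space onto four chips. The 64-byte stride and the names `NUM_CHIPS`, `STRIDE_BYTES` and `map_address` are assumptions made up for this example; nothing in the thread says how the real controller does it.

```python
NUM_CHIPS = 4       # four RAM chips, as counted in the thread
STRIDE_BYTES = 64   # hypothetical interleave granularity


def map_address(addr: int) -> tuple[int, int]:
    """Map a flat address to (chip index, address inside that chip)."""
    block = addr // STRIDE_BYTES          # which stride-sized block this is
    chip = block % NUM_CHIPS              # consecutive blocks rotate through chips
    local = (block // NUM_CHIPS) * STRIDE_BYTES + addr % STRIDE_BYTES
    return chip, local


# Software only sees sequential addresses; the chip rotation is hidden.
for addr in (0, 64, 128, 192, 256):
    print(addr, map_address(addr))
```

With this layout a long sequential read touches all four chips in turn, so their buses can work in parallel instead of one chip serving the whole request.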



Would that make it 17.6 GB/s then?  800 MHz x 2 x 88 / 8 = 17.6 GB/s?  When you say channel, is that every line that you can see going from the RAM modules to the MCM?

No, DDR3 is quad pumped, meaning you multiply the clock by 4 in the throughput calculation.

800 x 2 (bit channel) x 4 (clock multiplier) x 88 (bus width) / 8 (bits per byte)

I was using 700 MHz in my calculations, but with 800 it comes to 70.4 GB/s.

"Dual channel" refers to the duplexing of the bus.

And he is referring to each chip being on its own 11-lane channel, with 2 bits per lane.


Edited by routerbad, 03 April 2013 - 11:48 AM.


#125 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 03 April 2013 - 11:51 AM


Goodtwin, on 03 Apr 2013 - 05:57, said:Would that make it 17.6GB/s then?  800Mhz x 2 x 88= 17.6GB/s?  When you say channel, is that every line that you can see going from the ram modules to the MCM?


Should be at least double that. Each lane = 2 bits; PS3 had 8 lanes, for 16-bit buses, x 4 RAM chips = 64-bit bus.



 


#126 Goodtwin

Goodtwin

    Bullet Bill

  • Members
  • 356 posts

Posted 03 April 2013 - 12:01 PM

But looking at charts for DDR3-1600, the memory clock is actually 200 MHz, with the bus clock being 800 MHz, and then double data rate gives the 1600.  So they are already giving you the 4x multiplier when you read the 800 MHz.
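The relationship between those three numbers can be sketched as follows. This assumes the standard DDR3 arrangement, where the I/O bus clock is half the data rate and the internal array clock is a quarter of the bus clock; the function name is made up for illustration.

```python
def ddr3_clocks(data_rate_mt_s: float) -> tuple[float, float]:
    """Return (array clock MHz, I/O bus clock MHz) for a DDR3 speed grade."""
    io_clock = data_rate_mt_s / 2   # two transfers per bus clock (double data rate)
    array_clock = io_clock / 4      # internal array runs at 1/4 of the bus clock
    return array_clock, io_clock


# DDR3-1600: 200 MHz array clock, 800 MHz bus clock, 1600 MT/s on the pins.
print(ddr3_clocks(1600))  # (200.0, 800.0)
print(ddr3_clocks(800))   # (100.0, 400.0)
```

So starting from the 800 MHz bus clock, only the x2 for double data rate is left to apply; the x4 is already inside that number.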



#127 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 03 April 2013 - 12:11 PM


Goodtwin, on 03 Apr 2013 - 06:15, said:But looking at charts for DDR3-1600 the memory clock is actually 200mhz, with the bus clock being 800Mhz, and then double data gives the 1600.  So they are already giving you the 4x multiplier when you read the 800Mhz. 


Yes. I think I brought that up earlier.

Wii U's RAM bandwidth should still be higher though.

PS3 has a bandwidth of 22.4 GB/s for its 700 MHz GDDR3.

And I'm pretty sure it was 4 chips on a 64-bit bus.

http://www.ps3devwiki.com/wiki/RAM

Yeah, original models had four 64 MB RAM chips, and it added up to a 64-bit bus, even though it was 'on die' with the RSX.

Argh. I know this is something stupid too.
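The PS3 figures quoted in this post can be checked against each other. This is only a sketch, assuming the same double-data-rate formula used elsewhere in the thread (clock x 2 transfers x bus width / 8); `implied_bus_width_bits` and `ddr_bandwidth_gb_s` are hypothetical helper names.

```python
def implied_bus_width_bits(bandwidth_gb_s: float, clock_mhz: float,
                           transfers_per_clock: int = 2) -> float:
    """Bus width implied by a quoted bandwidth and clock."""
    transfers_per_sec = clock_mhz * 1e6 * transfers_per_clock
    return bandwidth_gb_s * 1e9 * 8 / transfers_per_sec


def ddr_bandwidth_gb_s(clock_mhz: float, bus_width_bits: int,
                       transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in GB/s for a given clock and bus width."""
    return clock_mhz * 1e6 * transfers_per_clock * bus_width_bits / 8 / 1e9


# Width implied by 22.4 GB/s at 700 MHz.
print(round(implied_bus_width_bits(22.4, 700)))

# Bandwidth a 64-bit bus would give at the same clock.
print(round(ddr_bandwidth_gb_s(700, 64), 1))
```

Under that formula, 22.4 GB/s at 700 MHz points to a bus of about 128 bits, while a 64-bit bus would come out around 11.2 GB/s, so the quoted bandwidth and the 64-bit guess cannot both hold.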


Edited by 3Dude, 03 April 2013 - 12:42 PM.


 


#128 routerbad

routerbad

    Lakitu

  • Section Mods
  • 2,013 posts
  • NNID:routerbad
  • Fandom:
    Zelda, Mario, Halo, Star Trek

Posted 03 April 2013 - 12:12 PM

But looking at charts for DDR3-1600 the memory clock is actually 200mhz, with the bus clock being 800Mhz, and then double data gives the 1600.  So they are already giving you the 4x multiplier when you read the 800Mhz. 

The memory clock hasn't really changed from generation to generation of DRAM.  Processes have shrunk (DDR3 started at 90nm), but the memory arrays are largely the same.  The bus is the main determining factor in throughput in an "all things being equal" situation.  All things being equal, DDR3 has exactly twice the throughput of DDR2, and by extension GDDR3.  GDDR3 gets a little better throughput by bumping the bus clock significantly, but has to wait for memory refresh much more.



#129 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 03 April 2013 - 12:29 PM

oh yeaah! Interleaved memory!


 


#130 GAMER1984

GAMER1984

    Lakitu

  • Members
  • 2,036 posts
  • NNID:gamer1984
  • Fandom:
    Nintendo

Posted 03 April 2013 - 12:31 PM

All sounds good... But we need to SEE it in action. So tired of numbers and debating. Games have to show what numbers alone can't.

#131 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 03 April 2013 - 12:33 PM


GAMER1984, on 03 Apr 2013 - 06:45, said:All sounds good... But we need to SEE it in action. So tired of numbers and debating. Games have to show what just numbers cant


Get out of this thread with that right now. We are working. Contribute or watch quietly.
 


Edited by 3Dude, 03 April 2013 - 12:34 PM.


 


#132 GAMER1984

GAMER1984

    Lakitu

  • Members
  • 2,036 posts
  • NNID:gamer1984
  • Fandom:
    Nintendo

Posted 03 April 2013 - 12:41 PM

Working on what... Are you guys going to submit this to neogaf, digital foundry and others... Will they take it seriously?




Also watch your mouth, 3Dude. I don't really care if people see you as this forum's resident tech guru... It still doesn't give you the right to talk to people any way you want.

#133 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 03 April 2013 - 12:43 PM


Why would we submit this to places incapable of so much as counting lanes on a picture?



 


#134 routerbad

routerbad

    Lakitu

  • Section Mods
  • 2,013 posts
  • NNID:routerbad
  • Fandom:
    Zelda, Mario, Halo, Star Trek

Posted 03 April 2013 - 12:48 PM


Looked up the part number for the Micron MT41K256M16HA-125:E 

 

The memory clock rate is 800MHz, according to the Micron website.  That number still stands.  Probably 200MHz per 256MB array within the chip.



Working on what... Are you guys going to submit this to neogaf, digital foundry and others... Will they take it seriously?


Also watch your mouth 3dude. Don't really care if people see you as this forums resident tech guru... Still doesn't give you the right to talk to people any way you want.

Well that isn't nice.  We are in this thread to learn something, and constantly vomiting "show me" doesn't help at all.  What the games look like is up to developers; we are just having a little fun decoding the numbers.


Edited by routerbad, 03 April 2013 - 12:46 PM.


#135 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 03 April 2013 - 12:48 PM


AH! The gpu ddr3 interface!

wiiudie_blocks.jpg

Okay, who wants to count the pins?



 


#136 routerbad

routerbad

    Lakitu

  • Section Mods
  • 2,013 posts
  • NNID:routerbad
  • Fandom:
    Zelda, Mario, Halo, Star Trek

Posted 03 April 2013 - 12:53 PM

oh yeaah! Interleaved memory!

Yep, full duplex interface as well.



80 per edge, it looks like, though I may be off a little (perhaps 2-6 pins more).  It looks like there are 4 clusters of 11 pins, with supporting pins in between, possibly for CPU I/O to give the CPU access to Mem2.

 

That is on each side.


Edited by routerbad, 03 April 2013 - 12:54 PM.


#137 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 03 April 2013 - 12:56 PM



That's pretty much what I got, though it was hard on my eyes, so I thought I may have been off.


Okay, I know something isn't right. So I'm going to stick with Goodtwin's 17.6 GB/s from the wiki formula and add bandwidth per feature until the bug is ironed out. I'm just happy 12.8 is 100% busted.

So, those clusters of 11 pins make sense.

44 on each side gives us 88.


Edited by 3Dude, 03 April 2013 - 01:07 PM.


 


#138 routerbad

routerbad

    Lakitu

  • Section Mods
  • 2,013 posts
  • NNID:routerbad
  • Fandom:
    Zelda, Mario, Halo, Star Trek

Posted 03 April 2013 - 01:07 PM



So looks like DDR3 IO incoming, then exiting to the CPU in paired clusters of 11 pins.  Not sure what the extra pins are being used for.  A total of ~160 pins.



oh yeaah! Interleaved memory!

16 banks per 512MB module, based on 256Mb depth from Micron.

 

Nvm, 8 banks x16 columns. Anandtech thought the x16 bank width referred to bus width, which is not the case.  The bus has little to do with the module itself, and more to do with the interface, because any cell can be accessed at any point inside the module with no variance in access time.

 

256Mb column depth, organized into 8 banks, 88bit interface per chip.


Edited by routerbad, 03 April 2013 - 01:19 PM.


#139 3Dude

3Dude

    Whomp

  • Section Mods
  • 5,482 posts

Posted 03 April 2013 - 01:29 PM


routerbad, on 03 Apr 2013 - 07:21, said:So looks like DDR3 IO incoming, then exiting to the CPU in paired clusters of 11 pins.  Not sure what the extra pins are being used for.  A total of ~160 pins.

16 banks per 512MB module, based on 256Mb depth from Micron.
Nvm, 8 banks x16 columns, Anandtech thought the x16 bank width referred to bus width, which is not the case.  Bus has little to do with the module itself, and more to do with the interface, because any cell can be accessed at any point inside the module with no variance in access time.
256Mb column depth, organized into 8 banks,
______________________________________________________________



This is all spot on. In fact, the ram pdf says the same thing.

__________________________________________

88bit interface per chip.
____________________________________________
88bit interface per chip? This is the info I want.


We need to get the housings off those damn chips.

Going by that, we'd have 800 x 4 x 2 x 88 = 563,200, / 8 = 70,400,

70.4 GB/s.

But I want to see it....


Edited by 3Dude, 03 April 2013 - 01:33 PM.


 


#140 routerbad

routerbad

    Lakitu

  • Section Mods
  • 2,013 posts
  • NNID:routerbad
  • Fandom:
    Zelda, Mario, Halo, Star Trek

Posted 03 April 2013 - 01:35 PM


Ahh, it would be 17.6 GB/s PER MODULE though; each module is on a separate 88-bit-wide channel.

Funny thing, 17.6 x 4 modules is 70.4, which isn't far off from what we've been saying.

Basically, using 800 MHz for the memory clock rate rather than 200 MHz effectively gives us the same number as the four chips combined.  So I think Goodtwin was correct in using 200 MHz as the memory clock rate (though strangely the Micron website lists it as an 800 MHz memory clock, not I/O clock).  Going by that, we have:

200 x 2 x 4 x 88 / 8 = 17,600 MB/s x 4 modules = 70,400 MB/s, the exact same number  :ph34r:
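The two routes to 70,400 can be laid side by side in a short sketch. It simply reproduces the thread's arithmetic under the thread's own assumptions (88-bit width, x2 and x4 multipliers); the function names are made up, and none of these inputs are confirmed specs.

```python
def route_bus_clock() -> float:
    """800 MHz x 2 x 4 x 88 bits / 8, as in the earlier posts (MB/s)."""
    return 800 * 2 * 4 * 88 / 8


def route_per_module() -> float:
    """200 MHz x 2 x 4 x 88 bits / 8 per module, times 4 modules (MB/s)."""
    per_module = 200 * 2 * 4 * 88 / 8   # 17600.0 MB/s per module
    return per_module * 4


print(route_bus_clock())    # 70400.0
print(route_per_module())   # 70400.0
```

Since 800 = 200 x 4, the two expressions are the same product rearranged, so the match follows from the arithmetic and is not a separate confirmation of the figure.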


Edited by routerbad, 03 April 2013 - 01:43 PM.




