PCSX2 is perhaps one of the few projects that releases a steady series of videos demonstrating development progress, and whilst I'm not 100% sure on this, I believe it is the only emulator to have had two "promo" videos demonstrating that yes indeed, PS2 emulation is possible!

These videos are delivered to you via BitTorrent on our videos page, which for an emulation site is somewhat unique. In fact it is so "odd" that I've had emails asking me if the use of BitTorrent is legal. Surprisingly, BitTorrent can be, and is, used for legal distribution!

You might wonder why such "promo" videos are made, as it's somewhat unorthodox. Well, aside from the fact that they are good fun to make, they serve to demonstrate what this project is doing, as many people still don't believe that this is even possible!

How are these videos made? Simple!

When using zeroGS you merely have to press F7 to start recording to your preferred codec, then press F7 again to stop recording. At this point you can just quit the emulator and marvel at your video, or you can keep playing! Press F7 again to add more footage to your video, and press F7 once more to stop recording. This produces a single video made up of every segment you recorded.

zeroGS automatically speeds up footage to "normal" speeds, which causes issues if you want to record audio in sync with the video footage. P.E.Op.S SPU2 DSound Driver records at "emulated" speed and if the recording was muxed with the video recording, it would be significantly out of sync.

However, you can get around this issue by disabling MTGS / DC modes from the CPU settings dialog, and setting P.E.Op.S SPU2 DSound Driver to Thread Mode. To ensure sync with the video, set P.E.Op.S SPU2 DSound Driver to start recording before you confirm the codec you wish to record with.

You can then put the audio and video streams together in your preferred video editing software. For simple tasks like this I recommend VirtualDub.

The ability to record video directly from a PS2 game has significant benefits over camcorder or TV card recording as you can get fantastic quality at higher resolutions than the PS2 can natively output.

For the gaming enthusiast it lends itself nicely to demonstrating how kick-ass you are at a game, or lets you show off secret areas and advanced combos.

YouTube obviously offers a space to share such videos, and many people from all over the world already show fantastic videos from PCSX2. Even the team uses YouTube as a low bandwidth alternative to our high quality torrented videos.

YouTube is also used by some testers to demonstrate intermediate work, and these videos are normally only shown on the official forums as a bonus to our forum browsers. An example of this would be the following video demonstrating Final Fantasy XII no longer suffering from vanishing text:

So go on! Search YouTube for "PCSX2" and check out what our users are posting! I personally get a great deal of pleasure from seeing this emulator being enjoyed and used.

If you feel inclined, why not record your awesome combo powa's, or yourself defeating some insanely tough boss, and then place them on YouTube?

If something catches our eye, you may just get an honorable mention here! We already place user-made screenshots from the current release in our screenshots section!

Latest Development Video - Kingdom Hearts II

Resident Evil 4 - Pervert (RPGWizard)

FFX-2 Secret Ending (NexXxus86)

PCSX2 Developers 0 - 1 Betatesters
Nothing to do with PCSX2, but testers RULE KTHX! HIHI


Many people have visited the forums giving ideas on how and where Pcsx2 should be optimized. While most ideas sound solid on the outside, they usually will not work in practice for various reasons. This blog will answer some of those burning questions about which Pcsx2 optimizations are important and where development work should go to make things run faster. We will touch on why the GPU is the bottleneck in some games and why the CPU is in others. We will also go into how the workload is distributed among the various components of Pcsx2 as it is computing away. And most importantly, we will cover plugin design so that system resources are distributed nicely.

First, a note to the people who have played around with optimization or will play around with it. Be careful when measuring performance in frames per second! If anyone told me their optimization gained 5 fps for a certain game, I would not understand what that means! Why? Well, if a game went from 5 fps to 10 fps, each frame took 200 ms and now takes 100 ms. The optimization saved 100 ms of CPU time per frame and the game is now 2x faster (this doesn't happen anymore)! If a game went from 60 fps to 65 fps, each frame took 16.7 ms and now takes 15.4 ms. This is only 1.3 ms of saved time per frame, and the game is only 1.08 times faster. Which optimization do you think is better? Also, a 1-2% speed difference is too small to be statistically significant, so it does not show that the optimization is useful. In fact, the fps counter in the title bar fluctuates by 1-2% all the time, so you'll just be picking up noise.
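To make the point about fps concrete, here is a minimal sketch of the arithmetic. The function names are made up for this illustration and are not part of Pcsx2; the idea is simply to compare optimizations by time saved per frame and by speedup factor rather than by raw fps gained.

```cpp
#include <cassert>

// Time spent on one frame, in milliseconds, at a given frame rate.
double frame_time_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Milliseconds saved per frame by going from fps_before to fps_after.
double saved_ms_per_frame(double fps_before, double fps_after) {
    return frame_time_ms(fps_before) - frame_time_ms(fps_after);
}

// How many times faster the game runs after the change.
double speedup(double fps_before, double fps_after) {
    assert(fps_before > 0.0);
    return fps_after / fps_before;
}
```

Calling saved_ms_per_frame(5, 10) and saved_ms_per_frame(60, 65) shows how two changes that both read as "+5 fps" differ by roughly two orders of magnitude in actual time saved.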


It's been a while since the last site improvements, but this time we have some nice and maybe unique features on our site.

Our compatibility page has been upgraded. The new feature is the AJAX powered toggle boxes which allow you to see games with a particular status.
The status will be remembered when switching between pages.


One of the greatest questions mankind has ever asked is: "How long does Saqib take to give testers a beta?". It's a common question raised by many testers over the years.

One could argue that such questions are not to be answered by mankind, for such knowledge would surely destroy us. Well, I for one believe that mankind must know, for it could be the key to unlocking one of the greatest advances in Quantum Theory since zerofrog learned how to collapse entire galaxies with his zeroGS KOSMOS.

To calculate the real world time it will take for Saqib to deliver a beta to any given tester, you have to take into account the following variables and factors.

Nagging Factor (N): does the tester have the strength to 'push' Saqib into hurrying along? For most testers this strength is given as merely a fraction of 1, as Saqib is remarkably stubborn!

Thus N often has little impact on the rate of beta delivery. However, a tester can use the offer of pornography (P) to entice Saqib to speed up, but this is a double-edged sword: a value greater than 3 (3 videos, 3 photosets etc.) will create a, ahem, 'W' effect, causing the divider in this calculation to be reset to 1.

Coffee Power (C): truly an awesome cosmic force in the universe. Its magical beans can break through temporal barriers, allowing the user to work faster! In fact its powers are so truly incredible that a single mug of coffee acts as a massive multiplier, with each mug being worth ^2.

Laz0ritus (L) is a common side effect caused by lack of daylight and social interaction, creating an almost coma-like state. This is a severe factor and can extend the waiting duration by significant amounts of time; such is the effect that the multiplier for this variable can be set as high as 24 hours (1440 minutes).

KOSMOS Temporal Pull (K): when working with KOSMOS all testers are affected by the dragging effect caused by this massive Energy Black Hole (EBH), which constantly consumes entire galaxy clusters. However, developers are exposed to higher levels of KTP, which causes their time to move extremely slowly: for every minute that passes in their time, 10,080 minutes pass in our time. The longer a developer has been exposed, the greater the effect.

Temporal Reality Flux Syndrome (F): the early years of PCSX2 development were a risky business, and those who ventured into this unknown often came back 'different'. Medically the issue is little understood, but it is believed to be caused by watching motion at sub 1 FPS, which causes the developer to perceive time outside of PCSX2 to be incredibly fast.

This causes the sufferer to slow down to the more comfortable PCSX2 speeds they became accustomed to in those early days. Doctors and scientists have learned in recent studies that the brain and motor functions slow down by a factor of 60 from normal values.

In some extreme cases it's known to produce such a slowdown that the developer is apparently petrified (frozen in time). Others consider this to be merely a visual side effect, as such low levels of motion cannot be perceived by humans in normal space.

Thus the following calculation can be made:
R=Real World Time (Minutes)
S=Saqib Time (Minutes)
K=10080, L=1440, F=6, P=2, C=10, N=2, S=1.


R=10503.43726 minutes.

So a single minute in Saqib Time is equal to 7.29 days in our time. This work is theoretical at the moment and needs a great deal of refinement; however, one can see via this simple equation that we'll be long dead by the time PCSX2 has Saqib's code.

One hopes that a scientist or time traveler gets the chance to see this and can offer help and advice for Saqib and his somewhat unique dilemma.


Many 64 bit architectures have been proposed; however, the x86-64 (aka AMD64) architecture has picked up a lot of speed since its initial proposal a couple of years ago. Most 64bit CPUs today support it, so it looks like a good candidate for 64bit recompilation. The x86-64 architecture offers many more registers and can potentially speed up games by a significant amount. Up to now, Pcsx2 has largely been ignoring the 64 bit arena because there have been massive compatibility issues, the developers weren't sure if it was really worth it, and adding a new bug-free and fast recompiler to the existing code base is a very painful process. Anyone seriously suggesting this to a dev would have been laughed out of the chat room. However, the upcoming 0.9.2 release is looking very stable and after doing some research, we have decided to add support for x86-64 recompilation, both for 64bit versions of Linux and Windows (yes, Linux support is returning).

Before going into technical details, I want to cover the current Pcsx2 recompilation model.

Pcsx2 Recompilation

Every different instruction set requires either an interpreter or a recompiler to execute it on the PC. Both are important in emulation. Interpreters are implemented with regular high-level languages and are platform independent. They are easy to program, easy to debug, but slow. They are extremely important for testing and debugging purposes. For example, interpreting a simple 32bit EE MIPS instruction (code) might look like:

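switch (code >> 26) { // the top 6 bits hold the opcode
case 0x02: // J - jump to
  pc = (code & 0x03ffffff) * 4; // change the program counter (simplified: the real J keeps the top 4 bits of pc)
  break;
case 0x23: // LW - load word, sign extend; Rs = (code >> 21) & 31, Rt = (code >> 16) & 31
  gpr[Rt] = (int64_t)*(int32_t*)(memory + gpr[Rs] + (int16_t)code);
  break;
}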

Recompilers, on the other hand, try to cut as many corners as possible. For example, we know the instruction at address 0x1000 will never change, so there is no reason why the CPU needs to execute the switch statement and decode the instruction every single time it executes it. So recompilers generate the minimal amount of assembly the CPU needs to execute to emulate that instruction. Because we're working with assembly, recompilation is a very platform dependent process.
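The "decode once" idea can be illustrated without emitting any assembly. The sketch below is a portable stand-in, not how Pcsx2 does it: a real recompiler generates native machine code, whereas here each instruction is decoded into a cached handler the first time its address is executed. Cpu, DecodedOp, step and the other names are hypothetical, and only ADDIU is handled.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical, heavily simplified CPU state (not the real Pcsx2 structures).
struct Cpu {
    uint32_t pc = 0;
    int64_t gpr[32] = {};
};

// One instruction after decoding: the handler plus its extracted fields.
struct DecodedOp {
    void (*run)(Cpu&, const DecodedOp&);
    uint32_t rs, rt;
    int32_t imm;
};

// ADDIU rt, rs, imm: 32-bit add, result sign-extended to 64 bits.
static void op_addiu(Cpu& cpu, const DecodedOp& op) {
    uint32_t sum = (uint32_t)cpu.gpr[op.rs] + (uint32_t)op.imm;
    cpu.gpr[op.rt] = (int32_t)sum;
    cpu.pc += 4;
}

// Anything this sketch does not know is treated as a no-op.
static void op_unknown(Cpu& cpu, const DecodedOp&) { cpu.pc += 4; }

static int g_decode_count = 0; // how often the slow decode path ran

// The slow part: the same switch an interpreter runs on every execution.
static DecodedOp decode(uint32_t code) {
    ++g_decode_count;
    DecodedOp op = {op_unknown, (code >> 21) & 31, (code >> 16) & 31, (int16_t)code};
    switch (code >> 26) {
    case 0x09: op.run = op_addiu; break;
    }
    return op;
}

static std::unordered_map<uint32_t, DecodedOp> g_cache; // keyed by instruction address

// Execute the instruction at cpu.pc, decoding it only the first time it is seen.
void step(Cpu& cpu, const uint32_t* code_words) {
    assert(cpu.pc % 4 == 0);
    auto it = g_cache.find(cpu.pc);
    if (it == g_cache.end())
        it = g_cache.emplace(cpu.pc, decode(code_words[cpu.pc / 4])).first;
    it->second.run(cpu, it->second);
}
```

Running the same address repeatedly hits the cache after the first pass, so g_decode_count stays at 1 no matter how often the instruction executes. The sketch relies on the same assumption as the text: the instruction at a given address never changes.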

Simple recompilers look at one instruction at a time and keep all target platform (in this case, the PS2) registers in memory. For every new instruction, the used registers are read from memory and stored in real CPU registers, then some instructions are executed, and finally the register with the result is stored back in memory. Before 0.9, Pcsx2 used to employ this type of recompilation.

More complex recompilers divide the code into simple blocks (no jumps/branches) and try to keep target platform registers in the real CPU registers across instructions. There are many different register allocation algorithms, a number of them based on graph coloring. Such recompilers might also do constant propagation and elimination. A common pattern in the MIPS Emotion Engine is something like:

lui s0, 0x1000
lw s0, 0x2000(s0)

If we propagate the constants to the lw, we know that the read address is 0x10002000.
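A minimal sketch of how a recompiler might track such constants while walking a block follows. ConstState, on_lui and on_lw are hypothetical names for this illustration, and only the two instructions from the pattern are modeled.

```cpp
#include <cassert>
#include <cstdint>

// Which EE registers hold a value already known while recompiling a block.
// Only the low 32 bits are tracked, which is enough for addresses.
struct ConstState {
    bool known[32] = {};
    uint32_t value[32] = {};
};

// LUI rt, imm: rt = imm << 16, so rt becomes a known constant.
void on_lui(ConstState& s, unsigned rt, uint16_t imm) {
    assert(rt < 32);
    s.known[rt] = true;
    s.value[rt] = (uint32_t)imm << 16;
}

// LW rt, offset(base): if base is a known constant, the effective address
// is known too. Returns true and stores it in *addr_out in that case.
// The loaded value itself is unknown, so rt stops being a constant.
bool on_lw(ConstState& s, unsigned rt, unsigned base, int16_t offset, uint32_t* addr_out) {
    assert(rt < 32 && base < 32);
    bool addr_known = s.known[base];
    if (addr_known)
        *addr_out = s.value[base] + (uint32_t)(int32_t)offset; // sign-extended offset
    s.known[rt] = false;
    return addr_known;
}
```

Feeding it the pair lui s0, 0x1000 and lw s0, 0x2000(s0), with s0 being register 16, yields the address 0x10002000 at recompile time, before any emulated code has run.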

A slightly more complex recompiler will know that 0x10002000 corresponds to the IPU, so the generated assembly will call the IPU code straight away (without worrying about memory location translation).

There are many such local optimizations, however they aren't enough. At the end of every block, all the registers will have to be pushed to memory because the next simple block that needs to be executed can't be predicted at recompilation time (ie: branch if x >= 0 depends on the value of x at runtime).

An even more complex recompiler can work on the global scale by finding out which simple blocks are connected to which. Once it knows, it can get rid of the register flushing at the end of every simple block by simply telling the next blocks to allocate the same real CPU register to the same target platform register. This is called global register allocation and sometimes uses Markov blankets for block synchronization. For those people that know Bayes nets, this is very similar, except it applies to the global simple block graph. Just think about the nodes necessary for making a specific node independent with respect to the whole graph. This will include the node's parents, children, and the children's parents. For those that just got lost... don't worry.

The Pcsx2 recompilers also use MMX and SSE(1/2/3) interchangeably. So an EE register can be in an MMX, SSE, or regular x86 register at any point in time depending on the current types of instructions (this is a nightmare to manage).

Console emulators rarely need to go through such complex recompilers because up until a couple of years ago, consoles weren't that powerful. But starting with the PS2, consoles got powerful and the Pcsx2 recompilers for the Emotion Engine and Vector Units got complex really fast. Pcsx2 0.9.1 supports all the above mentioned optimizations plus many more unmentioned ones. The VU recompiler (code named SuperVU) is by far the most complex and fastest. Anyone who wants to keep their sanity should stay away from it.

Those who remember what it was like in the 0.8.1 days can appreciate how powerful the 0.9.1 Pcsx2 optimizations are.


So why isn't x86-32 enough? Well, for starters the PlayStation 2 EE has 32 128bit regular registers, 32 32bit floating point registers, and some COP0 registers. Most instructions work on 64 bits; the MMI instructions work on the full 128 bits. On the other hand, the x86 CPU has 8 32bit general purpose registers (one is for the stack), 8 64bit registers (MMX), and 8 128bit registers (SSE). And you can't combine the three that easily (i.e. you can't add an x86 register to an SSE register without first transferring the x86 register to SSE or vice versa). So there's a very big difference in register sizes. Because of the small number of x86 registers, the recompiler does a lot of register thrashing (registers are spilled to memory very frequently). Each memory read/write is pretty slow, so the more thrashing, the slower the recompiled code becomes. Also, x86-32 is inherently 32bit, so a 64bit add requires 2 32bit instructions and 4 regular x86 registers for the source and result (2 if reading from memory). The EE recompiler tries to alleviate the register pressure by using the 64bit arithmetic capabilities of MMX, but MMX has a pretty limited ISA and transfers between register sets kill performance.
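To show what "2 32bit instructions" means for a 64bit add, here is a small sketch in plain C++ rather than assembly. On x86-32 the two steps correspond to an ADD on the low halves followed by an ADC (add with carry) on the high halves. Reg64 and the helper names are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value held as two 32-bit halves, the way x86-32 code must keep it.
struct Reg64 {
    uint32_t lo, hi;
};

Reg64 from_u64(uint64_t v) { return {(uint32_t)v, (uint32_t)(v >> 32)}; }
uint64_t to_u64(Reg64 r) { return ((uint64_t)r.hi << 32) | r.lo; }

// 64-bit add built from two 32-bit adds.
Reg64 add64(Reg64 a, Reg64 b) {
    Reg64 r;
    r.lo = a.lo + b.lo;                     // like ADD: wraps around on overflow
    uint32_t carry = (r.lo < a.lo) ? 1 : 0; // carry out of the low half
    r.hi = a.hi + b.hi + carry;             // like ADC: add plus the carry
    return r;
}

// Cross-check the two-step add against the native 64-bit add.
void check_add64(uint64_t x, uint64_t y) {
    assert(to_u64(add64(from_u64(x), from_u64(y))) == x + y);
}
```

The two sources alone occupy four 32bit halves, which is where the register count in the paragraph above comes from.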

The registers on the x86-64 architecture are: 16 64bit general purpose registers, 8 64bit MMX registers, and 16 128bit SSE registers. This amounts to twice the amount of register storage (448 bytes versus 224 bytes on x86-32)! This means much less register thrashing. On top of that, 64bit adds/shifts/etc can all be done in one instruction.

However, the story isn't as simple as it sounds. The recompiler has to interface with regular C++ code constantly (i.e. calling plugin functions), so the calling conventions on the recompiler boundaries must be followed exactly. The x86-64 specification can be found here and is pretty straightforward. However, Microsoft decided that it wanted its own specification (for reasons not quite known to anyone else), so now there are two different calling conventions, each with a different set of registers for passing arguments to functions and a different set of non-volatile registers! (Thanks Microsoft, it wasn't difficult enough)
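The differences between the two conventions can be summed up in a small lookup, sketched below. It covers only integer/pointer arguments and general purpose registers as defined by the System V AMD64 ABI and the Microsoft x64 convention; the enum and function names are hypothetical and are not taken from the Pcsx2 source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

enum class Abi { SystemV, Microsoft };

// The ABI this code is being compiled for (assuming an x86-64 target).
inline Abi host_abi() {
#if defined(_WIN64)
    return Abi::Microsoft;
#else
    return Abi::SystemV;
#endif
}

// Register holding the index-th integer/pointer argument,
// or nullptr once arguments are passed on the stack.
const char* int_arg_register(Abi abi, std::size_t index) {
    static const char* const sysv[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static const char* const ms[] = {"rcx", "rdx", "r8", "r9"};
    if (abi == Abi::SystemV) return index < 6 ? sysv[index] : nullptr;
    return index < 4 ? ms[index] : nullptr;
}

// True if a called function must preserve this general purpose register.
bool is_callee_saved(Abi abi, const char* reg) {
    assert(reg != nullptr);
    static const char* const common[] = {"rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"};
    for (const char* r : common)
        if (std::strcmp(r, reg) == 0) return true;
    // rdi and rsi are preserved only under the Microsoft convention.
    if (abi == Abi::Microsoft)
        return std::strcmp(reg, "rdi") == 0 || std::strcmp(reg, "rsi") == 0;
    return false;
}
```

A recompiler that calls into C++ would consult something like host_abi() when emitting a call, so that the first argument lands in rdi on Linux but in rcx on Windows, and so that rdi and rsi are treated as scratch registers only where the convention allows it.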

Because the size of the registers changed, all pointers are now 64 bits, which adds many difficulties to reading and writing from memory, incrementing the stack, etc.

Virtual memory is yet another obstacle to overcome with 64bit OSs. The AWE mapping trick (described in an earlier blog) has to be refined. But now that the address range is much bigger, there are fewer limitations. VM builds for Linux also need a completely new implementation.

Finally, if anyone has seen Pcsx2 code, they would know that inline assembly is pretty frequent in the recompilers. The reasons we use inline assembly rather than C++ code are many. Actually, some things like dynamic dispatching become impossible to do with C++ code. So, inline is necessary... and it looks like Microsoft has disabled all functionality for inline assembly in 64bit editions of Visual C++!!!! (Thanks again Microsoft, you just know where to strike hardest)

With all the mentioned challenges, it will take a couple of months to get things working reasonably stably. By that time, more people will have switched to 64bit OSs. If we're even half right in our estimates, Pcsx2 will run much faster on a 64bit OS than on a 32bit OS on the same computer once x86-64 recompilation is done.


Moral of the blog

Most recompiler theory discussed here actually comes straight from compiler theory. Compilers will always be necessary as long as engineers keep coming up with new instruction set architectures (ISAs). Learn how a compiler works. I recommend Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman.
