- Created on 22 July 2009
- Written by Jake Stine
One of the major changes planned for the next release of PCSX2 is a complete overhaul of the gui/frontend, switching from the current "native" Windows/Linux interfaces to a single unified interface that's (more or less) the same for both platforms. To make such a grand vision possible we needed the help of a third-party GUI toolkit, and there were several to choose from. After much consideration we settled on wxWidgets, the same toolkit used by a few other notable emulators, such as Dolphin.
So what are the benefits of the new PCSX2 interface in development?
1. A vastly improved Linux interface.
... and not just in terms of what you see, but how the emulator cooperates with the operating system as a whole. wxWidgets provides dozens of very useful cross-platform tools that are much better suited to Linux than the current PCSX2 codebase. The new interface will have a smarter plugin loader, and multilingual support as well. It will also be easier to install and run as a user other than root.
2. Windows XP/Vista Themed appearances will finally work!
Until now, PCSX2 used the old Win95 ASCII-based Common Controls libraries, and this forced Windows to disable themes support when rendering the PCSX2 interface. Buttons were plain and unshaded, and dialog boxes such as the File and Folder pickers were small, ugly, and lacking in features. All of this will be fixed in the next release.
3. More complete internationalization support.
Translations of the PCSX2 interface will be easier to make and maintain, and shouldn't become broken or obsolete from program updates, which was a persistent problem in the older PCSX2 interface design. In addition, PCSX2 will be fully Unicode compliant on both Windows and Linux platforms, which should help resolve various oddball problems the emulator may have encountered on non-English operating systems.
4. A more responsive multithreaded interface which will remain fully accessible even while games are running!
The new interface will run on its own thread, separate from the emulator's EE/MTGS threads, allowing it to remain open and accessible in the background without the need to shut down the GS window. This should improve the stability of "pausing" emulation, and also opens the door for expanding on interface integration, such as binding popup menus, toolbars, status bars, or other useful things to the GS window.
Furthermore, if a game deadlocks the system, the interface will usually remain responsive even as the rest of the emulator freezes up (which might not be much of a feature for end users, but is quite handy for us developers).
5. The Teaser Image!
And here's a sneak peek at the current GUI-Work-in-Progress:
... yeah it's not a whole lot to see yet -- I might post some more as development progresses.
From a coder's perspective:
All in all, the wxWidgets experience so far has been a decent one. There are a few annoyances, but those tend to be more the fault of cross-platform considerations (some things are not supported well under Linux, or vice versa), or more commonly of limitations and design flaws in the C++ language itself rather than of wxWidgets (in particular, C/C++ make it especially difficult to work with Unicode strings in a 'nice' way). For the most part wx tries to model itself in the image of the .NET Framework and Java Framework API designs, which are good designs to follow.
- Created on 04 July 2009
- Written by Jake Stine
While not likely to win the prize of being featured in PCSX2's next release (presumably 0.9.7, but you can never be too sure), this logo might deserve honorable mention. I present for you a moment of Zen:
... is that an exploding cheeseburger on his robe?
- Created on 23 June 2009
- Written by Jake Stine
There are a lot of buzzwords in emulator recompilation. Popular ones include:
- Intermediate Representation
- Intermediate Language
- Register Mapping
- Register Allocation
- Constant Propagation
- Constant Folding
- Register coloring
- SSA (static single assignment)
What do they mean? And perhaps more importantly, what do they mean to a user of the emulator? Truth is, not much usually. Most of these things are technologies and strategies borrowed from high level language compilers like those for C/C++, C#, Java, etc. Some of them are useful to emulator recompilers, some not so much.
The first thing to consider when working on a recompiler is that we are working with what is most likely optimized code. The machine code that your favorite games are running on your PS2 is already parsed and optimized by a compiler (or in the case of older consoles, hand-optimized). Secondly, recompilers typically have a single-block scope limitation (which is hard to explain, but basically means that compilation stops when branch conditionals are encountered). This all but eliminates the usefulness of SSA and register coloring techniques, since their main benefits are in applying optimizations over a series of conditional code blocks, and elimination of dead code. Furthermore, even when higher level optimizations can be applied, the emulated CPU/Register states must still be guaranteed at frequent intervals. Unlike high level languages, an emulator must manually track things like the Program Counter, instruction pipeline stalls, and other hardware complications. So typically the benefits of such cross-block optimizations get watered down anyway.
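To illustrate the single-block limitation, here's a sketch of how a recompiler might scan ahead for the end of a block. The opcode check is deliberately simplified (a real MIPS scanner must also handle REGIMM branches, jumps, and the branch delay slot), and the names are my own rather than anything from the PCSX2 source:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified check: treat MIPS beq/bne/blez/bgtz (primary opcodes 4-7)
   as branches. A real recompiler covers several more opcode groups. */
static bool is_branch(uint32_t op) {
    uint32_t primary = op >> 26;
    return primary >= 4 && primary <= 7;
}

/* Scan forward from 'code', counting instructions until the first branch.
   The recompiled block covers this run of instructions; compilation
   stops here because control flow past the branch isn't known. */
static int block_length(const uint32_t *code, int max) {
    int n = 0;
    while (n < max && !is_branch(code[n]))
        ++n;
    return n;
}
```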
Constant Folding and Constant Propagation are, for all intents and purposes, the same thing. They always work together, and most people use the terms interchangeably. You'll never really find a practical situation where one is used without the other. Constant folding refers to the evaluation of constant expressions like 100 * 5. Constant propagation refers to the substitution of variables with known constants, like:
x = 100 * 5;
y = x * 10;
... in which case, the value of 'y' is known at compilation time, and can be further substituted (propagated) anywhere 'y' is used (this should remind you of your 5th grade algebra homework!). As mentioned before, many people use the terms interchangeably, so when you see another emulator talk of Constant Folding, they mean the same thing I do when I talk of Constant Propagation. The term 'propagation' is technically more correct, but folding is easier to type and looks nicer when naming functions in your recompiler.
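Here's a minimal sketch of how a recompiler might implement this; the RegState structure and function name are illustrative inventions, not code from PCSX2:

```c
#include <stdbool.h>

/* Hypothetical per-register constant tracking. */
typedef struct { bool known; int value; } RegState;

/* Recompile "rd = rs * imm". If rs holds a known constant, fold the
   multiply at recompile time and propagate the result forward;
   otherwise mark rd unknown (a real recompiler would emit an x86
   imul instruction here instead). */
static void rec_mul_imm(RegState *regs, int rd, int rs, int imm) {
    if (regs[rs].known) {
        regs[rd].known = true;
        regs[rd].value = regs[rs].value * imm;  /* constant folding */
    } else {
        regs[rd].known = false;
    }
}
```

Starting with register 1 known to hold 500 (the folded result of 100 * 5), recompiling y = x * 10 produces a known value of 5000 for register 2 -- no runtime multiply is ever emitted.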
Intermediate Language (IL) and Intermediate Representation (IR) are more or less the same thing as well. IL simply implies an IR that has a (mostly) human-readable form. That is, an IL is in fact a programming language, usually bearing some similarity to assembly code, but simpler and more sensible. An IR is just a raw collection of data sufficient to represent all information needed to optimize code and generate the final recompiled product.
In either case, the dual purposes of an IR/IL are:
- To simplify the instruction set as much as possible so that optimizations can be analyzed without requiring a lot of special-case code.
- To provide a platform-independent "stage" to the compilation process, so that ports of the recompiler to new target platforms need not be rewritten from the ground up.
Because of the complicated nature of an emulator, an IL itself is virtually useless. There's typically too much per-instruction CPU state information that has to be tracked for a human-readable language output to be viable. But for the same reason an IR can be remarkably helpful in reducing the overall complexity of a recompiler implementation, and is all but a necessity when implementing Register Mapping.
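As a sketch of what such a simplified instruction record might look like (the layout and names here are hypothetical and do not match PCSX2's actual IR), here's a decoder that lowers a raw MIPS addiu into a flat IR record:

```c
#include <stdint.h>

/* Hypothetical simplified IR: one flat record per operation, with the
   original program counter kept for event/exception bookkeeping. */
typedef enum { IR_ADDI, IR_UNKNOWN } IrOp;
typedef struct { IrOp op; int rd, rs; int32_t imm; uint32_t pc; } IrInst;

/* Lower a raw MIPS opcode word into the IR. Only addiu (primary
   opcode 9) is handled here; everything else falls through as
   IR_UNKNOWN. */
static IrInst lower(uint32_t word, uint32_t pc) {
    IrInst i = { IR_UNKNOWN, 0, 0, 0, pc };
    if ((word >> 26) == 9) {                /* addiu rt, rs, imm */
        i.op = IR_ADDI;
        i.rs = (word >> 21) & 31;
        i.rd = (word >> 16) & 31;
        i.imm = (int16_t)(word & 0xFFFF);   /* sign-extended immediate */
    }
    return i;
}
```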
Register Mapping and Register Allocation are once again a set of fairly interchangeable terms. I prefer the term mapping, but other folks like to call it allocation. Register mapping/allocation is typically one of the final stages of recompilation since it's dependent on the target platform (in our case x86), and is also one of the most complex. It's also typically not very beneficial for performance on x86 targets unless the register mapping algorithms are very clever.
As I mentioned in my previous blog entry, the most significant factor in a recompiler's speed is simply the fact that it's acting as a decoded-instruction cache, and that it executes instructions in bursts (a block at a time instead of one-by-one). All the rest of these techniques tend to be more for the factors of code maintainability and academic challenge than for end-user performance gains.
- Created on 02 June 2009
- Written by Jake Stine
One of the less obvious things that has plagued PCSX2's compatibility over the years is its event handling system. The system in place as of 0.9.6 is adequate for interpreter-based emulation but is not well-equipped to handle the methods that a recompiler uses to manage CPU cycle updates. This is something we aim to fix in the coming weeks.
Cycle-based Timing Explained
All CPUs have a cycle rate, which is typically the MHz/GHz value you're most familiar with when talking about any CPU. An i7 clocked at 2.83GHz has a 2.83GHz cycle rate. Now, the actual throughput of instructions can vary greatly, since each cycle of the CPU consists of several stages and multiple pipelines, each of which can have dependency stalls and has varying rules for when such stalls occur. The cycle rate, however, is always 2.83GHz. Because cycle rates are a known constant, they make a good barometer for synchronizing the activities of a multi-processor design like the Playstation 2.
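As a concrete example, the EE runs at 294.912 MHz and the IOP at 36.864 MHz -- an exact 8:1 ratio -- so keeping the two processors in sync reduces to a fixed conversion between their cycle counts. The helper function below is purely illustrative:

```c
#include <stdint.h>

/* Stock PS2 clock rates: EE at 294.912 MHz, IOP at 36.864 MHz. */
#define EE_CLOCK   294912000u
#define IOP_CLOCK   36864000u

/* Convert an elapsed EE cycle count into the equivalent number of
   IOP cycles. The ratio is an exact 8:1, so no rounding error
   accumulates. */
static uint32_t ee_to_iop_cycles(uint32_t ee_cycles) {
    return ee_cycles / (EE_CLOCK / IOP_CLOCK);
}
```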
Why do Recompilers Complicate Event Testing?
Recompilers achieve a significant speedup over interpreters by doing two things:
- Recompile the machine code of an emulated CPU (in our case MIPS instructions) into code native to the host machine (x86 instructions).
- Prefetch and pre-decode emulated instructions, and inline them into blocks.
The thing recompilers are most well-known for -- recompiling to native machine code -- is actually the less effective of the two things recompilers do for speeding up emulation. The primary speedup typically comes from the prefetching and inlining of instructions, which in addition to eliminating the instruction fetch/decode stage (by far the slowest part of any interpreter), also allows for cross-instruction optimizations such as constant propagation and register caching/mapping. In other words, a recompiler is effectively executing emulated instructions in pre-compiled bursts. This is so important to performance that a recompiler without block-level execution would hardly be any faster than an interpreter.
As part of the design of block-level execution, the recompiled code only updates CPU cycle counts and tests for scheduled events at block boundaries. Blocks typically span 5 to 35 cycles, but in some cases can span a hundred cycles or more. When the subsequent Event Test is performed, several scheduled events may be pending execution. This is where problems can occur: the current event system implemented in PCSX2 executes all pending events in no particular order, leading to events being executed out-of-order when multiple events time out during a single block. Typically most events don't have dependencies on each other, or games don't use them in a way where execution order matters. But sometimes they do, and in those cases behavior can be unpredictable, or can cause the game to fail outright. To make matters worse, the pending events typically don't know how late they are, and will re-schedule subsequent events in increasingly belated fashion. The current implementations of the EE and IOP counters have tons of complicated code meant to compensate for this limitation (code that is both slow and was nearly impossible to get right).
The fix for this is to use an event system I'll call decremental delta time. It has three advantages:
- Makes it easy to execute events in scheduled order regardless of the amount of time which has passed since the last Event Test.
- Maintains relative cycle scheduling at a high level so that none of the events being re-scheduled "lose time" due to belated block-boundary event testing.
- Simplifies event handling on all levels, and provides significant speedups for event testing and event dispatching.
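Here's a minimal sketch of how such a delta-based event list might work (the structure and function names are my illustration of the concept, not PCSX2's actual implementation; the integer id stands in for an event's handler callback). Each event stores only the cycle delta from the event ahead of it, so events are always dispatched in scheduled order and belated block-boundary testing doesn't make anyone "lose time":

```c
#include <stddef.h>

/* Each event stores its delay relative to the previous event in the list. */
typedef struct Event {
    int delta;              /* cycles after the preceding event */
    int id;                 /* stands in for the event's handler callback */
    struct Event *next;
} Event;

/* Insert an event 'cycles' from now, keeping the list delta-sorted. */
static void schedule(Event **head, Event *ev, int cycles) {
    while (*head && (*head)->delta <= cycles) {
        cycles -= (*head)->delta;
        head = &(*head)->next;
    }
    ev->delta = cycles;
    ev->next = *head;
    if (ev->next)
        ev->next->delta -= cycles;  /* successor is now relative to us */
    *head = ev;
}

/* Advance time by 'cycles' (e.g. the length of a recompiled block),
   recording every event that has come due, in scheduled order.
   Returns the number of events dispatched. */
static int run(Event **head, int cycles, int *fired, int max) {
    int n = 0;
    while (*head && (*head)->delta <= cycles && n < max) {
        Event *ev = *head;
        cycles -= ev->delta;
        *head = ev->next;
        fired[n++] = ev->id;
    }
    if (*head)
        (*head)->delta -= cycles;   /* survivor keeps its relative timing */
    return n;
}
```

Note that the leftover cycles are subtracted from the first surviving event, which is exactly what keeps re-scheduled events from drifting later and later the way the current system does.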
It's hard to know beforehand just how beneficial in-order execution of events will be. I'm anticipating that it might actually fix a few emulation problems on the IOP recompiler in particular, since it has a slow cycle rate and also has a handful of events which can have potential inter-dependencies. For that reason I'll be implementing the system first into the IOP, and then when all the chinks in its armor are worked free we'll port the EE side of the emulator over to it.
- Created on 13 February 2009
- Written by Jake Stine
Zerofrog documented the concepts of Virtual Memory a few years back. So now I figure it's VTLB's turn, since it's now the exclusive memory model used in current SVN builds and any future releases.
So what is VTLB? VTLB stands for Virtual Translation Look-aside Buffer, which for most of us is a lot of common everyday words that, when put together like that, don't mean much at all.
Firstly, the memory model names VM and VTLB refer to the systems in place inside PCSX2 for emulating the Playstation 2's memory, and don't actually refer to what's being emulated. In fact, neither the VM nor the VTLB build emulates the PS2's own TLB memory model, which can be misleading since VTLB contains the letters "TLB." The VM build was incapable of emulating the PS2's TLB without significant speed penalties and complications. VTLB, on the other hand, has the potential to emulate the PS2 TLB, but we haven't added functionality for it since it also depends on some other not-yet-complete areas of emulation (namely MIPS TLBMiss exceptions). As Zerofrog explained in an earlier blog, very few games utilize the TLB features of the PS2 anyway, so it's pretty much at the bottom of our wishlist at this time.
Conceptually, VTLB is surprisingly simple. It works by building a look-up table of the PS2's physical memory on a per-page basis, and then defining actions or "handlers" for some pages, and defining other pages as "direct access" (fast mode). A page of memory is 4096 bytes long, so the PS2's 32MB physical address space translates into 8192 total pages of memory, which ends up being a pretty small and efficient size as far as lookup tables are concerned. By comparison, the current EErec uses a lookup table with 8 million entries!
So when a PS2 instruction performs a memory operation (usually referred to as a memOp), the VTLB grabs the lookup address. If the address has the "special handler" bit set, it routes to that handler. If the handler bit is not set, the address is treated like a normal pointer to physical memory. As an optimization, VTLB uses the sign bit of the 32-bit address to differentiate handled memory pages from direct memory pages.
The pseudo-code looks like this, as performed for a write memOp:
uint page = ps2_addr / 4096;
uptr pc_addr = vtlb_lookup[page];
if( pc_addr & 0x80000000 )          // sign bit set: special handler page
    handler[page]( ps2_addr, data );
else
    *(u32*)pc_addr = data;          // direct write to physical memory
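For the curious, here's a compilable miniature of that dispatch. The table sizes match the description above, but the handler setup and names are simplified illustrations rather than PCSX2's actual code:

```c
#include <stdint.h>

#define PAGE_SIZE   4096u
#define NUM_PAGES   8192u        /* 32MB of PS2 physical memory / 4KB */
#define HANDLER_BIT 0x80000000u  /* sign bit marks "special handler" pages */

static uint32_t ram[NUM_PAGES * PAGE_SIZE / 4];  /* stand-in for PS2 RAM */
static uint32_t vtlb_lookup[NUM_PAGES];
static uint32_t hw_reg_value;                    /* one fake hardware register */

/* Handler for a hardware-register page: intercept the write. */
static void hw_write_handler(uint32_t ps2_addr, uint32_t data) {
    (void)ps2_addr;
    hw_reg_value = data;
}

static void vtlb_init(void) {
    for (uint32_t p = 0; p < NUM_PAGES; ++p)
        vtlb_lookup[p] = p * PAGE_SIZE;  /* direct pages: offset into ram[] */
    vtlb_lookup[1] |= HANDLER_BIT;       /* page 1 routes to a handler */
}

/* A 32-bit write memOp, dispatched through the VTLB. */
static void vtlb_write32(uint32_t ps2_addr, uint32_t data) {
    uint32_t entry = vtlb_lookup[ps2_addr / PAGE_SIZE];
    if (entry & HANDLER_BIT)
        hw_write_handler(ps2_addr, data);                  /* handled page */
    else
        ram[(entry + (ps2_addr % PAGE_SIZE)) / 4] = data;  /* direct page */
}
```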
By default, PCSX2 utilizes the VTLB's handlers for several areas of PS2 memory that hold hardware registers. Hardware registers are memory addresses that control the whats, whens, and hows of the PS2 -- write to a specific memory address and the PS2 starts a DMA transfer, or changes the video mode, or plays a new audio sample. These writes have to be intercepted and handled by the emulator. The VTLB allows us to do that very efficiently.
The reason why VTLB is able to emulate the PS2's own TLB is thanks to the handlers, which can remap memory anywhere at any time, with any set of permissions. If a game decided to remap some pages of memory to a different virtual address, those pages would have the "handler bit" set TRUE, and then the handler for those pages would be instructed to remap the memOp to the appropriate physical address.
Thus, the flow would be as such:
Ps2_Virtual_Address -> VTLB_Lookup -> Handler -> Ps2_Physical_Address
The benefits of this model are three-fold: efficiency, extensibility, and ease of debugging. If a game doesn't use the PS2's TLB, then the VTLB will simply use direct memOps (fast!). If a game does happen to use the TLB, then VTLB can remap the memory as needed, allowing that game to be emulated correctly as well, without needlessly burdening other games with the overhead of virtual memory remapping logic. And to top it all off, handler mappings can be traced and dumped quickly and easily at any stage of emulation.