Q4 2021 Progress Report
Looks at the date and notices something. Yes, my dear readers (or glancers), Q1 2022 should’ve been published 2 months ago, and this is the predecessor to that. Sorry for the almost half-year delay; there were several reasons for it, some of them being the process of making the progress reports itself:
- Publishing on a private forum thread for those who have access
- Needing it to be HTML without a working preview
- Having multiple middlemen to check, update and publish on the preview site
- Finally going mad at how inefficient the process was: 80% of the time was spent on making sure that all the Pull Requests were listed and linked correctly instead of on actual writing
OK, moving on to better news: the fantastic work from everybody involved in making sure that PCSX2 doesn’t cave into a lesser program. In case you didn’t notice, we have a new site that brings PCSX2 from 2002 to 2022 as PCSX2 is reborn. And that amazing animation on the homepage is something you will have to see for yourself to decide how you feel about it.
So all around we will improve all aspects of the emulator and have better progress reports that cater to the technical, the light-hearted and the changelog-oriented people. There are also a ton of pictures in this edition of the progress report. I’m sure that quite a few of you readers are waiting on the new look of the Qt GUI, but it will still take a while before that comes to fruition for the public.
Here is a glimpse of what to expect:
Core Improvements
Fixes some more videos not playing like Digital Devil Saga and fixes texture issues on Shadowman.
Disc-Swapping has always been a highly requested feature for multi-disc games which may ask you to swap to the second disc half-way through the game.
Others use it more as an addendum for unlocking extra items, such as Dynasty Warriors 5. Then there are games like Alter Echo, which asks you to put the original disc back when you eject it even though the game has only 1 disc. Surprisingly, the SingStar series might handle this best: it can swap songs via disc-swapping, and although there is an in-game button for disc-swapping, it correctly handles swapping without it.
As you can see, multiple behaviors are possible when you use the disc-swapping functionality (e.g. ejecting a game): quite a few games are fine and nothing happens until they require new data and then likely crash, while others give you error messages or simply freeze.
Another refinement of how the disc emulation handles reads and latencies, which again improves the timing that the PS2 so desires.
Queues 1 sector from a future read as ready which fixes Star Ocean 3 stuttering and Winning Eleven 8 crashing.
Instead of forcefully ‘pulling the cord’ when you reset, it now sends a reset signal, like pressing the power button once on a real console does. So it acts more like how the console worked.
Buffers up to 16 sectors on a DVD from its current position.
Documentation from developers shows that the DVD drive will always read 16 sectors as a minimum, so this means if it reads in advance, it can buffer up a bunch of sectors to be read off quickly by the DMA.
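The read-ahead idea can be sketched roughly as below. This is a toy Python model, not PCSX2's actual CDVD code: the class name, the `disc` list and the hit/miss return value are assumptions made up for illustration, and only the 16-sector figure comes from the text above.

```python
SECTORS_BUFFERED = 16  # figure quoted in the text above


class ReadAheadBuffer:
    """Toy model: holds up to 16 sectors starting at the last position read from disc."""

    def __init__(self, disc):
        self.disc = disc      # hypothetical: a list with one entry per sector
        self.start = None     # number of the first sector currently buffered
        self.sectors = []     # buffered sector data

    def read(self, lsn):
        """Return (data, was_buffered) for sector number lsn."""
        if not 0 <= lsn < len(self.disc):
            raise IndexError("sector is not on this toy disc")
        if self.start is not None and self.start <= lsn < self.start + len(self.sectors):
            return self.sectors[lsn - self.start], True   # served from the buffer
        # Miss: go back to the "disc" and pull in up to 16 sectors from here.
        self.start = lsn
        self.sectors = self.disc[lsn:lsn + SECTORS_BUFFERED]
        return self.sectors[0], False
```

In this model a sequential read only has to touch the slow "disc" once every 16 sectors, and everything in between can be handed over straight away.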
Changes how the date is handled for input recording as some games like Metal Gear Solid 3 are sensitive about the date for time-based events.
The DMA timing calculations were changed to be based on the PS1 DMA timings, but with the difference in bus widths considered, leading to closer DMA timings for CDVD which in turn fixed SpongeBob Lights, Camera, Pants.
Improves the timing of read times when buffering sectors but also corrects the status of the CDVD when running certain commands.
Corrects some of the statuses and IRQ reasons available to the CDVD, which fixes the status of the CDVD controller in certain games. In turn, this fixes games that were spinning eternally on a black screen or seemed to just freeze on the spot, like Spyro, Abarenbou Princess, Evergrace and more.
This was causing broken lighting in 64-bit mode, due to the larger number of registers available. Ft gets loaded and cached, but then the FMAC instructions clamp it in all modes except none, which disturbs the cached value (mismatched with the VU state). Jak happens to rely on this value not being clamped, so it was “okay” in 8-register mode because it had to be reloaded.
Makes savestates more robust by giving more information so as to lessen the chance of breaking the game.
Preserve XGKick cycles calculated when there is a memory write in a delay slot, also added handling for xgkick sync on single instructions.
Previously there was no handling on single instructions (evil blocks), so that’s sorted. The other problem was that if there was a mem write in a branch delay slot, it would add the XGkick cycles it needed to run and then erase them, causing the sync to go out. This resolves it.
The current stable (1.6 as of writing) had multiple back-ends, namely Xaudio2, DirectSound, PortAudio and WaveOut. DirectSound was a buggy mess to maintain, WaveOut wasn’t much better, PortAudio was fine and Xaudio2 was the de facto standard on the Windows side. Now Cubeb replaces PortAudio as its successor, and Xaudio2 is kept as a back-up. Keep in mind that with Cubeb the latency slider states 100ms in the GUI, but that isn’t exactly true, as it automatically uses a very low latency based on your system:
(Cubeb) Minimum latency: 10.00 ms (480 audio frames)
(Cubeb) Minimum latency: 25.00 ms (1200 audio frames)
If it’s above 25.00 ms you either have a computer issue like corrupt drivers or your computer is far too weak.
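For the curious, the two numbers in each of those log lines are tied together by the sample rate. The lines imply 48,000 frames per second (480 frames in 10 ms); the helper names below are made up for this sketch and are not part of Cubeb or PCSX2.

```python
SAMPLE_RATE_HZ = 48_000  # implied by the log lines above: 480 frames in 10 ms


def latency_to_frames(latency_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of audio frames that fit into the given latency."""
    return round(latency_ms * sample_rate_hz / 1000)


def frames_to_latency_ms(frames, sample_rate_hz=SAMPLE_RATE_HZ):
    """Latency in milliseconds represented by the given number of frames."""
    return frames * 1000 / sample_rate_hz


print(latency_to_frames(10.0))     # 480, as in the first log line
print(latency_to_frames(25.0))     # 1200, as in the second log line
print(frames_to_latency_ms(4800))  # 100.0, the value shown on the GUI slider
```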
Xaudio2 can’t handle the same low latency that Cubeb has without bad skipping and warping even on better systems. I hope you guys like the sound. How time-stretching actually works is that you see the first video frame and the sound comes after the targeted sound latency, which for years was defaulted to 0.1 seconds of latency.
This Pull Request made the last non-working game work and looks more like a current Linus Tech Videos about VR with all those wires than a PS2 accessory. I’ll stick to my Black Nintendo Wii instead of this seemingly weird copy-cat.
Most users won’t see any use for this, and that is fine, but the goal of emulating the endless list of USB and PAD devices does scare me. Here is a small subset of what still needs to be emulated: https://github.com/PCSX2/pcsx2/issues/4763
Shadowing dwStatus for the return value of GetAdaptersAddresses will prevent the return value of the second call from being inspected in the following if statement.
If the user had a large number of network adapters, this would prevent the code from getting the adapter information of the selected pcap adapter.
The equivalent TAP adapter code is already correct.
Changes how DMA transfers are handled; for example, some games need them to happen in a specific order.
Fixes https://github.com/PCSX2/pcsx2/issues/5168 (Top Trumps).
Fixes https://github.com/PCSX2/pcsx2/issues/4063 (Phase Paradox). Improves the moving billboard quality in Test Drive (Master has corruption).
Fixes video hang in Eggo Mania/Egg Mania - Eggstreme Madness (patch no longer required).
Fixes Smackdown Shut Your Mouth Titantrons.
Fixes Gladiator - Sword of Vengeance videos (patch no longer required) Partial https://github.com/PCSX2/pcsx2/issues/3489 .
Fixes https://github.com/PCSX2/pcsx2/issues/4360 (Flipnic UFO mission hang).
Adds a new member function to the DebugInterface for retrieving the symbol map for the CPU and uses this where relevant instead of accessing the map directly.
Previously there was only one symbol map, which doesn’t make a whole lot of sense. This prepares for future work on IOP symbol detection.
When multiple lines of opcodes are selected, the ‘Assemble Opcode(s)’ context menu and M-key shortcut will in turn reassemble all of those opcodes.
Now you can finally search for a specific memory address or string instead of scrolling, just like in Cheat Engine.
- I think Fobes needs a hug for this secretive message.
Fixes the debugger view for registers when the system DPI is not the default 100%.
It’s not clear why certain memory was blocked from being modified; this lifts all memory restrictions.
Make sure certain text doesn’t hard-crash the search. (Print size_t with %zu instead of %d)
Input Recording
Gives a specific date for input recording (speedrunning).
Miscellaneous Core
Folder memory cards weren’t recognized as a memory card being plugged-in unless you opened the config dialog.
This pull request has brought permanent downloadable (pre)releases to GitHub itself instead of just using Orphis, which will not only make everything more central but also makes it easier to tag commits that were made outside of a pull request and just force-pushed to the project. Stares at certain people that have been naughty.
If you want to see more details, Vaser has written an essay-like detail on it - https://github.com/PCSX2/pcsx2/pull/4914
So it will pre-compile working versions of the nightlies/dev and future stable versions and keep them on GitHub forever, instead of only temporarily on GitHub or, as in the past, on AppVeyor (nickname: Slowveyor), which easily took 10-20 minutes per build.
The nice thing about Actions is that it can do multiple builds in parallel for free and can also publish these now-permanent builds. They are also linked on pcsx2.net, as GitHub requires you to have an account and be logged in, so you have enough options.
GS Improvements
This PR contains the following changes:
- Prevents clang from optimizing out our denormal-removal shuffles (10x faster than before for people who compile with clang!)
- Run divides on four elements at a time instead of two elements and two useless numbers
- Remove inaccurate stq
- With the above division improvements, on processors with partially-pipelined division (Ivy Bridge and later, Bulldozer and later), accurate stq is actually faster (according to both IACA of inaccurate vs accurate and LLVM MCA). On older CPUs expect performance to be about 2/3 of the old algorithm before taking into account improvements from not double-checking vertices.
- There seems to be a check of accurate_stq when the OGL backend is deciding whether to use geometry shaders to process sprites.
In the end, ignoring clang issues, GSVertexTrace::FindMinMax goes from taking about 3% of MTGS thread runtime to 1.5% on my computer. (Most of the time was spent doing OpenGL things so if you have a more efficient OpenGL driver it might make more of a difference for you)
Burnout games weren’t emulated correctly due to the texture cache, the biggest pain on the GS side, which you could work around by switching to the Software renderer and then switching back to HW. The game downloaded the texture and modified it on the CPU side before finally drawing it.
Now there are no more shenanigans in having to switch renderers or ignore the sky issue. There is another game with a similar issue, a shooter called ‘Black’ (a recurring theme is that PS2 game titles fit their badly emulated issues). Black hasn’t been fixed yet; perhaps in the future.
“Drawing triangles is super easy to parallelize! GPUs are super parallel, and it should be nice and easy to spread a software renderer across lots of CPU cores!” is probably what most people think. And this is generally true, if you’re trying to software render for a normal PC graphics API like OpenGL. But we’re emulating a GS. That’s okay though, it’s still just a bunch of triangles, right? PCSX2 has a software renderer, and it lets you set the number of cores it uses, so we can see how well it scales. And if you messed with that number, you’d generally find that going above 3 worker threads didn’t help much. In some games, like Hitman Blood Money, you’d get the best performance with 0 worker threads. What?
The culprit is thread synchronization. But why would you need to synchronize worker threads when drawing a bunch of triangles? It turns out some PS2 games like to use the result of a previous draw as the texture for another draw. Whenever a game does this, one thread could need pixels drawn on another thread, so we have to sync them all up. PC games do this too, but they’ll usually only do this a few times per frame, rendering an entire image before reading it. When processing effects, PS2 games like to do this on small parts of images at a time, resulting in huge numbers of thread synchronizations. Ratchet & Clank, which previously saw speed start to die off at around 3 worker threads, requires about 500 synchronizations to draw a single frame. Hitman Blood Money, whose title screen’s FPS previously ran in the single digits, sees a whopping 10,000 synchronizations in a single frame, or 600,000 per second! Ouch!
So what went into one of these synchronizations? Before this PR, the control thread would see the need for synchronization, go to the first worker thread, see that it was still running, and go to sleep. When that worker thread finished its work, it would wake up the control thread, which would then go through the rest of the worker threads, checking that they were finished as well. While this was happening, the worker threads would each see that they have no work to do, and go to sleep themselves. Then, the control thread would decode the image they just rendered into the format worker threads can read, at which point it would submit the next piece of work for the worker threads to process, and go through and wake all the worker threads back up.
Sleeping and waking threads is pretty fast, but it’s not that fast. One round trip seems to take around 5-10 microseconds on most computers. Which seems pretty fast, until you compare that to the average time one of the threads spends rendering a pixel: 5-10 nanoseconds. So in the time all these threads spent going to sleep and waking up, a single thread could have rendered 1000 pixels! Another way to look at it is how much time Hitman Blood Money spends watching its threads sleep and wake rendering a frame: 50-100ms. That’s like 10-20 fps, and we haven’t even rendered anything!
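The numbers above are easy to check yourself. The snippet below only redoes the arithmetic with the figures quoted in the text; the function names are made up for this sketch.

```python
SYNCS_PER_FRAME = 10_000      # Hitman Blood Money, from the text
ROUND_TRIP_US = (5, 10)       # sleep/wake round trip, in microseconds
PIXEL_NS = (5, 10)            # time to render one pixel, in nanoseconds


def sync_overhead_ms(syncs_per_frame, round_trip_us):
    """Time per frame spent only on sleeping and waking, in milliseconds."""
    return syncs_per_frame * round_trip_us / 1000


def fps_ceiling(overhead_ms):
    """Best possible frame rate if a frame contained nothing but that overhead."""
    return 1000 / overhead_ms


def pixels_per_round_trip(round_trip_us, pixel_ns):
    """How many pixels one thread could have drawn during a single round trip."""
    return round_trip_us * 1000 / pixel_ns


for us in ROUND_TRIP_US:
    ms = sync_overhead_ms(SYNCS_PER_FRAME, us)
    print(f"{us} us per sync: {ms:.0f} ms per frame, at most {fps_ceiling(ms):.0f} fps")
```

Both ends of the range give the same pixel count: 5 microseconds at 5 nanoseconds per pixel and 10 microseconds at 10 nanoseconds per pixel each work out to 1000 pixels per round trip.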
So how do we solve this? Well you could say “just sync less”, and we are working on that (some of the more recent autoflush PRs have helped here), but that’s not an especially easy solution. The easy solution is “just don’t go to sleep”. In the new implementation, threads keep spinning, staring at the status variable, for up to 50 microseconds before going to sleep. Maybe not the best use of CPU time, but it brings the time for one round trip communication from 5-10 microseconds down to just 0.2 microseconds, or 200 nanoseconds, an over 10x improvement. With this, using multiple threads manages to not be slower than 0 threads on even Hitman Blood Money, though it isn’t much faster either. For less sync-intensive games, the falloff point for increasing threads has increased and is now usually in the 4-5 thread area (assuming you have that many cores on your CPU), bringing PAL R&C3’s FPS from having dips into the mid 30s to a smooth locked 50 on a Ryzen 5 5600X.
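A minimal sketch of the “spin first, sleep later” idea, assuming a single ready flag per waiter. It is written in Python purely to show the control flow: the class and its names are hypothetical, this is not the PCSX2 implementation, and a Python loop will not reproduce the nanosecond timings quoted above.

```python
import threading
import time

SPIN_SECONDS = 50e-6  # spin budget from the text: up to 50 microseconds


class SpinThenSleepFlag:
    """Hypothetical 'work is ready' flag that spins briefly before sleeping."""

    def __init__(self):
        self._event = threading.Event()

    def set(self):
        """Mark the flag as ready and wake any sleeping waiter."""
        self._event.set()

    def wait(self, spin_seconds=SPIN_SECONDS):
        """Block until set; return 'spun' or 'slept' to show which path was taken."""
        deadline = time.perf_counter() + spin_seconds
        while True:
            if self._event.is_set():      # keep staring at the status variable
                return "spun"
            if time.perf_counter() >= deadline:
                break
        self._event.wait()                # budget used up: really go to sleep
        return "slept"
```

If the flag flips within the spin budget, the waiter never pays for a sleep/wake round trip. At the 0.2 microseconds quoted above, 10,000 synchronizations cost about 2 ms per frame instead of 50-100 ms.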
The way the PS2’s GS stores images in memory is a bit special. If you took a piece of memory and just read it like normal, your image would look like a scrambled mess. The GS actually has a different scrambling for each of its image formats, often referred to as swizzles, which allowed devs to do more effects on the shader-free GS. As a result, PCSX2 has to deal with a lot of different possible swizzles.
Calculating swizzled offsets for every pixel of an image every time you read it would be pretty slow, so GSOffset was created to work around this. A GSOffset stored a list of memory offsets from the top left pixel in an image to each pixel on the left edge, and a list of offsets from a pixel on the left edge to each pixel not on the left edge. This took up 16kb, but it allowed PCSX2 to go from some pixel coordinates to the memory address of that pixel with two table lookups and two additions. Since a GSOffset was only valid for a specific combination of texture format, texture size, and texture starting address, a new GSOffset was generated for each unique combination of those a game used.
For most games, this worked fine. But a few games, including MLB Power Pros, Remote Control Dandy SF, and Ultimate Spider-man, liked to throw garbage data at the GS register that describes the information for the current texture. For each piece of garbage data, PCSX2 would create a new 16kb GSOffset, filling up a 32-bit application’s 4GB address space and running out of memory in just 40 seconds. (Sure, 64-bit has access to more memory, but if it filled up 4GB in 40 seconds, even a computer with 64GB of RAM wouldn’t last long…) So why not just clean up unused GSOffsets and free them? GSOffset was clearly not made with this in mind. It didn’t provide anything to help with actually using its data, so any code that wanted to use it had to do all the calculations manually. Some code pulled pointers from the GSOffset and threw them all over the place. Cleaning all that up for liveness tracking would have been a pain, and the lack of nice methods for using its data was also annoying, so we instead opted for replacing GSOffset entirely with a new class that didn’t require any runtime allocations (and therefore had nothing to track).
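To put that leak into perspective, here is the back-of-the-envelope arithmetic. It assumes binary units (16 x 1024 bytes, 4 x 1024^3 bytes) and that the whole address space ends up as GSOffsets, so treat the results as an upper bound rather than a measurement.

```python
GSOFFSET_BYTES = 16 * 1024          # one old-style GSOffset ("16kb")
ADDRESS_SPACE_BYTES = 4 * 1024**3   # 4GB address space of a 32-bit application
SECONDS_TO_FILL = 40                # time until memory ran out, from the text

gsoffsets_to_fill = ADDRESS_SPACE_BYTES // GSOFFSET_BYTES
gsoffsets_per_second = gsoffsets_to_fill / SECONDS_TO_FILL

print(gsoffsets_to_fill)       # 262144 GSOffsets fit into 4GB
print(gsoffsets_per_second)    # 6553.6 new GSOffsets every second
```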
The new GSOffset takes advantage of the fact that the table of offsets from a pixel on the left edge to any other pixel doesn’t actually depend on the texture size or starting address, only on the texture format. One of these tables is created for each of the PS2’s supported texture formats by the compiler, which is possible because there’s a fairly small number of them. The rest of the calculations (e.g. the left edge calculation) are done whenever they’re needed, taking advantage of the fact that images are usually read left to right, top to bottom, so while a 512x512 texture has about 260 thousand pixels, reading it using the GSOffset would only run the slightly slower left edge pixel calculation 512 times. In the end, this slightly slower calculation was offset by the reduction in time it took to look up and create GSOffsets, resulting in no performance regression from this change.
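The split between a per-format column table and a left-edge calculation done on demand can be sketched like this. The layout here is a made-up 8x8 block arrangement chosen only because it is easy to follow; it is NOT one of the real GS swizzles, every name is hypothetical, and the texture width is assumed to be a multiple of 8.

```python
BLOCK = 8          # toy 8x8-pixel blocks; not a real GS format
MAX_WIDTH = 1024   # widest texture this toy table covers

# Built once per "format": offset from the left-edge pixel of a row to column x.
# It depends only on the block layout, not on texture size or base address.
COLUMN_OFFSETS = [(x // BLOCK) * BLOCK * BLOCK + (x % BLOCK) for x in range(MAX_WIDTH)]


def left_edge_offset(base, width, y):
    """Computed on demand: address of the left-edge pixel of row y."""
    blocks_per_row = width // BLOCK
    return base + (y // BLOCK) * blocks_per_row * BLOCK * BLOCK + (y % BLOCK) * BLOCK


def addresses_in_read_order(base, width, height):
    """Yield pixel addresses left to right, top to bottom."""
    for y in range(height):
        row = left_edge_offset(base, width, y)   # runs once per row
        for x in range(width):
            yield row + COLUMN_OFFSETS[x]        # one lookup and one add per pixel


# A 16x16 toy texture: the first row covers the top of two neighbouring blocks.
print(list(addresses_in_read_order(0, 16, 16))[:16])
```

Nothing is allocated per texture here: the table exists once for the format, and the slower row calculation runs once per row, which for a 512x512 texture means 512 times for 262,144 pixels.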
For what feels like forever, transparency didn’t work correctly for Metal Gear Solid 3 and other games such as Gran Turismo 4.
If you wanted to print out your stats from the SW renderer, the output wasn’t very readable, as it was badly aligned and had no header.
Fixes broken shadows from Kingdom Hearts Re-Chain which produced lines.
Due to the way some effects work on the GS and how our Texture Cache handles it, sometimes the wrong texture format can be remembered, which will cause it to get stored in an incorrect memory format when it is saved back to the real GS memory for downloading to the EE core. This PR corrected this behaviour and properly fixes the flashlight in the Silent Hill series:
This PR reverts an older commit from 2013 (1.2 era) which had wrong assumptions on texture region repeating and how the clamping is handled along with it. Multiple games were affected on the visual side or even performance-side.
Test Drive Limited (Blue roads):
Kaan Barbarian Blade (Black Character Model):
The Chronicles of Narnia - The Lion, The Witch and The Wardrobe (Heat Haze Effect of the fire):
This will certainly help AMD GPUs on Windows but it does help NVIDIA GPU users too as the default behaviour was to stall (essentially wait and stop for new instructions) which caused bad performance.
The charts below list 3 different systems, which should give you an easier way to tell how much it could help:
This new behavior improves the software renderer quite drastically in a good way and handles it more specifically in certain situations. I would even state that the software renderer has never been this fast even compared to the older stable versions as I always tried to play True Crime NYC and never got full speed in the software renderer and now it’s handling it with ease.
In some scenarios, the ZBUF or FRAME values may get set to invalid data, but on draws where they are not being used. Previously, such a draw would be completely ignored, which caused problems; now it is allowed to continue, since the data it needs is valid.
This pull request fixes several games with the wrong colors, which depended on the CLUT (Color Look-up Table). A CLUT is basically a table of values that each correspond to a color in a palette; for example, the RGB values (255, 0, 0) mean full red. The issue arose from the fact that the 256-colour palette is split into 16 chunks of 16 colours, and it can be updated at any one of these chunks. Unfortunately, memory writes only really worked if it was updated on chunks 0, 3, 7, 11 or 15; others would break down. Further to this, the behaviour when it was asked to update only the end of the buffer was incorrect (the dirt in GTA: San Andreas relied on this!), which has now been corrected.
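As a rough picture of the chunk layout, here is a toy palette in Python. It only illustrates “16 chunks of 16 colours, any of which can be updated”; the function name is made up and this is not how the GS or PCSX2 actually loads a CLUT.

```python
CHUNK_COLOURS = 16   # colours per chunk
CHUNKS = 16          # chunks in a 256-colour palette


def write_clut_chunk(palette, chunk_index, colours):
    """Overwrite one 16-colour chunk of a 256-entry palette in place."""
    if len(palette) != CHUNKS * CHUNK_COLOURS:
        raise ValueError("palette must hold 256 colours")
    if not 0 <= chunk_index < CHUNKS:
        raise ValueError("chunk index must be between 0 and 15")
    if len(colours) != CHUNK_COLOURS:
        raise ValueError("a chunk holds exactly 16 colours")
    start = chunk_index * CHUNK_COLOURS
    palette[start:start + CHUNK_COLOURS] = colours
    return palette


black, red = (0, 0, 0), (255, 0, 0)
palette = [black] * 256
write_clut_chunk(palette, 5, [red] * 16)    # a chunk other than 0, 3, 7, 11 or 15
write_clut_chunk(palette, 15, [red] * 16)   # only the end of the buffer
```

After these two calls entries 80 to 95 and 240 to 255 are red and everything else is untouched, which is the behaviour a game expects no matter which chunk it picks.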
A few examples but not limited to:
Fixes a bug where ‘f’ would be incorrect for the blue channel when fogging was enabled on the rendering of a sprite on avx2 x64.
Like this purple grass in Hitman, which everybody should agree isn’t realistic and is a bug:
MTBA was previously not handled very well and was somewhat of a mystery. We have since discovered the correct behaviour for when it can be triggered, through restrictions in texture sizes and formats, and this PR attempted to implement it, which was enough to fix Parappa The Rapper 2 and Ape Escape 3.
Actually, this pull request is fine, but we discovered later that the formula for updating the MIP addresses was incorrect, and also that pending draws needed to be flushed if the new MIPMAP values were different. It remains the case that hardware mode has difficulty handling mip-maps.
Large floats are not handled very well in the software renderer: the range is limited by signed integers, and some precision is also lost because single-precision floats only have 24 bits of precision. This PR makes it so flat triangles are treated like sprites and the Z values are passed as integers so no precision is lost, which fixes games that use flat triangles to draw UI/2D screens.
The calculation of how to handle texture sizes wasn’t perfect and would cause graphical issues when upscaling in games such as Final Fantasy X.
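The 24-bit limit is easy to see for yourself. The snippet below pushes integers through a 32-bit float using Python’s `struct` module; the example values are picked for illustration and are not taken from any game.

```python
import struct


def to_float32(value):
    """Round-trip a number through a 32-bit (single-precision) float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


z_near = 2**24        # 16777216: still exactly representable
z_far = 2**24 + 1     # 16777217: needs 25 significant bits

print(to_float32(z_near))        # 16777216.0
print(to_float32(z_far))         # 16777216.0, the extra bit is rounded away
print(to_float32(0xFFFFFFFF))    # 4294967296.0, a full 32-bit value does not survive
print(z_near == z_far)           # False: as integers they stay distinct
```

Two different depth values that collapse into the same float can no longer be told apart in a depth test, while the same values kept as integers compare correctly.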
In the last progress report (Q3 2021) there were improvements to how Edge Anti-Aliasing works for the software renderer (lines and triangles type). This time the hardware renderer has also been improved for several games, such as the Doko Demo Issho series, FIFA 2002 and other unknown games.
However, it has only been fixed for the lines type and not for the triangles type, which is used a ton in a game like Final Fantasy X. Hopefully in the future we can get feature parity with the software renderer, which handles both types correctly. The issue on the hardware renderer is about the same, but it looks more severe there.
This will improve the blending behavior on default settings. Blending affects many things such as the lighting, shadows and more.
In the past, the hardware renderers such as OpenGL and Direct3D11 each had their own code locations, which caused a lot of duplicated code: if someone made an improvement to one, they had to remember to improve the other hardware renderers too, and it added to the code debt if they didn’t.
Now they all will share more code with each other.
Misc Improvements
Instead of the absolute path C:/User/Documents/PCSX2/ELF/test.elf, you can do things like ELF/test.elf instead.
IPC is a generic name for this function, so PINE was chosen as its replacement, especially as it’s already useful for RPCS3 and potentially other emulators or programs.
In the times always knoweth, there lies an evil power known as entropy, but one angry kot has arisen to the challenge to bring order to chaos. Okay, maybe not as epic as the first sentence makes you believe, but CK1 made sure that a subfolder is automatically created per game instead of one blob (root) of savestates inside the ‘sstates’ folder.
What does it solve, you wonder? Well, my dear reader, if you have several games and you make a lot of savestates in multiple slots (don’t look at me), you will gain an easier overview, which ironically kinda mimics how a real memcard structure works. If you want to compare it, just save normally to your memcard and make sure it’s a folder memcard to see the differences and similarities.
Makes sure that the hotkeys still retain their function after rebooting.
Preset 4,5,6 (Preset 1 is bad too to be fair) were removed as they only brought specific improvements for specific hardware and it wasn’t good in most cases anyway as it just did some random EE cyclerate and cycleskip.
PCSX2 shouldn’t obfuscate things with mostly useless settings that will only appeal to a minority.
Personally there should be only 3 modes for people:
- Preset 2 which are the default settings (good for weaker computers that don’t have enough cores)
- Preset 3 which is just Preset 2 + MTVU (it is free performance for the taking for good hardware)
- Custom global and custom per-game settings (any other situation that does not benefit the other two above ones)
After all these major timing changes, this isn’t really useful anymore and broke more often than not. Downclocking the EE Cyclerate will usually give better results on lower-end hardware.
Though even in its current state, Cycleskip 1 and 2 will give decent results for Shadow of the Colossus, which didn’t run at full speed on the ‘PlayStation 2’.
The main window will now say what preset you are using.
This is preparation part 2 to bring in the new Qt GUI.
If you moved or renamed your ISOs, you either had to nuke the recently played list, ignore it or set everything back to how it was before. This gives you more granular control over how you want to handle the recently played games.
While not as exciting as a new fix for a game, increased performance or better compatibility, this will make the text more aligned with the rest of the windows. It did bring one meaningful change though: the maximum audio latency is now 200 ms instead of 750 ms.
The default is still 100 ms (0.1 seconds of audio latency) on Xaudio2 but technically lower with Cubeb if you read that section and if you really need 750 ms (0.75 seconds of audio latency) then it’s unlikely you will have a good experience on PCSX2. You may want to consider upgrading your computer in that case.
Can you spot the differences?
Doesn’t need much explaining: the keybinding for aspect ratio had stopped being shown by accident. Emulation development can have its fair share of regressions.
The game had wrongly colored eye textures (yellow/blue), but they are now correctly white:
The new GUI is moving along very well, but is not yet at feature parity with the current wxWidgets GUI. Please be patient until it is released to you guys.
#4843 PGIF: Remove Force Fifo Clear on GP1 (00-01)
See you in our next progress report, which will be the first quarterly report of 2022.
(dev1838 to dev2185) (2021-10-01 - 2021-12-31)