In the eye of the Occluder


Having concluded my experiment of making games with modern engines for the older low-end hardware used in retro handhelds, specifically the GameShell (released in 2018, with a CPU that launched in 2011), I’d like to share my findings and thoughts in this post mortem.

Changelog

This last update brings some major FPS improvements thanks to occlusion culling via Godot’s Occluder node. This was by far the biggest performance gain I was able to achieve; everything else pales in comparison. The second, minor tweak is the inclusion of a VisibilityNotifier for the animated flicker lights, which toggles the animation and visibility of each light based on its distance to the player and whether the light is visible on screen. This gave about 2-3 extra FPS, not much, but I’ll take it.

TL;DR

Running this demo on such low-end hardware is a testament to how well made and portable Godot really is. The fact that I could achieve these frame rates with stock tools, without changing any of the engine’s source code, speaks volumes. It also provided hands-on experience with optimization tactics, monitoring tools and feature A/B testing.

Post mortem

First of all, this was never meant to be a game. Treat it as a throwback to the 90s demoscene. My sole goal was to get a 3D project running on this low-end hardware, and although the result exceeded my expectations, it did leave me wanting more. I couldn’t help but wonder how much better it would all run on an R36S, an RG35XX+ or any other modern handheld. After getting my initial 3D scene running on the device, I set out to see how much could be done while maintaining acceptable frame rates. It was an interesting challenge, given that retro-style games oftentimes completely neglect any kind of optimization simply because even a ~10 year old mid-tier setup is overkill. It was also interesting to see how architectures differ, and how something that is an optimization on desktop may actually hinder performance on a system on a chip.

Even though I’m not a major FPS fan, this genre has always been the one to raise the bar in terms of graphics, so it only made sense to adhere to the tradition.

My first try was quite devastating, running at 7 FPS for just a few walls and floor tiles, but I soon discovered that even though some assets look retro, they are far from it. Even though the device can handle quite a large poly count, as discussed below, there is a limit, and 2k polygons for a single wall tile seemed a tad much. It got better fast, though, after I found proper assets, which weren’t perfect for the level of optimization I was going for but were enough to get the ball rolling and continue my investigations.

Polycount

It is common knowledge that the fewer polygons there are, the faster a scene renders. But as I’ve found out, that’s not necessarily true, and it doesn’t scale proportionally. There’s a lot more going on within the engine and the GPU than I could describe here. For instance, I’ve found that rendering a single high poly object is more resource intensive than rendering several objects of a lower poly count (even without GPU instancing and culling). Overall the Mali 400 GPU seems perfectly capable of rendering 5-10k vertices at good frame rates. This is quite a small number, as in general your level might contain way more vertices, even when considering frustum culling.

Occlusion culling is the best optimization one could ever make, simply because it does not require you to sacrifice anything in return, unlike, for instance, baked lightmaps or unshaded materials. The FPS gains can’t be measured as easily and will vary from scene to scene, but it’s a must have, especially for indoor scenes and when using Godot’s Rooms/Portals, which do quite a bit more than the basic Occluder node. It would also be beneficial to split your meshes into indoor and outdoor parts, as well as into parts that may be occluded, like rooftops. Another obvious suggestion would be to use LODs, although this is something I haven’t tried. There is a chance of it requiring more effort than the performance gain is worth, unless LOD generation can be automated.

Lighting

Dynamic directional light does not work well on the Mali 400. It causes a plethora of artifacts and weird bugs, from meshes rendered without proper perspective, color or textures to meshes being outright rendered twice and rotating like crazy. This could also be linked to the GPU driver, but there’s no way to confirm. Baked lights work really well, though, and yield a beautiful result, although they are quite costly when compared to using plain textures. Thus, unless you really need to be able to adjust lights in the engine, rebake them often or need your dynamic objects to read light data from the light map, you are better off baking them in Blender as a plain texture. This is one of the biggest optimizations you can do, which from my tests gave a 50-60 FPS boost. As far as dynamic omni lights go, they are much better. They don’t seem to be as resource intensive as I thought; in fact, disabling them altogether gave little to no performance increase, given there are no other dynamic objects (i.e. NPCs or moving platforms) that would interact with them. Thus you can get a really cheap yet nice effect by having your static objects use unshaded textures with baked lights from Blender and positioning your omni lights to mimic the light setup for dynamic objects (like the shotgun in this demo).

Textures

This was probably the biggest surprise of them all. I was convinced that using a larger atlas texture and a single material for the whole level would result in huge performance gains. Alas, there was no benefit, and in fact I saw a 1-2 FPS decrease in some scenarios. Having smaller texture sizes provided no benefit either: going from 512 to 32 pixels made no difference whatsoever. I have no explanation for this, except perhaps that the VRAM compression is doing its job well and that I didn’t have enough different materials to start with to see an improvement.

Audio

I was pleasantly surprised by how the device handled audio. It has a stereo speaker, which immediately led me to try out 3D spatial audio, and it was amazing to hear such effects on a handheld with seemingly no major performance cost, especially given that I was testing it with a 48 kHz WAV file mixed down to 44.1 kHz. Looping audio played constantly as background music, with footsteps and shotgun sounds on top of that, was no problem for the device either.

Animation

I was happy to find out that animating various objects and their properties via Godot’s AnimationPlayer was rather cost effective, although I’d still recommend using the VisibilityNotifier to stop or pause said animations. This provided 1-2 FPS gains when compared against 2 flickering lights running at all times. Implementing the gun sway in GDScript turned out to be quite cheap too.

Memory

Running top, I was amazed to see that I had 200-300 MB of RAM available to me. That’s huge given that my version of the device is limited to 512 MB and runs X11. I did notice some crashes, though, when I was pushing it a bit too far in terms of level size or number of objects. This means that you wouldn’t be able to get away with a single scene containing your whole level; you would rather have to split it into several scenes and load them dynamically.
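For logging free memory while the game runs, a rough sketch along these lines could be used instead of watching top by hand. It assumes a Linux kernel that exposes the MemAvailable field in /proc/meminfo, which may not be exactly the number top displays, and mem_available_mb is a hypothetical helper name, not something that ships with the device.

```shell
#!/bin/sh
# Hypothetical helper: print available memory in whole megabytes.
# Reads a meminfo-style file; defaults to /proc/meminfo (assumes the
# kernel provides a "MemAvailable:" line, reported in kB).
mem_available_mb() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Example: sample every 5 seconds while the demo is running.
#   while sleep 5; do mem_available_mb; done
```

Taking the file as an optional argument keeps the helper easy to try against a saved copy of /proc/meminfo from the device.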

Conclusion

Overall I am extremely impressed with the GameShell and Godot, although it felt a bit like a cliff-hanger: you are so close to getting extra features in, yet you realize that it would be too much for an actual game. Should one be making games for such devices, even those that are more powerful? Perhaps as a hobby, or as a reward or marketing gimmick to support your actual project. Otherwise the market is too small and too geared towards retro emulation to justify the level of effort. That being said, I will definitely be considering the option of making cut-down tie-in games or demos running on handhelds for my future commercial projects.

Files

GameShell x32, 25 MB (Version 2, Jan 28, 2024)
Linux x64, 24 MB (Version 2, Jan 28, 2024)
Windows x64, 23 MB (Version 2, Jan 28, 2024)


Comments


great stuff!

thank you for taking the time to share your findings with us.


I read somewhere (on reddit) that you were waiting for an R36S to arrive. 

if you take the time to explore its capabilities, would you kindly let us know?

I'm not very good when it comes to hardware, and I'm still very new to programming/game dev, but I'd love to know if I could export a Godot 4 project to an R36S. I'm doing a small project with my niece, and I think that hardware could be a nice, affordable fit.


Yep, it’s certainly possible, no doubt about it, using FRT. PortMaster was also very helpful during my investigations; there are some docs on how to port existing Godot games. As for this particular device, we know that it supports GLES3 and Vulkan (in theory), so I assume the drivers are there to support these features and hardware rendering. AFAIK you have 2 choices when it comes to the OS, ArkOS and JelOS; the latter seems to be running newer software (i.e. a mainline kernel for RK3326 chips).

The gist of making it work is:

  • update the device OS to the latest version
  • download FRT, either v2 or v3; it must correspond to your Godot version
  • export your game as a .pck file
  • upload both the frt binary and the .pck file to the device
  • make the frt binary executable: chmod +x frt
  • run the game: ./frt frt.pck

Note that the file names are arbitrary; the only catch is that the .pck file should be named the same as the frt binary (i.e. foo and foo.pck).
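The staging part of the steps above could be sketched as a small shell helper that copies the FRT binary and the exported .pck under one shared base name and marks the binary executable. stage_game is a hypothetical helper name, not part of FRT or Godot, and the paths, user and host in the example are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: stage an FRT binary and a .pck under one base name,
# so that the .pck is named the same as the binary (e.g. demo and demo.pck).
stage_game() {
    frt_bin=$1   # path to the downloaded FRT binary
    pck_file=$2  # path to the .pck exported from Godot
    name=$3      # shared base name, e.g. "demo"
    out_dir=$4   # local staging directory to upload afterwards

    mkdir -p "$out_dir" || return 1
    cp "$frt_bin" "$out_dir/$name" || return 1
    cp "$pck_file" "$out_dir/$name.pck" || return 1
    chmod +x "$out_dir/$name"
}

# Example (placeholder paths, user and host):
#   stage_game ./frt_download ./mygame.pck demo ./staging
#   scp -r ./staging user@device:demo
#   then on the device, inside that directory: ./demo demo.pck
```

Staging locally first makes it easy to check the naming before anything is uploaded to the device.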

The learning curve might be quite steep, so be prepared, but I’m sure you’ll emerge victorious, best of luck! :)

thank you for all the info!