Armory3d capability questions

david_j · May 20, 2019, 1:46am

Thanks for your answers… I’m not daunted if Armory can’t do all of what I need, as I’m willing to put dev work (and money) into improving it. I’m just trying to understand whether it’s reasonable / viable to get where I’m going…

For example…I would really prefer to keep my modules (like cloth) as fully native code. The cloth sim is threaded, using OpenMP… and jumping through hoops to get it to do native shared memory threading through Emscripten pthreads sounds like a world of pain. I don’t care about losing web-player support. Is this viable? Or is it just too difficult given the way Armory3d is built around Haxe?

On a similar note… is Armory3d entirely single-threaded because of Krom/Haxe? Does it have notable garbage collection pauses?

I was naively assuming that Armory3d was a more typical C/C++ core engine, with Haxe for scripting… but looking through the code, it appears to all be built in Haxe. That certainly has its advantages, but for some “heavy lifting” stuff, fast native threading is really important.

BlackGoku36 · May 20, 2019, 5:35am

Iron is the core engine, written in Haxe on top of Kha; see Armory3D’s architecture. Yes, it has multi-threading and is really fast (legend says that the Krom target is sometimes faster than the C/C++ target).

zicklag · May 20, 2019, 2:55pm

Definitely check out the Armory3D architecture link that @BlackGoku36 posted to understand a bit more about the way everything is put together.

As for your cloth simulation library and not compiling to JavaScript, that is still a possibility. You would just have to compile your own version of Krom (the technology that runs Armory games during development) and provide JavaScript bindings to your library for it. That means going out of your way a bit more, but it isn’t a huge deal.

You can still use Krom during development and export the game to C for the production build. If you want to avoid Krom completely you are going to have to wait for much longer build times because you have to build the generated C code every time you change something.

zicklag · May 21, 2019, 3:51pm

I just checked out ArmorPaint, and it uses a fork of Krom to provide native file dialogs and Direct3D11 support, so that is a good example of extending Armory to use native libraries for a Krom application.

Also, a while back @RobDangerous, the author of Kha, mentioned that he could probably add mechanisms to Krom for loading native libraries, which would mean that you wouldn’t have to fork Krom to add those features.

For your cloth simulation library, it might be slowed down by the calls from JavaScript to C++, but if it was a problem, you could still do away with JavaScript for your production build with the C export of the game and only use Krom during development.

RobDangerous · May 21, 2019, 6:23pm

Sorry, but multithreading in Krom is very restricted - it’s JavaScript after all.

david_j · May 21, 2019, 6:28pm

I read the Armory3d architecture page, and dug through the code, and I’m amazed by this Haxe based technology stack.

However, there are some sticky and intertwined issues that affect the performance of data exchange across the boundary between C-code (native or WASM) and Haxe code.

Similar issues occur in thunking to other managed languages, but in other engines (UE4, Blender, Source, etc) the core of the engine is written in the low-level language. Having the core of Armory3d / Iron / Kha written in the high-level language complicates efficient integration of low-level modules.

This is especially important for a cloth simulator, because it needs to see all meshes and acceleration structures (aka BVH/kd-tree) in order to do collision detection against all collision targets during the cloth solve. This also affects haxebullet physics.

After my digging, I’ve decided I don’t think it’s practical to use a C cloth simulator with Armory3d, because of the data-marshalling issues… and I don’t think it’s practical to expect good multi-threading out of Armory3d, because the Krom target doesn’t support it, and the CPP/native target has a very very poor garbage collector.

Probably it would be more practical to convert my cloth simulator to Haxe+GPU compute. This might be a decent route for me, because the way Armory3d works with Blender scene files is extremely advantageous for my application - because it will be a very long time before the UE4 editor is as good as Blender.

Here is a summary of the detailed facts that led me to this conclusion. I welcome any corrections or thoughts…

(1) Armory3d / Iron / Kha allocates the scene data structures in Haxe, which turn into JavaScript GC heap objects in the Krom target. Mesh data is allocated by Kha as a JavaScript “Int16Array”…

(2) Inside Krom/V8, WASM code (aka Emscripten compiled C code), can only see data allocated inside the WASM heap. This means to see any of the above scene or javascript mesh arrays, they need to be copied into the WASM heap, causing performance cost and memory bloat. This makes a C/Emscripten compiled cloth simulator pretty impractical.

For details, read “Most performant way to pass data JS/WASM context” (Issue #1231, WebAssembly/design).

(3) Integrating a C cloth simulator into Krom as a native C library could avoid copying mesh buffers, but there is still a big impedance mismatch, because it will have to use V8 APIs to traverse the scene data that lives in Javascript GC objects.

(4) CPU side collision detection also requires seeing animated mesh results, and mesh acceleration structures. Something needs to get the skinned mesh results into CPU memory (whether computed on the CPU or GPU), and update CPU memory acceleration structures (like a BVH). If this data lives in Javascript/GC memory, then it’ll have the same issues as described above to be seen by C/native or C/Emscripten.

(5) As a result…it seems most viable with Armory3d to write a cloth simulator in Haxe, so it has most efficient access to the data. However, unfortunately, Armory3d on Krom is single threaded, because Haxe/Krom compiles to Javascript with GC heap objects, and the Javascript V8 GC heap is single threaded. This would mean the cloth simulator would also be single-threaded, which is pretty painful. (Likewise, single-threaded CPU mesh animation skinning would be pretty painful)

(6) For Haxe code to multi-thread with shared heap objects on Krom, it would need to stop using the JavaScript GC heap. This would basically mean running the Haxe CPP target on WASM, and relying on hxcpp’s inferior garbage collector, instead of the excellent (but single-threaded) V8 garbage collector. My understanding is that this is not currently possible, as hxcpp currently does not generate Emscripten-compatible code.

(7) The Haxe cpp target running native can support threads, and can share data buffers by reference with C, but relies on its very poor internal garbage collector. It would still have the same Haxe data thunking issues as C/native/Krom.
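To make the copy in point (2) concrete, here is a minimal sketch, not actual Kha or Emscripten code. It assumes a plain `ArrayBuffer` as a stand-in for WASM linear memory so it runs anywhere, and `heapAlloc` and `copyMeshToHeap` are hypothetical helper names.

```typescript
// Stand-in for WASM linear memory. In a real Emscripten build this would be
// the module's heap buffer; a plain ArrayBuffer is used so the sketch runs anywhere.
const wasmHeap = new ArrayBuffer(64 * 1024);
let heapTop = 0; // hypothetical bump-allocator cursor, in bytes

// Reserve `bytes` in the stand-in heap and return the byte offset ("pointer").
function heapAlloc(bytes: number): number {
  const ptr = (heapTop + 7) & ~7; // keep allocations 8-byte aligned
  if (ptr + bytes > wasmHeap.byteLength) {
    throw new Error("stand-in heap exhausted");
  }
  heapTop = ptr + bytes;
  return ptr;
}

// Copy a mesh buffer that lives on the JavaScript GC heap into the stand-in heap.
function copyMeshToHeap(vertices: Int16Array): { ptr: number; view: Int16Array } {
  const ptr = heapAlloc(vertices.byteLength);
  const view = new Int16Array(wasmHeap, ptr, vertices.length);
  view.set(vertices); // the O(n) copy that point (2) describes
  return { ptr, view };
}

// A mesh allocated on the JS side, as in point (1): three vertices, xyz each.
const jsMesh = new Int16Array([0, 0, 0, 100, 0, 0, 0, 100, 0]);
const copied = copyMeshToHeap(jsMesh);

// Both copies now exist, so the mesh occupies twice the memory.
const totalMeshBytes = jsMesh.byteLength + copied.view.byteLength;

// The copy is a snapshot: later edits on the JS side are not seen by the heap copy,
// so an animated mesh would have to be re-copied every frame.
jsMesh[0] = 42;
const staleAfterEdit = copied.view[0] !== jsMesh[0];
```

The per-frame re-copy for skinned meshes is what makes points (2) and (4) expensive in practice.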

zicklag · May 21, 2019, 7:25pm

It looks like you did some good research and it sounds like you’ve got a pretty good understanding. These are my thoughts so far (the numbers are not related to the numbers in your post):

Armory actually uses the HashLink to C (HL/C) target of Haxe, not the hxcpp target. They’ve got different garbage collectors, but I don’t know exactly how they compare. You would probably have similar problems, so that doesn’t necessarily fix anything.

Note: I’ve also heard that the comment on the HashLink site saying that it outperforms V8 is inaccurate.

It might not matter that the cloth is slow on the Krom build if you can export your project to C and everything will be native code. Still, I don’t know that that would fix everything for the C build, because there still have to be calls between the garbage-collected HashLink C code and your non-garbage-collected C cloth library.

I think you’re probably right about Haxe and GPU compute being the best option. That is how ArmorPaint achieves its extremely high performance painting. It would probably perform faster than CPU compute with the native implementation anyway, as long as it wasn’t too difficult to port.

Edit: If the cloth library needs to integrate with Bullet physics, then you might still have similar issues with memory transfer even if you wrote the cloth sim itself in GLSL, because Bullet is run as ASM.js. So far Bullet’s cloth simulation in Armory has run OK in small tests, but I don’t think anybody has stress tested it.

RobDangerous · May 21, 2019, 8:14pm

The hxcpp garbage collector is not a very poor garbage collector, particularly when you enable the generational GC mode, and data exchange is much more efficient than it normally is in those situations: pointers to Haxe-internal things can be directly pinned and accessed, and the other way round it’s even easier (see for example Kha’s array buffer classes). The HL/C garbage collector currently is indeed pretty slow though, and C/Haxe data exchange is still limited compared to hxcpp.

david_j · May 21, 2019, 9:26pm

@RobDangerous - it’s nice to know there is an experimental generational mode. I still don’t expect it to compare to V8’s collector, which is arguably one of the best GCs on the planet, next to Hotspot and Azul-Zing. But it’s nice to know they are improving it.

There shouldn’t be any need to pin things with hxcpp, as it uses a non-copying conservative collector. Pinning would only be relevant for interfacing with a precise copying/compacting collector like V8/Krom.

How does accessing Haxe structs / objects from C++ work? Are Haxe fields translated into C++ fields by the compiler? If so, this could be a big advantage of using a non-compacting conservative collector, as C++ code could read the Haxe cpp header files, and walk Haxe data-structures “natively”. Though I doubt this is a supported pattern, because it would only work for hxcpp, not HL/C or Krom.

david_j · May 21, 2019, 9:34pm

Does the current Armory3d bullet support Mesh Colliders? Or only collision shapes? If it does, how does the mesh data get transferred from Armory3d/Krom into ASM.js? I couldn’t find this in the code. The haxebullet mesh stuff seems to just be empty stubs.

If these mesh-data copying issues become a problem, another possibility to consider is using a Haxe physics library (maybe this is what OimoPhysics is to become?).

There is also a C# to Haxe transpiler, and while I doubt the amazing bepuphysics2 would transpile very well, there might be useful pieces of bepuphysics1 that could either be ported or transpiled over… like the excellent BEPUik.

zicklag · May 21, 2019, 9:39pm

It looks like the code for creating the Bullet mesh shape is here.

RobDangerous · May 21, 2019, 9:43pm

Sure, the V8 GC is better, but I’d argue there’s some area between very poor and best in the world. The hxcpp GC optionally supports compaction, by the way. And yes, Haxe fields are translated to C++ fields, you can access fields natively (and Kha uses that) and even just stuff some C struct into Haxe classes, then later read it back in C, again directly (Kha uses that intensively).

RobDangerous · May 21, 2019, 9:54pm

To add one more tidbit to that - even with compaction turned on, one doesn’t always have to pin objects, because the point at which the garbage collector runs can be controlled completely from the C++ side of things.

david_j · May 21, 2019, 10:02pm

@zicklag - thanks… That code is not only copying the whole mesh (duplicating all the data), it’s also doing it very slowly, by calling addTriangle() on every triangle. That seems viable for small to moderate static collision meshes (like a non-streaming level mesh), but not for handling collisions with animated / skinned meshes which change every frame. This is fine for games with small to medium size level meshes which use armature-translated collision shapes for avatars, but for my application I need precise mesh collisions with avatars. That said, I don’t really need bullet, so for my app there might be a different way to skin this cat.
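As a rough illustration of why the per-triangle path worries me, here is a sketch that counts boundary crossings. `StandInTriangleMesh` is a hypothetical stand-in, not the haxebullet or Bullet API, and `setTriangles` is an assumed bulk entry point that may not exist in the real bindings.

```typescript
// Hypothetical stand-in for a physics-side triangle mesh; not the haxebullet API.
class StandInTriangleMesh {
  calls = 0; // number of calls across the (simulated) Haxe/physics boundary
  private storage: Float32Array;
  private used = 0;

  constructor(maxFloats: number) {
    this.storage = new Float32Array(maxFloats);
  }

  // Per-triangle path: one call, and nine floats copied, per triangle.
  addTriangle(tri: Float32Array): void {
    this.calls++;
    this.storage.set(tri, this.used);
    this.used += 9;
  }

  // Assumed bulk path: one call that copies the whole position buffer.
  setTriangles(all: Float32Array): void {
    this.calls++;
    this.storage.set(all, 0);
    this.used = all.length;
  }

  get floatCount(): number {
    return this.used;
  }
}

// 1000 triangles, 3 vertices each, xyz per vertex.
const triangleCount = 1000;
const positions = new Float32Array(triangleCount * 9);
for (let i = 0; i < positions.length; i++) positions[i] = i * 0.5;

// Path A: what the linked code appears to do, one call per triangle.
const perTriangle = new StandInTriangleMesh(positions.length);
for (let t = 0; t < triangleCount; t++) {
  perTriangle.addTriangle(positions.subarray(t * 9, t * 9 + 9));
}

// Path B: a single bulk upload of the same data.
const bulk = new StandInTriangleMesh(positions.length);
bulk.setTriangles(positions);
```

Both paths still duplicate the mesh. The bulk path only removes the per-call overhead, so it would not by itself solve the every-frame copy for skinned meshes.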

@RobDangerous - That’s interesting. The Haxe Manual just mentions conservative stop-the-world, but now that you’ve pointed me in the right direction I do see some patches and command line flags referencing generational and compaction (which must mean some type of non-conservative precise collector). The only documentation I could find is the Threads and Stacks section of the Haxe Manual. I would update it with some GC details, but it’s written in LaTeX, so it lacks a wiki-edit/preview capability.

RobDangerous · May 21, 2019, 10:07pm

hxcpp is lacking in the documentation department, particularly in the details. I only know those things because I regularly move around in the sources and sometimes talk to Hugh. But there’s lots of good stuff in there for performance and interop, even something like http://api.kha.tech/kha/simd/Float32x4.html is possible without much effort - pretty much unthinkable in something like JS, Java or C#.

zicklag · May 21, 2019, 11:51pm

Unfortunately, even if hxcpp could work, @lubos dropped the hxcpp target in favor of HL/C for Armory. I don’t know what the reason was. It may have had something to do with the C++ binding generator for HashLink.

david_j · May 22, 2019, 4:11am

…this is a side-point, but you’re misinformed about that “unthinkability”…

C# has supported SIMD for quite a while. It was available in Mono.simd.dll back in 2013, and was standardized into System.Numerics.Vectors in the Windows 64-bit RyuJIT backend since ~2014 and in Mono since 2016 with . Bepuphysics2 has insanely good performance (video) by making heavy use of the new S.N.V SIMD. In Unity, it looks like it’s possible to use the old Mono.simd.dll on win/mac, but I don’t think SIMD works in their new IL2CPP backend (required for iOS and web).

Java also has some support for SIMD, as its SuperWord optimization can automatically inject SIMD operations in some cases, but it’s not possible in Java to use them explicitly.

Chrome/V8 had an experimental SIMD.js, which was removed in favor of shifting to WebAssembly SIMD (V8 issue 6020), which is bleeding edge but apparently working. While some users preferred the former because it was usable from “normal” JavaScript, in practice it was very hard to structure normal JavaScript to benefit from it, which is why they shifted to a WebAssembly-centric implementation.

While SIMD is often important, equally important is using cache-efficient value-type struct arrays, instead of “randomly heap allocated” OO object arrays. You can see an overview of the issue in the “Value Types” section of this BepuPhysicsV2 article. To quote the article…

C# expresses packed struct arrays a bit more naturally than Java or JavaScript… Though Java structs/junion looks half decent, and JavaScript has TypedArrays. I don’t know if Haxe automatically uses TypedArrays for struct arrays in the Krom target (I suspect not).
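For anyone unfamiliar with the distinction, here is a small sketch of the two layouts in JavaScript terms. The particle fields, `STRIDE` layout, and function names are made up for illustration and are not taken from Kha or Armory3d.

```typescript
// Object-per-particle layout: each element is a separate GC heap object,
// so neighbouring particles may end up far apart in memory.
interface ParticleObject {
  x: number;
  y: number;
  z: number;
  invMass: number;
}

// Packed layout: one contiguous Float32Array, STRIDE floats per particle.
const STRIDE = 4; // x, y, z, invMass
const Y = 1;
const INV_MASS = 3;

// Flatten the object array into the packed layout.
function packParticles(objs: ParticleObject[]): Float32Array {
  const packed = new Float32Array(objs.length * STRIDE);
  objs.forEach((p, i) => {
    packed.set([p.x, p.y, p.z, p.invMass], i * STRIDE);
  });
  return packed;
}

// Move every non-pinned particle along Y by g * dt in one linear pass.
// Particles with invMass 0 are treated as pinned and skipped.
function applyGravity(packed: Float32Array, dt: number, g: number): void {
  for (let base = 0; base < packed.length; base += STRIDE) {
    if (packed[base + INV_MASS] !== 0) {
      packed[base + Y] += g * dt;
    }
  }
}

const particleObjects: ParticleObject[] = [
  { x: 0, y: 5, z: 0, invMass: 1 }, // free particle
  { x: 1, y: 2, z: 3, invMass: 0 }, // pinned particle
];
const packedParticles = packParticles(particleObjects);
applyGravity(packedParticles, 1, -10);
```

The packed buffer is also the form that can be handed to a GPU upload or copied into a WASM heap in one call, which the object array cannot.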

RobDangerous · May 22, 2019, 6:12am

That’s not my point. My point is, Haxe/hxcpp does not have simd support at all. And yet I could just easily add it.

zicklag · May 22, 2019, 9:37pm

So, to sum up the current findings: writing a cloth sim for the GPU in Haxe is probably the best option for performance, but we would have to find another way to manage physics, because performance still won’t be good if we have to transfer the collision meshes back and forth between the WASM Bullet build and the Haxe -> JavaScript cloth simulator when deploying to Krom. Additionally, hxcpp has some potential for making data communication between C and Haxe efficient enough, but Armory isn’t even using the hxcpp target anymore, so that kind of defeats that.

This has opened up a concern for me, in general, because my team hopes to one day make potentially large scale games with an Open Source engine, and Armory seems to be the best available option for us. Having to rely on native libraries for certain functionality and yet not being able to communicate efficiently with those libraries could be a serious problem.

I’ve been trying to figure out if there is any solution for Armory to handle this any better, but it seems like the only option is to have Armory core written in a language without a garbage collector. I know it would be a huge change, and I’m not saying that it is necessarily the right thing to do, but I have been trying to figure out whether or not it makes sense to rewrite Iron and Armory in Rust.

We could still utilize Kore just like we are now, and we could even still support Haxe traits by embedding JavaScript, similar to Krom except that the whole of Armory would be native code.

The thing I’m afraid of is that I have seen plenty of small examples with Armory, and examples that show off its amazing graphics capabilities, but none of the examples I have seen are ones that would really exercise the CPU or demonstrate being able to handle large scenes with lots of objects and lots of interaction between the objects in the scene. Could Armory effectively support a large game that could compete with Unity or Unreal while running on HashLink and/or JavaScript?

Rust has the efficiency of C and C++ while eliminating most of the memory errors and security vulnerabilities related to memory errors. It really makes efficient and concurrent programming approachable to a much larger audience and increases productivity. Additionally, Rust compiles to WASM, so we might not have to give up the web target, which too many people are using at this point to justify getting rid of it for good.

I’m still mulling over this and I want to get your guys’s input on it if you have any. I could be missing something major that makes this a completely bad idea, but I want to figure out what will be best for Armory in the long term. @lubos it would be good to get your input on this too if you can find the time.

david_j · May 22, 2019, 11:18pm

@zicklag - I recommend you take a careful look at your goals and prioritize. If your goal is to work on a game engine, Armory3d needs it. If your goal is to build games, it seems much better to use a game engine that can already build successful games, like UE4 or Unity. If you insist on using an open-source engine, there are others further along, like Xenko. It’s written in C# and very similar to Unity. Though it is still clunky and unfinished compared to Unity or UE4.

The exact same thing is true of my project… My goal is to get the project working, so I feel wrong even looking at unfinished open-source engines. I’m here because for my project, the ability for the community to author and share editable game assets as .blend files is such a potential game changer that I’d be willing to experience some pain if I could make it work.

As for your other points…

I don’t think a game engine has to be written in a low-level language without a garbage-collector. Every game engine is different, and has different goals and tradeoffs. And I must say, the amount of functionality present in even today’s unfinished Armory3d in a small code-size is pretty astounding.

Armory3d is using Haxe and Emscripten to have an almost surreal ability to deploy to any target (including HTML5/webgl)… and it has a deep connection to Blender, to get rid of the hell that is asset-conversion and working with game-asset-editors that are feeble shadows of Blender. I conceptually like this direction.

I think Armory3d, even with Emscripten bullet copying mesh data, even with 100% single-threading, even with only the Krom target… can be a far better BGE than the BGE ever was. It’s an ultra-rapid prototyping system for relatively small games that you want to write entirely in Nodes and Haxe, and deploy everywhere trivially. This is a pretty good niche to be in.

However, when you go beyond that niche, what you get is a confusing paradox of choice, and Armory3d pulled in different directions. Every Haxe target has different complexities and limitations. Krom doesn’t multi-thread Haxe code. Emscripten causes extra copying/marshalling. They all use different GCs, and have different ways of integrating C/C++ code.

One problem with this is understanding it. I have enough experience in low level runtimes to come up to speed on these details quickly, but for most people, this is going to be baffling.

Another problem is integrating foreign code. If you want to integrate non-Haxe modules, you have to pick a method, which effectively means picking a target… so some of the target neutrality promise goes out the window.

Another issue is that optimizing Armory3d for new scenarios is target dependent!

For example, one person is posting on the forums about troubles using a really large terrain mesh. You can cut it up in pieces and use LOD, or you can make it a ROAM heightmap, or whatever… but if the data all lives in Haxe land, and you want to use Emscripten bullet to make a character walk on the ground, there is going to be copying and duplication. What do you do about this?

It feels like this decision depends on the target you care most about. If you want to keep target flexibility, and efficiently share Haxe mesh data and acceleration structures, maybe you switch to a Haxe native physics implementation like OimoPhysics. Or, if you want to heavily support Emscripten bullet, maybe you want to design a mesh layout and allocation scheme that always locates these assets in WASM memory. Or, if you want to keep things simple, maybe you just accept Armory3d isn’t going to be good for very large mesh colliders and send the guy with the huge terrain mesh to a different game engine.
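To sketch what the “always allocate in WASM memory” option could look like: the engine asks for its mesh storage inside the shared buffer up front, so the physics side can read the same bytes without a copy. This is only an illustration under assumptions: a plain `ArrayBuffer` stands in for WASM linear memory, and `allocMeshInHeap` is a hypothetical allocator, not something Kha or Emscripten provides under that name.

```typescript
// Stand-in for WASM linear memory (assumption: real code would use the
// Emscripten module's heap buffer instead of a plain ArrayBuffer).
const linearMemory = new ArrayBuffer(1 << 16);
let nextFree = 0; // bump-allocator cursor, in bytes

// Hypothetical allocator: hand out an Int16Array view that lives inside
// linear memory, plus the byte offset that native code would use as a pointer.
function allocMeshInHeap(vertexCount: number): { ptr: number; positions: Int16Array } {
  const elementCount = vertexCount * 3; // xyz per vertex
  const bytes = elementCount * Int16Array.BYTES_PER_ELEMENT;
  const ptr = (nextFree + 7) & ~7; // 8-byte alignment
  if (ptr + bytes > linearMemory.byteLength) {
    throw new Error("out of stand-in memory");
  }
  nextFree = ptr + bytes;
  return { ptr, positions: new Int16Array(linearMemory, ptr, elementCount) };
}

// Engine side: fill the terrain mesh in place, directly in linear memory.
const terrain = allocMeshInHeap(4);
terrain.positions.set([0, 0, 0, 10, 0, 0, 0, 0, 10, 10, 0, 10]);

// "Physics side": a second view over the same offset sees the same bytes.
const physicsView = new Int16Array(linearMemory, terrain.ptr, terrain.positions.length);

// An engine-side edit is immediately visible to the physics view: no copy.
terrain.positions[1] = 7;
const seenByPhysics = physicsView[1];
const sharesBuffer = physicsView.buffer === terrain.positions.buffer;
```

One caveat with real WASM memory is that growing it detaches the old buffer, so any views held by the engine would have to be recreated after a grow.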

My point is that I don’t think Armory3d can follow all these paths simultaneously; it needs to prioritize where its focus lies.