Should I continue making my own VM, or scrap it for some preexisting solution?

from ZILtoid1991@lemmy.world to programming_languages@programming.dev on 09 May 2024 21:35
https://lemmy.world/post/15218195

After getting angry at Lua for bugs reappearing after I’d fixed them, and otherwise having trouble making it interoperate with my language of choice (D), I decided to roll my own VM, with some optional features beyond regular Lua (type safety, static arrays on the stack, enhanced metatables, etc.), and with room for other scripting languages to be ported to it. As I had already built a VM for a programmable MIDI format (M2: Docs / Implementation; a bit incomplete as of now), I thought it would be easy: just don’t design everything around preallocation this time (M2 has 128 not-entirely-general-purpose registers per pattern (musical thread), plus 128 shared registers and shared arrays), then add stack handling, heap handling, more types, etc.
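
To make “type safety” concrete at the value level, here is a minimal C sketch of a tagged value whose add operation rejects mixed types instead of silently coercing them. It is not PingusVM’s actual design; Value, ValueTag, make_int, make_float and value_add are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagged value: a type tag plus a payload. */
typedef enum { TAG_NIL, TAG_INT, TAG_FLOAT } ValueTag;

typedef struct {
    ValueTag tag;
    union {
        int64_t i;
        double f;
    } as;
} Value;

static Value make_int(int64_t i) {
    Value v = { .tag = TAG_INT, .as.i = i };
    return v;
}

static Value make_float(double f) {
    Value v = { .tag = TAG_FLOAT, .as.f = f };
    return v;
}

/* Checked add: integers stay integers, floats stay floats, and mixing
   types is reported instead of coerced. Returns false on a type error. */
static bool value_add(Value a, Value b, Value *out) {
    if (a.tag != b.tag) return false;
    switch (a.tag) {
    case TAG_INT:
        /* Add as unsigned so overflow wraps instead of being undefined. */
        *out = make_int((int64_t)((uint64_t)a.as.i + (uint64_t)b.as.i));
        return true;
    case TAG_FLOAT:
        *out = make_float(a.as.f + b.as.f);
        return true;
    default:
        return false;   /* nil (or any other tag) cannot be added */
    }
}
```

Under this design, a bytecode ADD instruction would call value_add and raise a VM-level type error whenever it returns false.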

Thus I began working on PingusVM, contributing to the problem of the ever-growing number of standards.

However, as time went by, I had to realize (just like every time I’ve added another engine feature) that it’s way more complicated than that, especially once I realized mid-development that I had made a lot of design oversights. On top of that, I have a lot of other projects, such as the game engine I originally intended the VM for. My main issue is that potential candidates either lack integer support or are very bloated (I don’t need a second XML DOM parser, etc.). Lua kind of worked, but after I fixed one issue (which was hard, as every tutorial assumed you wanted to load a file directly rather than load the source text directly), a very similar one reappeared, even though I followed every tutorial I could find. Others would force me to introduce some C-style build system, since they would need “hard linking” to my engine, and that “hard linking” is why I had to halt development of my image library: I would have needed to introduce a build system into my project for the LZW codec (required for GIF files).
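
For context on why integer support matters: a VM whose only number type is a double (as in default builds of Lua before 5.3) cannot hold every 64-bit integer exactly, because a binary64 double has a 53-bit significand. A small C check, assuming IEEE 754 doubles and the default round-to-nearest mode; survives_double is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if n survives a round trip through double unchanged. */
static bool survives_double(int64_t n) {
    double d = (double)n;
    /* Values near INT64_MAX round up to 2^63, which is out of range for
       int64_t, so guard the conversion back. */
    if (d >= 9223372036854775808.0 || d < -9223372036854775808.0) return false;
    return (int64_t)d == n;
}
```

Everything up to 2^53 (9007199254740992) round-trips, but 2^53 + 1 is rounded to 2^53, so a double-only VM silently changes that integer.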

#programming_languages


tedu on 09 May 2024 22:17

Sounds like you want to write your own interface to Lua. Surely writing the load function you want is easier than an entire VM.

sxan@midwest.social on 09 May 2024 23:31

I’m not sure that you’re describing the same thing, but this very experience you’re having with Lua is what drove me away from Ruby, and ultimately from all non-statically compiled PLs.

Instability in the VM, but more so in libraries, meant upgrades became projects to fix things that broke because libraries introduced regressions, API changes, and new bugs. For any non-trivial application, something would break. This was entirely unacceptable for services; entire suites would go down because of a regression in one popular library, and you get to find these things out at runtime.

Eventually, I went back to entirely statically compiled languages; I’d encountered the same issue, to a lesser degree, with bytecode VM languages like Java. But with PLs like Go, once I have a binary, it’ll work until someone breaks libc. Not impossible, but that happens years or decades apart, not monthly like it did with Ruby.

Now I make do with zsh, or bash if I plan on sharing, with help from the usual awk/sed/grep crowd. If I start feeling cramped, I know my program is getting too big for scripting and switch to a reliable language. The short-term convenience is not worth the long-term grief.

ZILtoid1991@lemmy.world on 10 May 2024 06:22

Once I fixed the load function (which in my case involved looking up how that function worked and what I needed to call to get things working), I had a very similar issue without any error messages whatsoever; things just didn’t work. Before that, I had already had issues using Lua functionality like metatables, due to the clunky nature of its API.

I need at least some bytecode, since I need some way to get OS- and CPU-independent loadable code (scripts). Sure, I could use DLL files (and their equivalents on other OSes), but then I introduce the massive hurdle of having to compile scripts for each CPU architecture and OS combination (as well as the potential for more serious security issues), even if it had a lot of speed benefits.
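
As a rough illustration of why bytecode sidesteps the per-architecture compile problem: the program below is just an array of 64-bit words and the interpreter is plain C, so one interpreter build per platform runs the same script everywhere. The opcodes and run() are hypothetical; this is a sketch, not a real VM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a tiny stack machine. The bytecode is an array
   of 64-bit words, so it is independent of the host CPU and OS (byte
   order only matters once the words are serialized to a file). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Runs code until OP_HALT and returns the top of the stack.
   No bounds checking: a real VM would verify the bytecode first. */
static int64_t run(const int64_t *code) {
    int64_t stack[64];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next word to execute */
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: stack[sp++] = code[pc++]; break;  /* operand follows the opcode */
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        default:      return INT64_MIN;  /* unknown opcode: hypothetical error value */
        }
    }
}
```

For example, the word sequence OP_PUSH 2, OP_PUSH 4, OP_ADD, OP_PUSH 7, OP_MUL, OP_HALT computes (2 + 4) * 7 and returns 42 on any platform the interpreter compiles for.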

I think I’ll keep pushing forward with it, but limit the time I spend on it as a “lower priority” project. I did try to find out whether there’s a way to use LLVM as a host for scripts, because there’s no way I’m going to touch WASM with a 10-meter pole (TL;DR: I started learning programming during the whole “web app” and “the web is the future” craze, and seeing other libraries for “app portability”, and the teaching of optimization, abandoned in favor of Electron and “there will be a big strong server on the backend” really broke my heart.)

sxan@midwest.social on 10 May 2024 14:23

I don’t have enough experience yet with WASM; ultimately, it’s bytecode, right? So the same rules apply. Note I didn’t make an exception for dynamically linked programs; the more components in your stack that are variable, the more instability you have.

But it sounds like you’re building a sort of plug-in system, and that involves instability by design. Using WASM seems far easier than implementing a bespoke Lua VM.

bitcrafter@programming.dev on 10 May 2024 14:59

because no way I’m going to touch WASM with a 10 meter long pole

I think that you should look into WASM a little more closely, because it is not web-specific at all; it is more like an alternative to the JVM that is a bit lower level and designed to be interpreted or JIT-compiled more efficiently. You do not need to embed a web browser or anything similarly heavy into your app to use it; you can just use it via Wasmtime, which is a library written in Rust with bindings to other languages that is officially supported by the maintainers of the WASM standard.

ZILtoid1991@lemmy.world on 11 May 2024 13:19

Thank you, I think I’ll write a D language binding to Wasmtime instead. I’ve done a few such projects in the past and only gave up once, when attempting one for PipeWire (that thing is massive, with way more files), so I think I can do it again with Wasmtime.

porgamrer@programming.dev on 10 May 2024 12:07

If you want to do anything other than long-term blue sky VM research, don’t write your own VM. That’s my advice. Same goes for programming languages, game engines, etc.

Always do the unambitious thing that seems like it should take one weekend, and probably set aside a month for it >_>

Also admit to yourself that things like bloat and binary size are not real problems, unless they intrude on your daily workflow. They are just convenient distractions from harder tasks.

I say this as someone who is constantly failing to complete any projects for all these reasons.

ChubakPDP11@programming.dev on 10 May 2024 14:36

Are you specifying everything beforehand? If not, I’d recommend locking in an ISA with predetermined stack effects. Also, minimize as much as you can.
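
One way to read “an ISA with predetermined stack effects” (an interpretation, with hypothetical opcodes and names): give every opcode a fixed (pops, pushes) pair in a table, so a loader can check a program’s stack usage before running it. A C sketch for straight-line code without jumps:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; each has a fixed stack effect decided up front. */
enum { OP_PUSH, OP_ADD, OP_DUP, OP_DROP, OP_HALT, OP_COUNT };

typedef struct {
    int pops;      /* values consumed */
    int pushes;    /* values produced */
    int operands;  /* inline operand words following the opcode */
} StackEffect;

static const StackEffect effects[OP_COUNT] = {
    [OP_PUSH] = {0, 1, 1},
    [OP_ADD]  = {2, 1, 0},
    [OP_DUP]  = {1, 2, 0},
    [OP_DROP] = {1, 0, 0},
    [OP_HALT] = {1, 0, 0},
};

/* Checks straight-line code (no jumps) ahead of time. Returns the maximum
   stack depth the program needs, or -1 if it would underflow, uses an
   unknown opcode, or runs off the end without reaching OP_HALT. */
static int max_stack_depth(const int64_t *code, size_t len) {
    int depth = 0, max = 0;
    size_t pc = 0;
    while (pc < len) {
        int64_t op = code[pc];
        if (op < 0 || op >= OP_COUNT) return -1;
        const StackEffect *e = &effects[op];
        if (depth < e->pops) return -1;
        depth += e->pushes - e->pops;
        if (depth > max) max = depth;
        pc += 1 + (size_t)e->operands;
        if (op == OP_HALT) return max;
    }
    return -1;
}
```

Because the maximum depth is known before execution, the interpreter can allocate exactly that much stack and skip per-instruction overflow checks; supporting jumps would mean running the same check along every control-flow path.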

First off: read Xiao-Feng Li’s excellent book if you have not.

Then.

Here are some minimization tips I am following for my RuppVM. I began working on it less than 24 hours ago, but the tips are based on over 6 months of on-and-off attempts at making a VM, and failing. I am sure this one will take, because of these minimization efforts:

Everything is a word: When you are reading the bytecode stream, read it word by word, and by word I mean machine word. Don’t waste your time on floats; you can implement IEEE 754 floats as a module, which will be good practice. There’s a good book for it here.
No complex datatypes, BLESS!: So I said everything is a word; what about arrays? Structs? Basically, you need to ‘bless’ chunks. Blessing means taking a plain chunk of words and marking it as a particular structure; see bless in Perl, which ties a reference to a package.
No OS-level threads, BIND!: Just make ‘green’ threads, as I am doing in RuppVM. You can use the FFI to bind them to OS threads.
Stop the World GC + Arena Allocation: Don’t waste time on an intricate GC; just do stop-the-world mark and sweep on arena-allocated memory (see my code).
Super Basic FFI: Take a look at my ISA.txt file to see what I am doing for the FFI. You don’t need intricate type mappings for the FFI; just register a ‘hook’ to an address in memory (or in an ELF file).
Avoid JiT/AoT for now: Don’t focus on it at the beginning.
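
The “bless” and “stop-the-world GC + arena” tips combine naturally. The sketch below is not RuppVM’s code; it assumes a pool arena of fixed-size two-word cells, where blessing a cell means giving it a type tag that tells the collector how to interpret its words. All names (Cell, bless, mark, gc, ...) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_CELLS 256
#define NIL SIZE_MAX   /* "no reference" */

/* The tag is what blessing assigns: it says how to read the two words. */
typedef enum { TAG_FREE, TAG_PAIR, TAG_DATA } Tag;

typedef struct {
    Tag tag;
    bool marked;
    uint64_t word[2];  /* for TAG_PAIR: cell indices (or NIL) */
} Cell;

static Cell arena[ARENA_CELLS];
static size_t free_list = NIL;  /* free cells are linked through word[0] */

static void arena_init(void) {
    for (size_t i = 0; i < ARENA_CELLS; i++) {
        arena[i].tag = TAG_FREE;
        arena[i].marked = false;
        arena[i].word[0] = (i + 1 < ARENA_CELLS) ? i + 1 : NIL;
    }
    free_list = 0;
}

/* Takes a cell from the free list and blesses it with a tag. */
static size_t bless(Tag tag, uint64_t a, uint64_t b) {
    if (free_list == NIL) return NIL;  /* caller should run gc() and retry */
    size_t i = free_list;
    free_list = (size_t)arena[i].word[0];
    arena[i] = (Cell){ .tag = tag, .marked = false, .word = { a, b } };
    return i;
}

/* Recursive mark; a real VM would use an explicit stack instead. */
static void mark(size_t i) {
    if (i >= ARENA_CELLS || arena[i].marked || arena[i].tag == TAG_FREE) return;
    arena[i].marked = true;
    if (arena[i].tag == TAG_PAIR) {
        mark((size_t)arena[i].word[0]);
        mark((size_t)arena[i].word[1]);
    }
}

/* Stop-the-world: the mutator is paused for the whole call.
   Returns the number of cells freed. */
static size_t gc(const size_t *roots, size_t nroots) {
    for (size_t r = 0; r < nroots; r++) mark(roots[r]);
    size_t freed = 0;
    for (size_t i = 0; i < ARENA_CELLS; i++) {
        if (arena[i].tag == TAG_FREE) continue;
        if (arena[i].marked) {
            arena[i].marked = false;  /* reset for the next cycle */
        } else {
            arena[i].tag = TAG_FREE;  /* unreachable: back onto the free list */
            arena[i].word[0] = free_list;
            free_list = i;
            freed++;
        }
    }
    return freed;
}
```

Fixed-size cells keep the arena free of fragmentation; larger blessed structures would be built from chains of cells, or would need a size field in a header word and a smarter allocator.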

These symbols are not exactly portable, but you can use them, abuse them, etc.:

extern char etext;  /* first address past the text segment */
extern char edata;  /* first address past the initialized data segment */
extern char end;    /* first address past the uninitialized data (bss) segment */

I think there are portable libraries for handling ELF and PE files. You could also write your own assembly code for it. For loading functions from files for the FFI, these help a lot.
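
A small Linux-only sketch of how the etext/edata/end symbols can be used, assuming an ELF toolchain that defines them as described in the end(3) man page (macOS, for one, does not). The helper in_static_data is hypothetical; only the addresses of these symbols are meaningful, never their values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linker-provided symbols with no header declaration (see end(3)). */
extern char etext;  /* first address past the text segment */
extern char edata;  /* first address past the initialized data segment */
extern char end;    /* first address past the bss segment */

static int initialized_global = 1;  /* placed in .data */
static int zeroed_global;           /* placed in .bss */

/* Returns true if addr lies between the end of the code and the end of
   the bss, i.e. in the program's own static data (read-only data may
   also fall in this range, depending on the linker's layout). A VM could
   use such a check to sanity-test pointers passed in through its FFI. */
static bool in_static_data(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    return a >= (uintptr_t)&etext && a < (uintptr_t)&end;
}
```

Stack and heap addresses fall outside this range on typical Linux layouts, which is what makes the check useful as a cheap sanity filter rather than a real memory map.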

Another thing you should implement is a ‘signal trampoline’. This trampoline will help handle signals from the OS and hook them to your green threads.
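
The comment only names the signal trampoline idea. One common way to build it (an assumption here, not necessarily RuppVM’s approach) is an OS-level handler that merely records the signal, with the green-thread scheduler delivering it later in ordinary context. A POSIX C sketch; GreenThread, install_trampoline and schedule are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The real OS handler only records the signal: very little is
   async-signal-safe, so the VM-level work happens later. */
static volatile sig_atomic_t pending_signal = 0;

static void trampoline(int signo) {
    pending_signal = signo;
}

static bool install_trampoline(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = trampoline;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL) == 0;
}

/* Hypothetical green thread: just counts the time slices it has run. */
typedef struct {
    int slices;
    int signals_seen;
} GreenThread;

/* Round-robin scheduler: each green thread runs one slice per round.
   Between slices it checks for a pending signal and delivers it to the
   designated handler thread, outside of signal context. */
static void schedule(GreenThread *threads, size_t n, size_t handler, int rounds) {
    for (int r = 0; r < rounds; r++) {
        for (size_t i = 0; i < n; i++) {
            threads[i].slices++;  /* stand-in for running some bytecode */
            if (pending_signal != 0) {
                pending_signal = 0;
                threads[handler].signals_seen++;
            }
        }
    }
}
```

The check-and-clear in schedule() can still drop a signal that arrives between those two statements; blocking the signal around the check with sigprocmask, or counting signals instead of storing one, would close that gap.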

Now, don’t take these as gospel. I am sure you know most of these already. But in case there’s new info, feel free to adapt it.

Star RuppVM, I will be adding a lot of stuff to it soon.

EDIT: I looked, and there do not seem to be any ‘portable’ libraries that handle PE and ELF with the same interface. I guess there can’t be; they are two different beasts.

EDIT 2: The FFI might prove to be much more complex than I thought. There’s libffi, though.
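
For the simplest case, the “register a hook” idea can avoid libffi entirely if every host function uses one fixed, word-based signature; libffi only becomes necessary for calling arbitrary C signatures. A minimal C sketch under that assumption; Hook, register_hook, call_hook and host_sum are hypothetical names, and the CALL_FFI opcode is imagined.

```c
#include <stddef.h>
#include <stdint.h>

/* Every foreign function has the same shape: an array of argument words
   in, one word out. This fits "everything is a word" and needs no type
   mapping; arbitrary C signatures are what libffi is for. */
typedef int64_t (*Hook)(const int64_t *args, size_t nargs);

#define MAX_HOOKS 32

static Hook hooks[MAX_HOOKS];
static size_t hook_count = 0;

/* Registers a host function and returns its index, which bytecode would
   use as the operand of a hypothetical CALL_FFI opcode; -1 if full. */
static int64_t register_hook(Hook fn) {
    if (hook_count == MAX_HOOKS) return -1;
    hooks[hook_count] = fn;
    return (int64_t)hook_count++;
}

/* What the interpreter would do on CALL_FFI index: look up and call. */
static int64_t call_hook(int64_t index, const int64_t *args, size_t nargs) {
    if (index < 0 || (size_t)index >= hook_count) return INT64_MIN;  /* hypothetical error value */
    return hooks[index](args, nargs);
}

/* Example host function written against the uniform signature. */
static int64_t host_sum(const int64_t *args, size_t nargs) {
    int64_t total = 0;
    for (size_t i = 0; i < nargs; i++) total += args[i];
    return total;
}
```

The cost of this design is that each existing C function must be wrapped in a small adapter by hand, which is exactly the boilerplate libffi removes.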