Benefits of a join table vs. array to express relations? (outside of SQL)
from matcha_addict@lemy.lol to programming@programming.dev on 11 May 2024 18:15
https://lemy.lol/post/24796867

I am building an application that is using JSON / XML files to persist data. This is why I indicated “outside of SQL” in the title.

I understand one benefit of join tables is it makes querying easier with SQL syntax. Since I am using JSON as my storage, I do not have that benefit.

But are there any other benefits when using a separate join table when expressing a many-to-many relationship? The exact expression I want to express is one entity’s dependency on another. I could do this by just having a “dependencies” field, which would be an array of the IDs of the dependencies.

This approach seems simpler to me than a separate table / entity to track the relation. Am I missing something?

Feel free to ask for more context.

#programming

marcos@lemmy.world on 11 May 2024 18:32

Joins and tables are abstract concepts; they don’t dictate how you store data in memory or on disk, or how you read it.

If you want specialized data storage, go with whatever format is easiest for you to use. But also, the format that is easiest to store is not necessarily the easiest one to work with in memory.

matcha_addict@lemy.lol on 11 May 2024 18:37

Currently, I am storing entities in a JSON array / list. Every element in this list corresponds to one instance of that entity.

I could express a many-to-many relationship as just another field in that entity that happens to be a list / array, or I can imitate a SQL join table by creating a separate JSON list to log an instance of that relation.

Are there any benefits to the second approach?

SzethFriendOfNimi@lemmy.world on 11 May 2024 19:24

Mostly storage space and ease of updating records.

Let’s say you have records of users who watch a TV show.

You could keep users as a key and shows as an array, where each array entry is a record of the TV show title, release date, and other info such as time watched by that user.

In this case you’re duplicating the strings for shows like “Fallout” and the release date thousands of times. And then if there’s an update, such as a title change or a change to the streaming service or channel where it’s found, you have to find those thousands of subrecords and update them.

Keeping a reference to another key/json file by some ID makes it easier to do such updates and reduces storage for that data. Except now you have to correlate that data when doing things like reports of what shows were watched by what users.
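
A minimal sketch of that trade-off, assuming the JSON has already been parsed into in-memory structures. The type names (`Show`, `WatchEntry`, `Store`), the fields, and the sample data are all hypothetical and only illustrate the idea: the show is stored once and users hold references to it, so a rename touches one record, but a report has to correlate two collections.

```rust
use std::collections::HashMap;

// Hypothetical record types; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Show {
    title: String,
    release_year: u16,
}

#[derive(Debug, Clone, PartialEq)]
struct WatchEntry {
    show_id: u32,         // reference to a Show instead of a copy of it
    minutes_watched: u32, // data that belongs to this user/show pair
}

// Shows are stored once, keyed by ID; each user only holds references.
struct Store {
    shows: HashMap<u32, Show>,
    watched: HashMap<String, Vec<WatchEntry>>, // user name -> entries
}

impl Store {
    // A title change touches exactly one record. Returns false if the ID is unknown.
    fn rename_show(&mut self, show_id: u32, new_title: &str) -> bool {
        match self.shows.get_mut(&show_id) {
            Some(show) => {
                show.title = new_title.to_string();
                true
            }
            None => false,
        }
    }

    // The cost of normalizing: a report has to correlate the two collections.
    fn titles_watched_by(&self, user: &str) -> Vec<String> {
        self.watched
            .get(user)
            .map(|entries| {
                entries
                    .iter()
                    .filter_map(|e| self.shows.get(&e.show_id))
                    .map(|s| s.title.clone())
                    .collect()
            })
            .unwrap_or_default()
    }
}

// Made-up sample data for illustration.
fn sample_store() -> Store {
    let mut shows = HashMap::new();
    shows.insert(1, Show { title: "Fallout".to_string(), release_year: 2024 });
    let mut watched = HashMap::new();
    watched.insert(
        "alice".to_string(),
        vec![WatchEntry { show_id: 1, minutes_watched: 120 }],
    );
    watched.insert(
        "bob".to_string(),
        vec![WatchEntry { show_id: 1, minutes_watched: 45 }],
    );
    Store { shows, watched }
}
```

After `rename_show`, every user's report reflects the new title without any per-user update, which is the storage and update benefit described above.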

mox@lemmy.sdf.org on 11 May 2024 18:37

Conceptually, the benefit of a join table is to allow many-to-many relations. That’s it.

If I understand you correctly, your relations are one-to-many, so a join table would just be needless complexity.

SzethFriendOfNimi@lemmy.world on 11 May 2024 19:12

And to cover atomicity. Child records deleted when a parent record is, etc.

matcha_addict@lemy.lol on 11 May 2024 19:16

The list would still allow a many-to-many relationship. Let me demonstrate:

entity A and entity B both have 2 members: A-1, A-2, B-1, and B-2.

we add a “relations” field to entity A, which is a list of IDs from B, describing the list of B’s that A is related to.

A-1 has the relations field as: [B-1, B-2] and A-2 has [B-1, B-2].

As you can see, this is a many-to-many relationship. Each of our entities is tied to multiple entities. So this is many on both sides, hence many-to-many.
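
The A/B example can be sketched in code roughly as follows. `EntityA` and the helper functions are hypothetical names, not part of any real schema. The sketch also shows one consequence of this layout: going from A to B reads a field, while going from B back to A needs a scan over every A.

```rust
// Hypothetical entity type mirroring the A-1 / A-2 example above.
struct EntityA {
    id: String,
    relations: Vec<String>, // IDs of the B entities this A is related to
}

fn example_entities() -> Vec<EntityA> {
    vec![
        EntityA { id: "A-1".into(), relations: vec!["B-1".into(), "B-2".into()] },
        EntityA { id: "A-2".into(), relations: vec!["B-1".into(), "B-2".into()] },
    ]
}

// Forward direction: find the A and read its field directly.
fn related_bs<'a>(entities: &'a [EntityA], a_id: &str) -> Vec<&'a str> {
    entities
        .iter()
        .find(|a| a.id == a_id)
        .map(|a| a.relations.iter().map(String::as_str).collect())
        .unwrap_or_default()
}

// Reverse direction: B has no field to read, so every A is scanned.
fn related_as<'a>(entities: &'a [EntityA], b_id: &str) -> Vec<&'a str> {
    entities
        .iter()
        .filter(|a| a.relations.iter().any(|b| b == b_id))
        .map(|a| a.id.as_str())
        .collect()
}
```

With the sample data, B-1 is related to both A-1 and A-2, and A-1 is related to both B-1 and B-2, so the embedded list does express many on both sides.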

mox@lemmy.sdf.org on 11 May 2024 20:04

That’s overly complicated to my eyes, and not really relevant. The point I was trying to make is just that a join table is unnecessary in the situation you originally described.

matcha_addict@lemy.lol on 15 May 2024 06:40

What I described in the comment above is the same thing I originally described, but expanded.

A dependency relation can still be many to many (and in my case, it is). The comment above gives an example to prove it.

Toes@ani.social on 11 May 2024 18:55

I’m wondering if you would benefit from using MongoDB?

matcha_addict@lemy.lol on 11 May 2024 19:18

The reason I am using JSON is so I can have a flat file, sorta plaintext. This way, the storage is easily readable by the user without any special tools, and can even be debugged or modified directly, or using a tool like jq. All this without the need for a heavy database engine, indexing, etc. (I am not operating at a large scale). I don’t believe MongoDB would be suitable for me based on this, but please let me know if you think I am wrong.

abbadon420@lemm.ee on 11 May 2024 18:57

One example could be to add a value to the relationship, like a rating or a ranking.

For example, a Movie can be seen by many Users and a User can see many Movies. A user can rate the movie they’ve seen between 0 and 10. So, the join table would have 3 columns:

  • the FK for User
  • the FK for Movie
  • the numerical value of the rating by that particular User for that particular Movie.
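
As a rough sketch, the three columns could map onto one record per (user, movie) pair in a separate list. The `Rating` type, the integer ID types, and the sample values are assumptions made for illustration.

```rust
// Hypothetical row of the join list: one entry per (user, movie) pair.
#[derive(Debug, Clone, PartialEq)]
struct Rating {
    user_id: u32,  // FK for User
    movie_id: u32, // FK for Movie
    rating: u8,    // 0..=10, given by that user to that movie
}

// Average rating of one movie, or None if nobody has rated it.
fn average_rating(ratings: &[Rating], movie_id: u32) -> Option<f64> {
    let scores: Vec<f64> = ratings
        .iter()
        .filter(|r| r.movie_id == movie_id)
        .map(|r| f64::from(r.rating))
        .collect();
    if scores.is_empty() {
        None
    } else {
        Some(scores.iter().sum::<f64>() / scores.len() as f64)
    }
}

// Made-up sample data for illustration.
fn sample_ratings() -> Vec<Rating> {
    vec![
        Rating { user_id: 1, movie_id: 10, rating: 8 },
        Rating { user_id: 2, movie_id: 10, rating: 6 },
        Rating { user_id: 1, movie_id: 11, rating: 9 },
    ]
}
```

The rating belongs to neither the User nor the Movie alone, which is why the relation record is a natural home for it.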

matcha_addict@lemy.lol on 11 May 2024 19:19

That’s a good point!

deegeese@sopuli.xyz on 11 May 2024 20:25

The JSON version of this is to store an array of relation objects which express the weights.

In my opinion the main advantage of a “join table” in your situation is the ability to look up the relationship from either direction while only storing a single copy of it.

If you store the relation in the object, it becomes very easy for A’s relation to B to get out of sync with B’s relation to A.
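
A small sketch of the single-copy idea, with hypothetical names (`Relation`, `bs_for_a`, `as_for_b`, `unlink`) and integer IDs assumed for brevity. Each link is stored once and can be read from either side, so there is no mirrored copy that could drift.

```rust
// Hypothetical join row: "a_id is related to b_id", stored exactly once.
#[derive(Debug, Clone, PartialEq)]
struct Relation {
    a_id: u32,
    b_id: u32,
}

// Look up the relation starting from A.
fn bs_for_a(relations: &[Relation], a_id: u32) -> Vec<u32> {
    relations.iter().filter(|r| r.a_id == a_id).map(|r| r.b_id).collect()
}

// Look up the same relation starting from B.
fn as_for_b(relations: &[Relation], b_id: u32) -> Vec<u32> {
    relations.iter().filter(|r| r.b_id == b_id).map(|r| r.a_id).collect()
}

// Removing a link is one operation; both directions see the change.
fn unlink(relations: &mut Vec<Relation>, a_id: u32, b_id: u32) {
    relations.retain(|r| !(r.a_id == a_id && r.b_id == b_id));
}
```

By contrast, if A stored a list of Bs and B also stored a list of As, every link and unlink would need two writes that must stay consistent.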

andrew@lemmy.stuart.fun on 11 May 2024 20:56

The other related advantage is being able to update data about a given B once, instead of everywhere it occurs as a child in A.

kevincox@lemmy.ml on 11 May 2024 19:20

There is no fundamental difference between the two options; in general they will behave similarly. I think you are talking about these options:

struct Person;
struct Skill;

struct PersonSkills {
    person: PersonId,
    skill: SkillId,
}

vs

struct Person {
    skills: Vec<SkillId>,
}

struct Skill;

The main difference that I see is that there is a natural place to put data about this relationship with the “join table”.

struct PersonSkills {
    person: PersonId,
    skill: SkillId,
    acquired: Timestamp,
    experience: Duration,
}

You can still do this in the second one, but you notice that you are basically heading towards an interleaved join table.

struct PersonSkills {
    skill: SkillId,
    acquired: Timestamp,
    experience: Duration,
}

struct Person {
    skills: Vec<PersonSkills>,
}

There are other, less abstract concerns, such as performance (are you always loading the list of skills? what if it is long?) or correctness (if you delete a Person, do you want to delete these relationships? That comes “for free” if they are stored on the Person). But which is better will depend on your use case.
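
The deletion point can be sketched like this. `PersonId` and `SkillId` are assumed to be plain integers, and `JoinStore` and `EmbeddedStore` are hypothetical containers standing in for the two file layouts.

```rust
type PersonId = u32; // assumed ID representation
type SkillId = u32;

// Layout 1: relationships live in a separate join list.
struct PersonSkills {
    person: PersonId,
    skill: SkillId,
}

struct JoinStore {
    people: Vec<PersonId>,
    person_skills: Vec<PersonSkills>,
}

impl JoinStore {
    // Both collections must be updated, or dangling rows remain.
    fn delete_person(&mut self, id: PersonId) {
        self.people.retain(|p| *p != id);
        self.person_skills.retain(|ps| ps.person != id);
    }
}

// Layout 2: skills are embedded in the Person.
struct Person {
    id: PersonId,
    skills: Vec<SkillId>,
}

struct EmbeddedStore {
    people: Vec<Person>,
}

impl EmbeddedStore {
    // Dropping the Person drops its relationships along with it.
    fn delete_person(&mut self, id: PersonId) {
        self.people.retain(|p| p.id != id);
    }
}
```

In the join layout, forgetting the second `retain` would leave rows pointing at a person that no longer exists; the embedded layout cannot get into that state when a Person is removed.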

andrew@lemmy.stuart.fun on 11 May 2024 21:10

The real primary benefit of storing your relationships in a separate place is that it becomes a point of entry for scans or alterations instead of scanning all entries of one of the larger entity types. For example, “how many users have favorited movie X” is a query on one smaller table (and likely much better optimized on modern processor architectures) vs across all favorites of all users. And “movie x2 is deleted so let’s remove all references to it” is again a single table to alter.
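
To make the two queries concrete, here is a sketch with hypothetical `Favorite` and `User` types and integer IDs. It shows the same question asked against a separate favorites list and against arrays embedded in each user.

```rust
// Hypothetical join row: one entry per (user, movie) favorite.
struct Favorite {
    user_id: u32,
    movie_id: u32,
}

// Separate list: "how many users have favorited movie X" is one flat scan.
fn favorite_count(favorites: &[Favorite], movie_id: u32) -> usize {
    favorites.iter().filter(|f| f.movie_id == movie_id).count()
}

// Separate list: removing a deleted movie alters only this one collection.
// Returns how many references were removed.
fn remove_movie(favorites: &mut Vec<Favorite>, movie_id: u32) -> usize {
    let before = favorites.len();
    favorites.retain(|f| f.movie_id != movie_id);
    before - favorites.len()
}

// Embedded arrays: the same question means visiting every user record.
struct User {
    id: u32,
    favorites: Vec<u32>, // movie IDs
}

fn favorite_count_embedded(users: &[User], movie_id: u32) -> usize {
    users.iter().filter(|u| u.favorites.contains(&movie_id)).count()
}
```

Both versions return the same count; the difference is which records have to be read or rewritten to get there.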

Another benefit regardless of language is normalization. You can keep your entities distinct and operate on either one independently. This matters a lot more the more relationships you have between instances of both entities. You could get away with your json array containing IDs of movies rather than storing the joins separately, but that still loses on efficiency when compared to a third relationship table.

The biggest win for design is normalization. Store entities separately and updates or scans will require significantly less rewriting. And there are degrees of it, each with benefits and trade-offs.

talkingpumpkin@lemmy.world on 11 May 2024 22:19

Not sure I’m getting the issue here (what does “join table” mean in the scope of JSON/XML?), but… doesn’t how you lay out your data in the JSON/XML file have zero impact on your application’s queries? You won’t be querying the JSON - you’ll be loading data from it into memory and querying it in memory.
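
One way to read this: whatever the file layout, the application could build lookup structures once after loading. A minimal sketch, assuming the relation has already been parsed into `(a_id, b_id)` pairs; the `Index` type and `build_index` function are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical in-memory index, built once after the file is loaded.
struct Index {
    forward: HashMap<u32, Vec<u32>>, // a_id -> related b_ids
    reverse: HashMap<u32, Vec<u32>>, // b_id -> related a_ids
}

// Builds both directions from a flat list of (a_id, b_id) pairs.
fn build_index(pairs: &[(u32, u32)]) -> Index {
    let mut forward: HashMap<u32, Vec<u32>> = HashMap::new();
    let mut reverse: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(a, b) in pairs {
        forward.entry(a).or_default().push(b);
        reverse.entry(b).or_default().push(a);
    }
    Index { forward, reverse }
}
```

The pairs could come from a separate relation list or be flattened out of embedded arrays at load time; queries then hit the maps rather than the file layout.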

matcha_addict@lemy.lol on 15 May 2024 06:41

I am simulating a database table as a json list. So a join table would be simulated also as a separate list (or separate json file).

solrize@lemmy.world on 12 May 2024 08:31

SQL uses join tables to do what traditionally would have been done with dictionaries embedded in the objects, i.e. a purpose-made data structure. That approach is more efficient but less convenient than using a 3-way join.

Turun@feddit.de on 12 May 2024 17:04

It depends entirely on how you want to work with the data.

Have you considered sqlite? The database is just a single file, which gives you all the advantages of a text file (easy backup, sharing, easy editing via sqlite browser) while also providing the benefits of SQL when operating on the data (join, etc).

matcha_addict@lemy.lol on 15 May 2024 06:38

Sqlite is nice but the file would not be readable in a plaintext-like format from my understanding.

Turun@feddit.de on 16 May 2024 21:18

  1. Yes, but devil’s advocate: you also need a program to read text files, so needing a program to read sqlite files is not worse.

  2. I am confused by your requirements. Why do you need to store your data as json or XML? Would it suit your requirements to read in text files, convert to sqlite for processing and then save as a text file? What do you gain by being able to edit the files in a text editor, as opposed to a table editor? Do you maybe just need a config file (e.g. in toml format) and don’t actually do much data processing?