?

Log in

No account? Create an account

Previous Entry | Next Entry

On database crap

So, I'm working on the databases for my fanfic site.

Specifically, I'm working on the metadata for a story. Basically, a story has a fairly simple table:
  • id (integer)
  • owner (integer)
  • title (tinytext)
  • title_norm (char(32)—this is a version of the title munged to make a pretty URL)
  • description (tinytext)
  • disclaimer (tinytext)
  • note (tinytext—an author's note)
  • date
  • status (enum—in progress, completed, abandoned)
That table only tells a small part of the story, though. There's also a table of chapters:
  • id (integer)
  • storyid (integer)
  • chapterno (integer—0 for prologue, something really big for epilogue)
  • title (tinytext)
  • note (tinytext—a per-chapter author's note)
  • text (probably mediumtext)
Plus there's a many-to-many mapping to fandoms (each story can belong to several fandoms, to allow crossovers). I'll omit the table for fandoms—the only information besides names and IDs is used to fit each fandom into the overall hierarchy for browsing. And finally, there's a many-to-many mapping to tags:
  • id (integer)
  • type (big enum—things like "genres", "eras", "characters", plus "custom" for custom tags applied by authors and "user" for custom tags applied by users)
  • fandom (integer—0 indicates that it's not associated with a particular fandom, for e.g. "slash sex")
  • name (tinytext or char(something))
Basically, most of the metadata is expressed in the tags, which I think will make the search feature quite a bit more elegant. And the "custom" means that people can add specialist tags like "redeemed draco", and "user" means that users can help annotate stories. (I'm not sure about the "user" idea, actually, but I really want something like it.)

The problem is pairings or ships or whatever you want to call them.

They don't fit into the tag concept, because they involve a special relationship between two tags. After all, a story that has "Harry, Hermione/Ron" is rather different from one that has "Harry/Hermione, Ron". So I need to come up with something.

I see two ways to deal with this. One way is to create tags for every possible pairing. Except that this creates the situation best known as "combinatorial explosion". If there are a hundred characters in Harry Potter, there are 10,000 possible pairings—not to mention the people with really eccentric tastes, like "Harry/Ginny/Hermione/Parvati/Padma" (!).

The other way I see to do it is to create some sort of ships table, like so:
  • storyid (integer)
  • shipno (integer)
  • character (integer—links into tags table)
To explain how that works, given a tags table like this:
idtypefandomname
1character42Granger, Hermione
2character42Lovegood, Luna
3character42Malfoy, Draco
4character42Potter, Harry
5character42Potter, James
6character42Potter, Lily
7character42Weasley, Ginevra
8character42Weasley, Ronald
9character42Patil, Padma
10character42Patil, Parvati


A story with the full set of Portkey.org pairings (H/Hr, R/L, D/G (yuck), JP/LP) would have these pairings in the table:
storyidshipnocharacter
2411 (Hermione)
2414 (Harry)
2422 (Luna)
2428 (Ron)
2433 (Draco)
2437 (Ginny)
2445 (James)
2446 (Lily)


And that rather scary Harry/Harem thing I mentioned earlier:
storyidshipnocharacter
2511 (Hermione)
2514 (Harry)
2518 (Ginny)
2519 (Padma)
25110 (Parvati)


That actually would work, but it's a little messy and confusing. Is there a better way?

Comments

( Read 9 comments — Leave a comment )
joshlamont
May. 27th, 2006 07:30 am (UTC)
Create a code. Like, say, Harry/Ron would be ab or 1.2 and Hermione/Harry would be ac or 1.3, and maybe abc or 1.2.3 for the threesome. Then, when someone creates a search, have them check boxes or select three from a series of lists. Do a little php to parse a request of, say, 3, 1, and 2 into 1.2.3, and then search for that entry. That way, you can potentially create a search for ANY pairing. If you want to combine character listing with character shippings, then just change to, say, 1.3,2 for Harry/Hermione, and Ron.
acdragonmaster
May. 27th, 2006 07:39 am (UTC)
That's actually a really good idea.

Though, with any pairing code, is it going to distinguish between, for example, Harry/Hermione and Hermione/Harry? Not all fandoms (and not all authors within a fandom) make a distinction between that sort of thing, but it's still something to consider.
brentdax
May. 27th, 2006 08:31 am (UTC)
is it going to distinguish between, for example, Harry/Hermione and Hermione/Harry?

No. Where possible, the system tries to find "canonical" forms for things, so that we don't end up in a hell where there are stories marked H/H, H/Hr, Hr/H, Harry/Hermione, Hermione/Harry, etc. That's why I'm not going with a straight "type in your tags" system like LiveJournal has, except for custom and user tags—the genres and so on get bunches of checkboxes, the pairings get rather nifty multiple-drop-down-menu thingies. (Basically, you select "Potter, Harry" and it reloads a version of the form with another drop down menu to the right of that one; select "Granger, Hermione" and it gives you another menu; select "Weasley, Ginevra" etc.)
brentdax
May. 27th, 2006 08:23 am (UTC)
The problem is, that makes it very hard to directly select a list of pairings from the database.

To get the names of all the tags on a fic, I just have to do:
SELECT tags.name FROM tags, storytags WHERE storytags.storyid = (blah) AND storytags.tagid = tags.id

Similarly, getting all the data I need to construct a pairing list is a single SQL statement:
SELECT tags.name, storyships.shipno FROM tags, storyships WHERE storyships.storyid = (blah) AND storyships.character = tags.id

(After that, I just have to do a little Perl trickery to get an array-of-arrays representing the ships. I've written a module that has a function which should handle this nicely.)

On the other hand, what you're talking about means a SQL statement, some Perl, and a bunch more SQL statements:
for(SELECT tags.name FROM tags, storytags WHERE storytags.storyid = (blah)
        AND storytags.tagid = tags.id AND tags.type = 'ship') {
    @charids = split '.', tags.name
    for(@charids) {
        SELECT name FROM tags WHERE id = (blah)
    }
}
Hmm...perhaps I'm approaching this wrong. What if I create tags for each pairing as needed, and use a secondary table to keep track of which characters are in the pairing? For example:

The tags table has this:
idtypefandomname
1character42Granger, Hermione
2character42Lovegood, Luna
3character42Malfoy, Draco
4character42Potter, Harry
5character42Potter, James
6character42Potter, Lily
7character42Weasley, Ginevra
8character42Weasley, Ronald
9character42Patil, Padma
10character42Patil, Parvati
11ship42Granger, Hermione/Potter, Harry

The ships table, which is only used to figure out which ships already have tags, has this:
tagidcharacter
11 (H/Hr)1 (Hermione)
11 (H/Hr)3 (Harry)

H/Hr stories are just marked as having tag #11, which just happens to be a ship tag. When someone uploads an H/Hr story, the software checks to see if there's already a tag for H/Hr, and sees that tag #11 already exists for that, so it uses that. But when someone goes nuts and tries to upload H/Hr/G/Pv/Pd, it sees that such a thing doesn't exist, so it adds a new tag:
idtypefandomname
12ship42Granger, Hermione/Patil, Padma/Patil, Parvati/Potter, Harry/Weasley, Ginny

And then notes it in the ships table with new entries:
tagidcharacter
12 (H/Harem)1 (Hermione)
12 (H/Harem)3 (Harry)
12 (H/Harem)7 (Ginny)
12 (H/Harem)10 (Padma)
12 (H/Harem)11 (Parvati)

The win: The only special thing about a ship is how it's created; in every other respect, a pairing is just a normal tag.

The loss: I was hoping that, except for "custom" and "user", we could avoid having tags which were created by normal users (as opposed to administrators). In crossover fics, cross-fiction pairings become impossible, because a tag can only be associated with either one fandom or all fandoms. (But maybe this isn't so important, as ship tags aren't added using the same menus as other kinds of tags—they're constructed out of drop-down lists.)
acdragonmaster
May. 27th, 2006 09:11 am (UTC)
You have to be careful though with the people who'll write any/all pairings they can think of, if you don't want to end up with dozens, even hundreds of tags that apply only to a handful of fics at best. I know there was someone in the FMA fandom who was deliberately writing a series of fics pairing Ed with every single other character they could think of from the series, for example...
brentdax
May. 27th, 2006 06:35 pm (UTC)
I don't particularly care if there are lots of tags with only a few fics on them. Tags are a couple dozen bytes of disk space to create and about eight bytes to use—hardly a disk-buster.
acdragonmaster
May. 27th, 2006 06:56 pm (UTC)
Hehe, I suppose, I've just seen people go really crazy with the pairings before.

Oh, though here's another thought in general- maybe have an optional spot for people to mention what spoilers their fic might give for a series?
brentdax
May. 27th, 2006 09:08 pm (UTC)
The full list of tag types is:
  • genre
  • warning/element (haven't decided on the name)
  • restriction (stories with these tags are considered to be adult-only, and are only displayed to registered users over the age of 18)
  • era
  • spoiler
  • character
  • (maybe) ship
  • custom
  • user
acdragonmaster
May. 27th, 2006 07:37 am (UTC)
Two options for user-added tags: suggested tags aren't added, but added to a suggestion list of sorts that the author can view and add if they think it's a good idea, OR have a publicly-viewed suggestion list that other users can vote yes or no on (and of course, the author could always still add it themselves if they wanted).

Possible variations on that, of course, such as having a list of suggestions viewable to the author, but if enough people suggest the same thing it gets added automatically, etc.
( Read 9 comments — Leave a comment )