As promised here (https://bdsmlr.zendesk.com/hc/en-us/community/posts/360003485759-Search-radical-improvement-to-tags), here is my idea for improving search: a meta-tagging system.
First, there should be two searches: a tagged search which matches verbatim (sans capitalization/diacritics), and a more general "search".
The tagged search should work the way search works now for posts. If I do a tagged search for the phrase "girls with mud on their shoes", I only get posts tagged "girls with mud on their shoes".
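To make "verbatim (sans capitalization/diacritics)" concrete, here is a rough Python sketch of how tags might be normalized before comparing them. The function names and the `posts` mapping of post ID to tags are my own assumptions for illustration, not anything the site actually uses.

```python
import unicodedata


def normalize_tag(tag: str) -> str:
    """Fold case, strip diacritics, and collapse whitespace in a tag."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", tag)
    # Drop the combining marks so "é" compares equal to "e".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower(); split/join collapses spaces.
    return " ".join(stripped.casefold().split())


def tagged_search(posts: dict[str, list[str]], query: str) -> list[str]:
    """Return IDs of posts carrying a tag that matches the query verbatim
    after normalization. `posts` maps a hypothetical post ID to its tags."""
    wanted = normalize_tag(query)
    return [
        post_id
        for post_id, tags in posts.items()
        if wanted in {normalize_tag(t) for t in tags}
    ]
```

With this, a search for "Girls With Mud On Their Shoes" would still match only posts tagged with that exact phrase, never posts tagged just "shoes".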
The more general search should loosely track the tags, and to a lesser extent, the text. Let's say we're trying to solve the "similar tags" problem, where a lesbian picture may be tagged "lesbian", "lesbiana", "lesbienne", their plural forms, "girl on girl", "gg", or "ff". Over the course of several reblog chains and many different posts, those tags are likely to appear together within the same reblog chains at some point, so we can infer that "lesbian" and "lesbianas" are highly correlated. By contrast, the tag "cbt" would be very poorly correlated with "lesbian".
The general search should use this information to find related posts that closely correlate, based on the tags a post gets across the entire reblog chain. A search for "gg" will correlate highly with "lesbian", but not with "cbt".
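As a rough illustration of how that correlation could be measured, here is a small Python sketch that counts how often two tags show up in the same reblog chain. I'm using Jaccard similarity as a stand-in for "correlation", which is my own choice and only one of several reasonable measures; the `chains` data and function names are made up for the example.

```python
from collections import Counter
from itertools import combinations


def tag_similarity(chains: list[list[str]]) -> dict[str, dict[str, float]]:
    """Jaccard similarity between tags, where each chain is the list of
    all tags that appeared anywhere in one reblog chain."""
    tag_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tags in chains:
        unique = sorted(set(tags))  # count each tag once per chain
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    similarity: dict[str, dict[str, float]] = {}
    for (a, b), both in pair_counts.items():
        # chains containing both tags / chains containing either tag
        score = both / (tag_counts[a] + tag_counts[b] - both)
        similarity.setdefault(a, {})[b] = score
        similarity.setdefault(b, {})[a] = score
    return similarity


def related_tags(similarity, tag, threshold=0.3):
    """Tags whose similarity to `tag` meets the threshold, best first."""
    neighbours = similarity.get(tag, {})
    hits = [t for t, s in neighbours.items() if s >= threshold]
    return sorted(hits, key=lambda t: -neighbours[t])


# Hypothetical reblog chains, using the tags from the example above.
chains = [
    ["lesbian", "gg", "girl on girl"],
    ["lesbian", "lesbianas", "gg"],
    ["lesbianas", "lesbian"],
    ["cbt"],
]
similarity = tag_similarity(chains)
print(related_tags(similarity, "gg"))
```

In this toy data "cbt" never shares a chain with "gg", so it never shows up as related. A real system would need far more data and probably a minimum-count cutoff so that rare tags don't get inflated scores.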
This has several advantages:
- A new post need not have all possible tags for a topic - a single tag is sufficient, since that tag will highly correlate with closely related tags.
- Translation becomes a non-issue, since the same tag in different languages will correlate with each other.
- This doesn't require a change in user behavior; if everyone keeps doing exactly what they do now the data set builds itself.
- Free, open-source software already exists to do the bulk of the data mining.
- And finally, the DATA SET ALREADY EXISTS. There is no extra work creating a new tagging scheme. There is already a sufficiently large data set to make this work.
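To show why a single tag would be enough, here is a hedged Python sketch of a general search that scores each post by its best-matching tag. The `similarity` table is hand-written here purely for illustration (in practice it would be mined from the existing reblog chains), and the names and the `min_score` cutoff are assumptions of mine.

```python
def general_search(posts, similarity, query, min_score=0.2):
    """Rank posts by how closely their tags correlate with the query tag.

    posts:      hypothetical mapping of post ID -> list of tags
    similarity: hypothetical mapping of tag -> {other tag: score in 0..1}
    """
    related = similarity.get(query, {})
    scored = []
    for post_id, tags in posts.items():
        # An exact tag match scores 1.0; otherwise use the best correlation.
        score = max(
            (1.0 if t == query else related.get(t, 0.0) for t in tags),
            default=0.0,
        )
        if score >= min_score:
            scored.append((score, post_id))
    # Highest score first; ties broken by post ID for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [post_id for _, post_id in scored]


# Illustrative data only: each post carries a single tag.
posts = {"p1": ["lesbienne"], "p2": ["gg"], "p3": ["cbt"]}
similarity = {"gg": {"lesbienne": 0.6, "lesbian": 0.7, "cbt": 0.01}}
print(general_search(posts, similarity, "gg"))
```

The post tagged only "lesbienne" is found by a search for "gg" even though nobody ever put that tag on it, and the "cbt" post is left out.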
I realize any work towards this would be months, if not more than a year, away. But this search scheme has the big advantage of having all the data needed already in place, and is more likely to deliver a better search experience since it doesn't rely on a single poster's tagging scheme for any post. All initially tagged posts would now be easily searchable, without getting lost due to simple variations in tagging.
Longer-term, this could be fed into an image recognition package to allow searching of untagged images. Basically, untagged images would be scanned and receive "hidden" tags based on their content, which would then be searchable.
(But whatever you do, don't destroy verbatim tagged searches; lots of people have "personal tags" that require verbatim.)