How to create a personal Thesaurus.com
If you already have OpenOffice installed, you are halfway to creating your own Thesaurus.com. All you have to do is parse the thesaurus file that comes with OO and display the result on your local web server. I'm not putting this online, since I like the results from Thesaurus.com and my code is not yet practical (it uses a text file, not a database). However, OO comes with thesaurus files for multiple languages, which is something Thesaurus.com doesn't offer yet.
If you did the default installation, the thesaurus files for multiple languages should be located in the C:\Program Files\OpenOffice.org 2.1\share\dict\ooo folder. There are two files for each language, a .dat and an .idx. I used the th_en_US_v2.dat file since its format was easy to understand.
Each word's entry spans multiple lines, and the fields on each line are delimited by '|'. The first line contains the word and the number of meanings. Each of the following lines describes one meaning of the word: it starts with the part of speech, followed by the main word and its synonyms and antonyms, all separated by '|'. Just take a look; the format is easy to understand.
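The format described above could be parsed along these lines. This is only a minimal sketch in Python, not the code I actually run: the function name parse_thesaurus and the sample entry are made up for illustration, and it assumes that any line without a '|' (such as a header line at the top of the file) can be skipped.

```python
def parse_thesaurus(lines):
    """Parse thesaurus lines into {word: [(part_of_speech, [terms...]), ...]}."""
    entries = {}
    it = iter(lines)
    for line in it:
        line = line.rstrip("\r\n")
        if "|" not in line:
            continue  # assumption: header or blank lines carry no '|'
        # First line of an entry: the word and its number of meanings.
        word, count = line.split("|", 1)
        meanings = []
        for _ in range(int(count)):
            # One line per meaning: part of speech, then the related terms.
            fields = next(it).rstrip("\r\n").split("|")
            meanings.append((fields[0], fields[1:]))
        entries[word] = meanings
    return entries


# Made-up sample in the described layout, not copied from the real file.
sample = [
    "ISO8859-1\n",
    "simple|2\n",
    "(adj)|elementary|uncomplicated|complex (antonym)\n",
    "(noun)|herb|herbaceous plant\n",
]

parsed = parse_thesaurus(sample)
for pos, terms in parsed["simple"]:
    print(pos, ", ".join(terms))
```

With the real file you would pass an open file object instead of the sample list, using whatever encoding the file was saved in.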
If you decide to create your own Thesaurus.com, use a database. Even though I'm the only user of my local web server (a P4 1.8GHz running Windows Server 2003), I notice a lag when I search for any word that starts with the letter z.
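For anyone taking the database route, here is a rough sketch using SQLite through Python's standard sqlite3 module. I have not built this myself; the table name meaning, the column names, and the helper functions build_db and lookup are all hypothetical, and the entries dictionary is a made-up sample in the shape {word: [(part of speech, [terms])]}.

```python
import sqlite3


def build_db(entries, path=":memory:"):
    """Load parsed entries into SQLite with an index on the word column."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meaning ("
        "word TEXT NOT NULL, pos TEXT NOT NULL, terms TEXT NOT NULL)"
    )
    # The index lets a lookup avoid scanning every row.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_meaning_word ON meaning (word)")
    rows = [
        (word, pos, "|".join(terms))
        for word, meanings in entries.items()
        for pos, terms in meanings
    ]
    conn.executemany("INSERT INTO meaning VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, word):
    """Return [(part_of_speech, [terms...]), ...] for one word."""
    cur = conn.execute(
        "SELECT pos, terms FROM meaning WHERE word = ? ORDER BY rowid", (word,)
    )
    return [(pos, terms.split("|") if terms else []) for pos, terms in cur]


# Made-up sample data for illustration only.
entries = {
    "zeal": [("(noun)", ["ardor", "eagerness"])],
    "apple": [("(noun)", ["edible fruit"])],
}
conn = build_db(entries)
print(lookup(conn, "zeal"))
```

Storing the terms as one '|'-joined string keeps the sketch short; a separate row per term would be the more conventional design if you want to search by synonym as well.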
Update: I've divided the one big flat thesaurus file into multiple files according to the first letter of each word. I no longer notice any lag when I'm searching. It used to take over 1 second for any word starting with the letter z. Now it displays the result in less than 0.1 second. Still, it's not better than using a database.
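The split by first letter can be sketched roughly like this. It is an illustration rather than my actual script: the function names, the output file names (th_a.dat, th_z.dat, and so on), the "_" bucket for words that don't start with an ASCII letter, and the UTF-8 output encoding are all assumptions.

```python
import os
from collections import defaultdict


def split_by_first_letter(lines):
    """Group whole entries (word line plus meaning lines) by first letter."""
    buckets = defaultdict(list)
    it = iter(lines)
    for line in it:
        if "|" not in line:
            continue  # assumption: header or blank lines carry no '|'
        word, count = line.rstrip("\r\n").split("|", 1)
        first = word[:1]
        # Words not starting with an ASCII letter go into a catch-all bucket.
        key = first.lower() if first.isascii() and first.isalpha() else "_"
        # Keep each entry together: the word line and its meaning lines.
        buckets[key].append(line)
        for _ in range(int(count)):
            buckets[key].append(next(it))
    return buckets


def write_buckets(buckets, out_dir):
    """Write one file per bucket; the file names are hypothetical."""
    os.makedirs(out_dir, exist_ok=True)
    for key, block in buckets.items():
        path = os.path.join(out_dir, f"th_{key}.dat")
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(block)


# Made-up sample lines in the thesaurus layout.
sample = [
    "zeal|1\n", "(noun)|ardor|eagerness\n",
    "zebra|1\n", "(noun)|equine\n",
    "apple|1\n", "(noun)|edible fruit\n",
]
buckets = split_by_first_letter(sample)
print(sorted(buckets))
```

A search then only needs to open the one file matching the first letter of the query, which is why a word starting with z no longer means reading past everything that comes before it.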