TSEP (The Search Engine Project) has been in development since 2004. Girish started the project for a site of approximately 150 pages that needed a search engine. He reviewed the Open Source search engines available at the time but could not find one that was easy for a webmaster to understand and set up, so he started this project. Since then it has improved greatly and continues to get better. Olaf joined with v0.9 beta and is now responsible for the project. Several other developers have joined since; special mention goes to Manfred. Please see the TSEP summary page on SourceForge for more details.
The primary objective of this software is 'ease of use'. If you still find this software difficult to set up or use, please let us know: we will help you personally, and your feedback makes us aware of the complexities. By releasing this software to the Open Source community, we strive to make it the most powerful personal site search engine in the world.
We have kept the copyright notice as small as possible, so please do not remove it from your site; that way others can discover what a great tool TSEP is.
We are very interested in where TSEP is being used, so we would really appreciate it if you contacted us with the web address where we can take a look (and grab a screenshot) or, if it is an intranet, sent us a screenshot.
A word about versions: We publish a new version whenever we think one is ready. A version number does not indicate the quantity or complexity of the changes since the previous version. We might increase a version number by only 0.001 and still have made huge changes to TSEP. In short, we recommend downloading every new version of TSEP.
See the installation instructions for this topic.
This was introduced in version 0.927. Building the index (using the indexer) requires an algorithm that finds the files to be indexed. The filefind algorithm integrated into TSEP reads each (sub)directory, beginning at the given start directory, and collects the names of the files found there for indexing.
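The recursive walk described above can be sketched in PHP with the SPL directory iterators. This is a minimal illustration assuming a current PHP version (7 or later), not TSEP's own filefind code; the function name findFiles is hypothetical.

```php
<?php
// Hypothetical sketch of a recursive "filefind": walk the start directory
// and every subdirectory below it, collecting the paths of regular files.
// Not TSEP's own implementation; it only illustrates the idea.
function findFiles($startDir)
{
    $found = [];
    $iterator = new RecursiveIteratorIterator(
        // SKIP_DOTS leaves out the "." and ".." entries
        new RecursiveDirectoryIterator($startDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            $found[] = $file->getPathname();
        }
    }
    sort($found); // deterministic order, independent of the file system
    return $found;
}
```

In practice the collected names would still be filtered by the file types (extensions) configured on the indexer page.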
In addition to the integrated filefind algorithm, TSEP can build the index from files whose names (URLs) are supplied from outside TSEP (e.g. a simple file or URL list, a file list returned by a crawler/spider process, etc.).
The external datasupply must be a .php script that communicates with TSEP as follows:
Enter the (fully qualified) name of this .php script on the TSEP admin page "build-new-index".
URLs and the other values are returned to TSEP via
call_user_func("TSEP_ExternalCallBack", $returnstring);
$returnstring has to be one of the following:
In the admin/example directory you can find examples of how to use the external datasupply feature. Try "examples/urllist.php" as the external datasupply for your first tests: it is a very simple datasupply and makes it easy to understand how this feature works.
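A very small datasupply in the spirit of urllist.php could look like the sketch below. It is hypothetical and rests on one loud assumption: that a plain URL is one of the accepted return strings. Check the list of valid return strings before relying on it. The URL list is made up, and the stub callback exists only so the script can run outside TSEP.

```php
<?php
// Hypothetical minimal external datasupply (not the shipped urllist.php).
// ASSUMPTION: passing a plain URL is one of the accepted return strings.

// Outside TSEP there is no callback, so define a stub that records what
// it receives. Inside TSEP the real TSEP_ExternalCallBack already exists.
if (!function_exists('TSEP_ExternalCallBack')) {
    $GLOBALS['tsep_received'] = [];
    function TSEP_ExternalCallBack($returnstring)
    {
        $GLOBALS['tsep_received'][] = $returnstring;
    }
}

// Hypothetical list of pages to hand over to the indexer.
$urls = [
    'http://www.mydomain.de/index.php',
    'http://www.mydomain.de/about.php',
];

// Hand each URL to TSEP, one call per URL.
foreach ($urls as $url) {
    call_user_func('TSEP_ExternalCallBack', $url);
}
```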
examples/phpcrawl4tsep.php is a ready-to-run sample script that communicates with an installed copy of PHPCrawl. Place phpcrawl4tsep.php in the PHPCrawl directory (where PHPCrawl's example.php is located) and call it from the TSEP indexer.
Attention:
Example of a correct configuration for running phpcrawl4tsep.php from within TSEP:
Assume:
www.mydomain.de/index.php: entry page of your site
www.mydomain.de/php/tsepsearch: installation directory of TSEP
www.mydomain.de/php/tsepsearch/admin/indexer.php: TSEP indexer script ("build-new-index" start page)
www.mydomain.de/php/phpcrawl/: installation directory of PHPCrawl
www.mydomain.de/php/phpcrawl/phpcrawl4tsep.php: our sample script
The picture of the installation shows this example with its settings.
Parameters entered on the TSEP admin page "build-new-index" are made available to the external datasupply script via public variables:
Version 0.938 introduced the first tags.
You can add them to the pages to be indexed to give TSEP instructions.
There are currently three tags:
To run a search, open your search page (or the page we prepared for you, 'tsepsearch.php') in your browser and enter the words to search for. Search words are not case sensitive.
TSEP supports boolean search if you have MySQL version 4 or higher. Below are some of the boolean search features. Important: boolean search only works if your tables are MyISAM tables (the tables should already be MyISAM when TSEP created them).
The minimum length of a search term is four characters; see the MySQL restrictions below for details. (User-defined) stopwords are neither marked in the results nor used in the database query.
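For orientation, MySQL's boolean full-text mode is used with MATCH (...) AGAINST (... IN BOOLEAN MODE), where "+word" marks a required word and "-word" an excluded one. The sketch below is not TSEP's query code: it makes every unprefixed word required (AND semantics), which is a design assumption, since MySQL itself treats unprefixed words as optional. The table and column names (tsep_index, content) and the function name are hypothetical, and quoted phrases are not handled.

```php
<?php
// Hypothetical sketch: build a MySQL boolean-mode expression in which
// plain words become required ("+word"), while words the user already
// prefixed with an operator (+ - ~ < >) are kept as typed.
function booleanExpression($input)
{
    $words = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);
    $parts = [];
    foreach ($words as $word) {
        if (strpos('+-~<>', $word[0]) !== false) {
            $parts[] = $word;       // user-supplied operator, keep it
        } else {
            $parts[] = '+' . $word; // assumption: plain words are required
        }
    }
    return implode(' ', $parts);
}

// The expression would be bound as a parameter of a prepared statement
// (e.g. via PDO::prepare()), never spliced into the SQL string.
$sql = 'SELECT url FROM tsep_index '
     . 'WHERE MATCH (content) AGAINST (? IN BOOLEAN MODE)';
```

For example, booleanExpression('apple -juice') yields '+apple -juice': pages must contain "apple" and must not contain "juice".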
You can add, update and delete your own stopwords.
Stopwords are words that are not searched for on your pages. If a stopword is used as a search term, it is not marked as a search term in the results.
Stopwords are not case sensitive. This means that if you enter "Apple" in the stopwords section and a user searches for "apple", the word is treated as a stopword.
Please note that there are MySQL restrictions on stopwords as well!
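The two rules above (minimum term length and case-insensitive stopwords) can be illustrated with a small filter. This is a hypothetical sketch, not TSEP's own code: the function name and the stopword array are assumptions, and strlen() counts bytes, so multibyte characters count more than once.

```php
<?php
// Hypothetical sketch: keep only the search terms that would actually be
// used in the query. Terms shorter than MySQL's default minimum word
// length (4, ft_min_word_len) and user-defined stopwords are dropped;
// stopwords are compared case-insensitively.
function usableSearchTerms($input, array $stopwords, $minLength = 4)
{
    $stop = array_map('strtolower', $stopwords);
    $terms = [];
    foreach (preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $lower = strtolower($word);
        if (strlen($lower) < $minLength) {
            continue; // too short for the default full-text minimum
        }
        if (in_array($lower, $stop, true)) {
            continue; // stopword: not used in the database query
        }
        $terms[] = $word;
    }
    return $terms;
}
```

With "apple" as a stopword, a search for "Apple pie recipes" keeps only "recipes": "Apple" matches the stopword despite the different case, and "pie" is too short.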
This was introduced in version 0.911. In the setup file, the administrator can define whether search activity is logged, and which. Every log entry carries a timestamp. The administrator can choose to log the IP address, the search term and clicks on the results.
The administrator may want to analyse what users are searching for on his site and make navigation to those points easier.
The administrator can also log the IP address of the person searching. Be aware that people might not like the idea of you "spying" on them. Still, we think this can be a useful feature, especially for intranets: there, if someone is completely lost, the administrator can take them by the hand and help directly.
The administrator may want or need to notify users that their actions are being logged, especially when logging their IP addresses.
Sorting the log entries by IP address requires MySQL v3.23 or higher (MySQL restriction on IP sorting).
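Sorting by IP address needs numeric rather than text comparison: a plain string sort places "10.0.0.10" before "10.0.0.2". The sketch below does the numeric sort in PHP with ip2long(); MySQL's INET_ATON() serves the same purpose inside a query. The log entry layout (an 'ip' key per entry) and the function name are hypothetical, and only IPv4 is handled.

```php
<?php
// Hypothetical sketch: sort search-log entries by IPv4 address in numeric
// order. ip2long() turns "a.b.c.d" into an integer; this assumes 64-bit
// PHP, where ip2long() never returns negative values.
function sortLogByIp(array $entries)
{
    usort($entries, function ($a, $b) {
        return ip2long($a['ip']) <=> ip2long($b['ip']);
    });
    return $entries;
}

$log = [
    ['ip' => '10.0.0.10', 'term' => 'apple'],
    ['ip' => '192.168.0.1', 'term' => 'pear'],
    ['ip' => '10.0.0.2', 'term' => 'plum'],
];
$sorted = sortLogByIp($log); // order: 10.0.0.2, 10.0.0.10, 192.168.0.1
```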
Quote:
Any word that is too short is ignored. The default minimum length of words that will be found by full-text searches is four characters.
Quote:
Words in the stopword list are ignored. A stopword is a word such as ``the'' or ``some'' that is so common that it is considered to have zero semantic value. There is a built-in stopword list.
For more details, see the source page of these quotes in the MySQL manual: 13.6 Full-Text Search Functions
The restrictions are covered in 13.6.3 Full-Text Restrictions.
However, people with access to the MySQL server can fine-tune it to overcome these restrictions. You will find information about this in 13.6.4 Fine-Tuning MySQL Full-Text Search.
You can find more on MySQL's built-in stopwords by searching the MySQL documentation for "stopword list". A list of the words we believe are compiled into MySQL is in the docs directory: stopword-mysql.txt
Personally, I do not see a big problem with the built-in stopwords: they are so general that hardly anyone really trying to find something will enter "you" as a search word. People are used to searching, so they enter the words they think best match what they need, and those words are usually long enough not to fall under the length restriction. Also, the built-in stopwords are English words, and TSEP now supports other languages as well. (Olaf)
The TSEP version is included in the 'title' attribute of the copyright notice. This means that you can move your cursor over the copyright notice (at the bottom of the search page, for example), and after a moment your browser should display the version number.
The version number is read from a text file named tsepversion.txt in the include directory. There is no need to change anything in this file: it is maintained by the programmers.
If you create a new language, please mail us your language.php file so that we can add it to the next version.
Language files define the PHP variables used in the TSEP files. Place language.php in a subdirectory of the language directory. Let's say you are creating a Spanish version:
Some people asked how they can delete or correct a word in the index. Since version 0.910 you can do this directly in TSEP.
Open the indexoverview.php page, either directly or from any page in the admin folder (indexer.php, for example). Find the page you want to edit and click its link (the page title) in the title column; this opens the editor. Be aware that this behaviour changed in 0.938: clicking the URL (in the URL column) opens the page itself!
Tip: You can still call indexedit.php directly (with no parameters). This shows the details of several pages at once, and you can also add completely new pages to the index! The number of pages shown there is set in configuration.php by this value in the Limits group:
How many index entries to show on the complete index page in one HTML document? Be careful not to set this number too high, as the page might get very big!
This changed in 0.934: you can now simply enter the file types (extensions) you want TSEP to index on the indexer page. Separate the types with commas only (no spaces etc.). Also pay attention to the case of the extensions: "php" is not the same as "PHP" on Unix/Linux systems!
Example: html,htm,php
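To show how such a field is interpreted, here is a hypothetical sketch (the function names are not TSEP's). It splits the comma-separated list and compares extensions case-sensitively, mirroring Unix/Linux file systems where "page.PHP" and "page.php" are different files.

```php
<?php
// Hypothetical sketch: turn the file type field ("html,htm,php") into a
// list and decide, case-sensitively, whether a file should be indexed.
function parseFileTypes($field)
{
    // explode on commas only; drop empty entries (e.g. from a trailing comma)
    return array_values(array_filter(explode(',', $field), 'strlen'));
}

function shouldIndex($filename, array $types)
{
    $ext = pathinfo($filename, PATHINFO_EXTENSION);
    return in_array($ext, $types, true); // strict, case-sensitive match
}

$types = parseFileTypes('html,htm,php');
```

With this list, index.php would be indexed, while INDEX.PHP and readme (no extension) would not.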
Rank means that all result pages are ordered by the total number of hits for all search words. For example, if a search returns two results, the search words were found more often on the page with rank 1 than on the page with rank 2. This is simple but very useful when your site has many pages and users may face lots of results.
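The ranking rule can be illustrated by counting, per page, how often all search words occur and sorting by that total. This is a hypothetical sketch working on page texts held in an array; TSEP itself works on its database index. substr_count() counts substrings, so "apple" inside "pineapple" would also count.

```php
<?php
// Hypothetical sketch of ranking: total hits of all search words per
// page (case-insensitive), highest total first.
function rankPages(array $pages, array $words)
{
    $scores = [];
    foreach ($pages as $url => $text) {
        $lower = strtolower($text);
        $hits = 0;
        foreach ($words as $word) {
            $hits += substr_count($lower, strtolower($word));
        }
        $scores[$url] = $hits;
    }
    arsort($scores); // first entry = rank 1 (most hits)
    return $scores;
}

$ranked = rankPages(
    ['a.html' => 'Apple pie', 'b.html' => 'apple apple pie pie'],
    ['apple', 'pie']
);
```

Here b.html has four hits and a.html two, so b.html gets rank 1.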
This is simple but takes a little while. To keep things as easy as possible, we will go through the result page step by step. The formatting shown here is from version 0.911; it may change in the future but should stay much the same.
Please note that the search page contains additional div blocks that are only shown when errors occur (a stopword was searched, the MySQL version is too low, etc.). For now, we leave it to you to look into their formatting; for the general user's sake we stick with what most people will see.
If you have created some nice formatting, we would appreciate it if you contacted us and sent us your CSS file so that we can include it in a new TSEP version.
On all TSEP pages, everything TSEP outputs is placed in the following div container, which provides a global area for TSEP.

With this knowledge alone you can change the look considerably, for example by setting another font for the .tsepProject class in the tsep.css file. This changes all fonts in the TSEP area to whatever you define.
Now that you know the header, let's look at the next part of the search page: the .SearchBlock, which contains the search form fields and the help. As you can see, the help has its own div container, .SearchHintsHelp.

This .SearchBlock is followed by another .SearchBlock that provides status information. This whole block is repeated below all search results. If you know a little CSS, you should be able to format this block to fit your needs.

The first of these containers is followed by the search results, which use the following classes:
.SearchResultAllPagesBlock - this is the block of all the results.
.SearchResultOnePageBlock - this is a block of one resulting page.
.SearchResultOnePageTitle - this is the title of the webpage we found in the database.
.resultnumber - this is the rank of the page. (details: rank).
.SearchResultPageRank - displays how many times the page had a hit from the search words.
.SearchResultOutput - the indexed text of the page, up to the first "explode" character (currently a . (dot)).
.foundSearchWord - one of the words the user searched for. It can be styled specially so that the user spots it faster.
.SearchResultOutputMore - the little dots which show the user there is more on the page.
.SearchResultURL - the URL of the page found, followed by the size of the page (as stored in the database).
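To see how these classes fit together, the sketch below renders one result block. Only the class names come from the list above; the element nesting, the function name and its parameters are assumptions for illustration, so compare with the HTML TSEP actually generates before writing CSS against it. Several such blocks would sit inside a .SearchResultAllPagesBlock container.

```php
<?php
// Hypothetical sketch: mark up one search result with the CSS classes
// listed above. The nesting is an assumption, not TSEP's real output.
function renderResult($rank, $hits, $title, $text, $url, $size, array $words)
{
    // Show the indexed text only up to the first "explode" character (a dot).
    $dot = strpos($text, '.');
    $more = $dot !== false;
    $snippet = htmlspecialchars($more ? substr($text, 0, $dot) : $text);

    // Wrap every searched word in .foundSearchWord (case-insensitive),
    // using one combined pattern so inserted markup is never re-matched.
    if ($words) {
        $quoted = array_map(function ($w) {
            return preg_quote(htmlspecialchars($w), '/');
        }, $words);
        $snippet = preg_replace('/(' . implode('|', $quoted) . ')/i',
            '<span class="foundSearchWord">$1</span>', $snippet);
    }

    return '<div class="SearchResultOnePageBlock">'
        . '<div class="SearchResultOnePageTitle">'
        . '<span class="resultnumber">' . (int) $rank . '.</span> '
        . htmlspecialchars($title) . '</div>'
        . '<div class="SearchResultPageRank">' . (int) $hits . '</div>'
        . '<div class="SearchResultOutput">' . $snippet
        . ($more ? '<span class="SearchResultOutputMore">...</span>' : '')
        . '</div>'
        . '<div class="SearchResultURL">' . htmlspecialchars($url)
        . ' - ' . htmlspecialchars($size) . '</div>'
        . '</div>';
}

$html = renderResult(1, 3, 'Home', 'Fresh apple pie. More text',
    'http://www.yourdomain.com/index.php', '2k', ['apple']);
```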

You need to open configuration.php once again. No changes are needed, but you must open the file once, because that is where the TSEP path is saved. Afterwards your installation should work fine again.
"Warning: set_time_limit(): Cannot set time limit in safe mode in /..../tsep/admin/indexer.php on line 110"
This is not really important: the warning shows only in the admin area. It occurs when safe mode is enabled on the server. No other problems concerning safe mode are known at this time.
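If you maintain your own scripts, the warning can be avoided by only calling set_time_limit() when safe mode is off. This is a hypothetical sketch, not a patch to indexer.php. The safe_mode setting existed up to PHP 5.3 and was removed in 5.4; on newer versions ini_get('safe_mode') returns false, so set_time_limit() is simply called.

```php
<?php
// Hypothetical sketch: lift the execution time limit only when PHP's
// safe mode is off, so the "Cannot set time limit in safe mode" warning
// never appears. 0 means "no time limit".
function liftTimeLimit($seconds = 0)
{
    if (ini_get('safe_mode')) {
        return false; // safe mode on: set_time_limit() would only warn
    }
    return set_time_limit($seconds);
}
```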
You are trying to create an index. TSEP finds your pages but shows no words and a filesize of 0k.
The path setting was entered incorrectly. Please check whether you entered
http://www.yourdomain.com/ (wrong)
instead of
http://www.yourdomain.com (correct: no slash)
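When the path setting comes from user input or another script, a trailing slash can be stripped before saving it. This is a minimal hypothetical helper, not part of TSEP.

```php
<?php
// Hypothetical sketch: normalise the path setting so it never ends in a
// slash (http://www.yourdomain.com rather than http://www.yourdomain.com/).
function normalizeBasePath($path)
{
    return rtrim(trim($path), '/');
}
```

For example, normalizeBasePath('http://www.yourdomain.com/') returns 'http://www.yourdomain.com', and an already correct value is returned unchanged.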
You might run into problems with MySQL v3.23 or lower. If you are running such a version, we would be happy to hear whether TSEP works for you and what kind of problems, if any, you have encountered.
Also see the other MySQL restrictions.
There seem to be problems with indexer.php under MySQL 5 alpha. For now we assume that this is an issue of the new MySQL version and has nothing to do with TSEP.
Try populating the $db_table_prefix.config table by hand (using phpMyAdmin for example) with the values you find in the SQL scripts in the admin directory.
This software has been tested on Windows and Linux systems with the Apache web server, PHP v4.2 or greater and MySQL (v4 or greater for boolean search). The PHP option 'allow_url_fopen' should be enabled.
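A quick check of the PHP-side requirements above could look like this hypothetical sketch. It tests only the PHP version and allow_url_fopen, not MySQL or the web server, and it uses array() instead of [] so that it parses on old PHP versions too.

```php
<?php
// Hypothetical sketch: report unmet PHP requirements (PHP 4.2 or greater,
// allow_url_fopen enabled). An empty result means both are satisfied.
function environmentProblems()
{
    $problems = array();
    if (version_compare(PHP_VERSION, '4.2.0', '<')) {
        $problems[] = 'PHP ' . PHP_VERSION . ' is older than 4.2';
    }
    if (!ini_get('allow_url_fopen')) {
        $problems[] = "'allow_url_fopen' is disabled";
    }
    return $problems;
}
```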
Please mail us any suggestions or questions you may have, or post them to the SourceForge forums. We welcome any response. If you need help or "something does not work", please include the version numbers of TSEP, PHP and MySQL you are using, as well as which web server you run and its version.
Software by: Olaf Noehring (main development after 0.9beta) and Girish R (development up to and including 0.9beta)
Special thanks to:
Version: TSEP 0.9nnn
This file was last modified on:
2005-01-20 10:26 AM
by Olaf Noehring
Copyright (c) 2002-2005, Olaf Noehring & Girish R. All Rights Reserved.
Support & Info (Summary on Sourceforge): http://sourceforge.net/projects/tsep/
Contact: Olaf Noehring (email on website: http://www.team-noehring.de) or Girish R at:
girishr at gmail.com with your comments, suggestions, enquiries or requirements.
This file is part of TSEP (The Search Engine Project)
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA