India's First Regional Search Engine Developed

by Julia Fernandes    Nov 09, 2004

Step aside, all ye search engines. A country that boasts more than 75 major regional languages finally has a regional search engine of its own.

Named ‘Kazhugu’, the search engine has been developed by Anna University’s K.B. Chandrasekar Research Centre (AU-KBC) and is ready for use on Tamil websites.

Explaining what makes the engine unique, S Baskaran, member-research staff, AU-KBC, told CXOtoday, “The engine is developed in such a manner that each and every Web page irrespective of the way it has been encoded is stored in a uniform encoding pattern within the database of the engine. This makes it easier to search across all websites.”

He added, “Non-standardization of encoding patterns across different websites has been tackled with the development of this engine. We have developed a uniform encoding converter, which converts from any existing encoding pattern to another encoding scheme.”
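The idea of a uniform encoding converter can be illustrated with a minimal sketch. It is not AU-KBC's code: the function name `to_uniform`, the two font encodings `font_a` and `font_b`, and their byte tables are invented for illustration, and Unicode is assumed as the uniform target, which the article does not confirm.

```python
import unicodedata

# Hypothetical byte-to-character tables for two made-up legacy font encodings.
# Real tables for Tamil font encodings are far larger; these values are
# illustrative only and do not describe any actual font.
LEGACY_TABLES = {
    "font_a": {0xA1: "\u0b85", 0xA2: "\u0b86"},  # Tamil letters A and AA
    "font_b": {0xC1: "\u0b85", 0xC2: "\u0b86"},  # same letters, different bytes
}


def to_uniform(raw: bytes, encoding: str) -> str:
    """Convert raw page bytes to one uniform representation (assumed: Unicode NFC)."""
    if encoding in LEGACY_TABLES:
        table = LEGACY_TABLES[encoding]
        chars = []
        for b in raw:
            if b < 0x80:
                chars.append(chr(b))  # ASCII bytes pass through unchanged
            else:
                # Unmapped high bytes become the Unicode replacement character.
                chars.append(table.get(b, "\ufffd"))
        text = "".join(chars)
    else:
        # Otherwise treat the name as a standard codec such as "utf-8".
        text = raw.decode(encoding)
    return unicodedata.normalize("NFC", text)


page_a = to_uniform(bytes([0xA1, 0x20, 0xA2]), "font_a")
page_b = to_uniform(bytes([0xC1, 0x20, 0xC2]), "font_b")
```

Once both pages are stored in the same form, `page_a` and `page_b` compare equal, so a single query can match text from sites that originally used different fonts.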

The team has also built in morphological analysis. According to Baskaran, Indian languages, unlike English, are rich in inflections. So even when a user types a single word, results containing different inflected forms of that word are pulled up.
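A rough sketch of how inflection-aware search could work is given below. It assumes the simplest possible approach, a hand-written lookup table from inflected forms to a root word; the names `LEMMAS`, `lemma`, `build_index` and `search` are hypothetical, and a real analyzer would derive roots with morphological rules rather than a fixed table.

```python
from collections import defaultdict

# Toy table of inflected Tamil forms and their root, written by hand for
# illustration only; it stands in for a real morphological analyzer.
LEMMAS = {
    "வீடுகள்": "வீடு",   # houses -> house
    "வீட்டில்": "வீடு",  # in the house -> house
    "வீட்டை": "வீடு",    # house (accusative) -> house
}


def lemma(word):
    """Return the root for a known inflected form, else the word itself."""
    return LEMMAS.get(word, word)


def build_index(docs):
    """Map each root word to the set of document ids that contain any of its forms."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[lemma(word)].add(doc_id)
    return index


def search(index, query):
    """Reduce the query to its root and return the matching document ids, sorted."""
    return sorted(index.get(lemma(query), set()))
```

Because both the indexed words and the query are reduced to the same root, a search for any one form returns pages that contain the others.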

Users can run both site-specific and Web-wide searches using this engine. Kazhugu is capable of searching all Tamil websites, regardless of the fonts they use. While searches can also be run on English-language websites, the user gets the results in Tamil.

The engine has been developed in Java on a Linux platform, but it can be ported to Windows. It consists of around 40,000 to 50,000 lines of code. The databases used are Oracle 9i and MySQL.

Given the use of open source components, CXOtoday asked why the engine was not released as open source. “Since we work on a self-sustaining model, the possibility of releasing the engine as open source currently is not on the cards. However, future possibility is not ruled out,” revealed Baskaran.

The project commenced in August 2000. The Natural Language Processing group consists of 10-12 people, while four core developers worked on this engine.

While the research centre is currently working on a Hindi search engine, plans are underway to introduce similar Internet search engines for other languages like Malayalam, Telugu and Kannada.

The search engine is currently being tested on the Sify portal. Kazhugu will be marketed commercially by KBC Research Foundation Pvt Ltd.
