To accurately classify a query as navigational, thousands of available
features are explored. These features are extracted from major
commercial search engine results, user Web search click data, query
logs, and the relational content of the whole Web. To obtain the most
useful features for navigational query identification, a three-level
system is used that integrates feature generation, feature integration,
and feature selection in a pipeline. Because feature selection plays a
key role in classification, the best-performing feature selection
method is coupled with the best-performing classification approach to
achieve the best performance in identifying navigational queries.
According to one embodiment, a linear Support Vector Machine (SVM) is
used to rank features, and the top-ranked features are fed into a
Stochastic Gradient Boosting Tree (SGBT) classifier that determines
whether or not a particular query is a navigational query.
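The embodiment above, where a linear SVM ranks features and the top-ranked ones feed an SGBT classifier, can be sketched in plain Python. This is a minimal, hypothetical illustration, not the described system. The six features and the `make_data` generator are synthetic stand-ins for the search-result, click, and query-log features. The SVM is trained with the Pegasos subgradient method and the boosted trees are depth-1 stumps; both are choices made here for brevity. Ranking by absolute weight assumes the features are on comparable scales (e.g., standardized). In practice, a library such as scikit-learn would typically replace the hand-written parts, using `LinearSVC` for ranking and `GradientBoostingClassifier` with `subsample` below 1.0 for stochastic gradient boosting.

```python
import math
import random


def make_data(n, seed):
    """Hypothetical synthetic data: features 0-1 are informative, 2-5 noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])  # +1 = navigational, -1 = not
        row = [1.5 * label + rng.gauss(0, 1), 1.0 * label + rng.gauss(0, 1)]
        X.append(row + [rng.gauss(0, 1) for _ in range(4)])
        y.append(label)
    return X, y


def linear_svm_weights(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM (no bias) via Pegasos steps; solver choice is illustrative."""
    rng = random.Random(seed)
    w, order, t = [0.0] * len(X[0]), list(range(len(X))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def rank_features(w, k):
    """Indices of the k features with the largest |SVM weight|."""
    return sorted(range(len(w)), key=lambda j: -abs(w[j]))[:k]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-50.0, min(50.0, z))))  # clamped


def fit_stump(X, r, h, rows, feats):
    """Depth-1 regression tree on residuals r over the sampled rows; split by
    squared error, leaf value is the Newton step sum(r) / sum(h)."""
    best = None
    for j in feats:
        srt = sorted(rows, key=lambda i: X[i][j])
        total, left, n = sum(r[i] for i in srt), 0.0, len(srt)
        for pos in range(1, n):
            left += r[srt[pos - 1]]
            lo, hi = X[srt[pos - 1]][j], X[srt[pos]][j]
            if lo == hi:
                continue
            # Maximizing this term minimizes the split's squared error.
            gain = left ** 2 / pos + (total - left) ** 2 / (n - pos)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2)
    _, j, thr = best

    def leaf(is_left):
        side = [i for i in rows if (X[i][j] <= thr) == is_left]
        return sum(r[i] for i in side) / max(sum(h[i] for i in side), 1e-12)

    return j, thr, leaf(True), leaf(False)


def train_sgbt(X, y, feats, rounds=50, lr=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting with log loss: each stump is fit on a
    random subsample (without replacement) of the training rows."""
    rng = random.Random(seed)
    t01 = [(v + 1) / 2 for v in y]  # map {-1, +1} -> {0, 1}
    p0 = sum(t01) / len(t01)
    f0 = math.log(p0 / (1 - p0))  # initial log-odds
    F, stumps = [f0] * len(X), []
    for _ in range(rounds):
        p = [sigmoid(f) for f in F]
        r = [t - pi for t, pi in zip(t01, p)]  # negative gradient of log loss
        h = [pi * (1 - pi) for pi in p]  # second derivative of log loss
        rows = rng.sample(range(len(X)), int(subsample * len(X)))
        j, thr, vl, vr = fit_stump(X, r, h, rows, feats)
        vl, vr = lr * vl, lr * vr  # shrink leaf values by the learning rate
        stumps.append((j, thr, vl, vr))
        F = [f + (vl if x[j] <= thr else vr) for f, x in zip(F, X)]
    return f0, stumps


def predict(model, x):
    f0, stumps = model
    score = f0 + sum(vl if x[j] <= thr else vr for j, thr, vl, vr in stumps)
    return 1 if score > 0 else -1


X_train, y_train = make_data(400, seed=1)
X_test, y_test = make_data(200, seed=2)
top = rank_features(linear_svm_weights(X_train, y_train), k=2)
model = train_sgbt(X_train, y_train, feats=top)
acc = sum(predict(model, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print("top-ranked features:", top, "test accuracy:", round(acc, 3))
```

Fitting each tree on a random subsample of rows is what makes the boosting "stochastic"; with `subsample=1.0` it reduces to ordinary gradient boosting. The number of top-ranked features to keep (`k` here) is a tuning choice that the text does not specify.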