Method for automatically finding frequently asked questions in a helpdesk data set

A system and method automatically identify candidate helpdesk problem categories that are most amenable to automated solutions. The system generates a dictionary by identifying each word in the text data set and counting the number of documents that contain it. The documents are then partitioned into clusters. For each generated cluster, the system sorts the dictionary terms in order of decreasing occurrence frequency and determines a search space by selecting the top dictionary terms, up to a user-defined depth of search. Next, the system chooses a set of terms from the search space as specified by a user-defined value indicating the desired level of detail. For each possible combination of frequent terms in the search space, the system finds the set of examples containing all the terms and then determines whether the frequency is sufficiently high and the overlap sufficiently low for this candidate set of examples to be a frequently asked question.
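The per-cluster search described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes that clustering has already been done and that one cluster's documents are passed in, that words are split on whitespace, that "frequency" means the fraction of the cluster's documents matched, and that "overlap" means the fraction of matched documents already covered by a previously accepted candidate. The function names and the parameters `depth`, `detail`, `min_freq`, and `max_overlap` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenization (a stand-in for real text processing)."""
    return set(text.lower().split())


def build_dictionary(docs):
    """Map each term to the number of documents that contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def find_faqs(cluster_docs, depth=4, detail=2, min_freq=0.3, max_overlap=0.5):
    """Return (terms, document_indices) pairs that look like FAQ candidates.

    depth       -- number of top terms that form the search space
    detail      -- number of terms combined into one candidate
    min_freq    -- assumed threshold: minimum fraction of documents matched
    max_overlap -- assumed threshold: maximum fraction of matched documents
                   already covered by earlier accepted candidates
    """
    token_sets = [tokenize(d) for d in cluster_docs]
    counts = build_dictionary(cluster_docs)

    # Sort terms by decreasing document frequency; break ties alphabetically.
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    search_space = ranked[:depth]

    covered = set()  # documents already assigned to an accepted candidate
    faqs = []
    for combo in combinations(search_space, detail):
        # Examples that contain every term of the combination.
        matches = {i for i, toks in enumerate(token_sets)
                   if toks.issuperset(combo)}
        if not matches:
            continue
        frequency = len(matches) / len(token_sets)
        overlap = len(matches & covered) / len(matches)
        if frequency >= min_freq and overlap <= max_overlap:
            faqs.append((combo, sorted(matches)))
            covered |= matches
    return faqs
```

For a toy cluster of five tickets, four of which mention both "password" and "reset", calling `find_faqs` with the default arguments returns the single candidate `(("password", "reset"), [0, 1, 2, 4])`. Raising `depth` widens the search space, and raising `detail` makes each candidate more specific at the cost of matching fewer examples.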
