Source | 2009 30th IEEE Symposium on Security and Privacy |
Authors | Matt Weir, Sudhir Aggarwal, Breno de Medeiros, Bill Glodek |
Approach: use Probabilistic Context-Free Grammars(PCFGs) to model the derivation of password patterns.
Two most commonly used methods: brute-force and dictionary attacks.
Assumption: not all guesses have the same probability of cracking a password.
Experiment: divide password list into train and test sets.
Preprocessing phase: measure the frequencies of certain patterns associated to the password strings.
Base structure: capture the length of the observed strings.
Only calculate the probabilities for digit and special strings.
Context-Free Grammar can be defined as below:
Probabilistic Context-Free Grammar have probabilities associated with each production and add up to 1.
Sentantial form: string derived from the start symbol. Probability of sentential form is the product of the probabilities of the productions used in its derivation.
Problem: generating guesses in order of non-increasing probabilities.
Proposed mathod: use priority queue(based on max heap1).
Pivot value: next pre-terminal form shall be obtained by replacing variables with an index greater or equal than the popped pivot value. Pivot order is used to ensure the structure in the queue are not duplicated.
Entry in the priority queue contains base structure, pre-terminal form, probability and pivot value.
Summary: The proposed method in most cases can produce better result than John the Ripper. However, its performance is strongly related to the dictionary used and the similarity between the password complexity in the training and testing sets.
The parent entries' probability is greater(or maybe equal) compared to its children. ↩