Note: Password Cracking Using Probabilistic Context-Free Grammars

securitypassword cracking
2022-09-27
Source2009 30th IEEE Symposium on Security and Privacy
AuthorsMatt Weir, Sudhir Aggarwal, Breno de Medeiros, Bill Glodek

Introduction

Approach: use Probabilistic Context-Free Grammars(PCFGs) to model the derivation of password patterns.

Background and Previous Work

Two most commonly used methods: brute-force and dictionary attacks.

Probabilistic Password Cracking

Assumption: not all guesses have the same probability of cracking a password.

Experiment: divide password list into train and test sets.

Preprocessing

Preprocessing phase: measure the frequencies of certain patterns associated to the password strings.

Base structure: capture the length of the observed strings.

Only calculate the probabilities for digit and special strings.

Using Probabilistic Grammars

Context-Free Grammar can be defined as below:

G=(V,Σ,S,P)G = (\mathbf{V}, \mathbf{\Sigma}, \mathbf{S}, \mathbf{P}) αβ\boldsymbol{\alpha} \rightarrow \boldsymbol{\beta}

Probabilistic Context-Free Grammar have probabilities associated with each production and add up to 1.

Sentantial form: string derived from the start symbol. Probability of sentential form is the product of the probabilities of the productions used in its derivation.

Efficienty Generating a "Next" Function

Problem: generating guesses in order of non-increasing probabilities.

Proposed mathod: use priority queue(based on max heap1).

Pivot value: next pre-terminal form shall be obtained by replacing variables with an index greater or equal than the popped pivot value. Pivot order is used to ensure the structure in the queue are not duplicated.

Entry in the priority queue contains base structure, pre-terminal form, probability and pivot value.

Proof of Correctness of the Next Function

Experiment and Results

Summary: The proposed method in most cases can produce better result than John the Ripper. However, its performance is strongly related to the dictionary used and the similarity between the password complexity in the training and testing sets.


Footnotes

  1. The parent entries' probability is greater(or maybe equal) compared to its children.