Optimizing Tip on Adaptive Arithmetic Coding | CodeGuru

Optimizing Tip on Adaptive Arithmetic Coding

The arithmetic coding data compression method is very well-known. It has a number of implementations used in of popular complex, multistaged compression techniques. This article, therefore, assumes that the reader will be readers with the basic concepts of arithmetic coding. This article describes an optimization point I found in the most commonly used adaptive Witten-Neal-Cleary […]

Written By
CodeGuru Staff
CodeGuru Staff
Jan 2, 2002
3 minute read
CodeGuru content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More

The arithmetic coding data compression method is very well-known. It has
a number of implementations used in of popular complex,
multistaged compression techniques. This article, therefore, assumes that the reader
will be readers with the basic concepts of arithmetic coding.

This article describes an optimization point I found in the most commonly used adaptive
Witten-Neal-Cleary implementation (also known as “the finite-precision algorithm”).
For details (including source code samples), you can refer to the

Mark Nelson’s article
on arithmetic coding. Below I’m using the common terminology that
article also follows.

In the practical implementations I’ve seen, the cumulative symbol frequency table (a component
of the algorithm) was represented by an array of integer numbers containing, for each given
symbol, the sum of frequencies of all the symbols having indices less than given. The array
was sorted in arbitrary fashion (for instance, by “values” of symbols) or by (non-cumulative)
frequencies of symbols. Encoding a regular symbol needs corresponding frequency interval to
be determined; decoding needs determining a frequency interval containing a given point. Both
actions are followed by updating the adaptive model, what’s commonly done by increasing the
non-cumulative symbol’s frequency (= increasing the cumulative frequencies of all the symbols
having indices not less than the one of the symbol encoded/decoded). The first thing is fast,
the second can be done fast enough by binary search in the array (which is monotone; I’ve noted
it’s not done in the sample I’ve referenced above), but the third requires a considerable number
of increments to perform. Sorting the array by frequencies gives effect only for small arrays
(= small total number of symbols) and EXTREMELY non-uniform distribution of symbols.

So I’ve suggested to use a tree-like representation of the frequency information (stored in an
array). Assume we totally have 8 symbols : 0, 1, …, 7 and their cumulative frequencies
Q[0] = 0, Q[1], … Q[7], Q[8] = (the sum of all non-cumulative frequencies). We can break the
whole frequency interval (0, Q[8]) with Q[4] into 2 subintervals (of lengths Q[4] and Q[8] – Q[4]),
then do the same with the intervals obtained (using Q[2] and Q[6]), and so on. Thus we
obtain a binary tree of “breaking points”, each node of which represents breaking of an interval
into the two. Let the node contain the breaking location counting from the lower bound of an
interval. If we now store the resulting tree in an array using the indices of breaking points
(as Q[] elements) in the Q[] array, we will obtain the following array A[]:

Q[0] = 0, Q[1], Q[2], Q[3] – Q[1], Q[4], Q[5] – Q[4], Q[6] – Q[4], Q[7] – Q[6] (, Q[8]).

Now, obtaining interval bounds and updating the model while encoding/decoding can be done
parallelly. To pass a given symbol while encoding, binary-search it’s index in the
“0, 1, …, 7, 8” array (starting to compare with “8”, then “4” …) with increasing A[k] when
going left the node “k”. To pass a given point while decoding, binary-search it in A[] with
subtracting A[k] from the point when going right the node “k” (and increasing A[k] if going
to the left). Initial state of A[] is not hard to determine (in our case, it is
“0, 1, 2, 1, 4, 1, 2, 1, 8”), and the overflow preventing (halving the frequencies) is also
trivial. Q[8] can be stored in A[0] instead of A[8].

I’m attaching a code shortcut containing ‘CFreqModIncr’ class implementing all this functionality
(sorry, but I didn’t have time to make up a complete project demonstrating it 🙂 (maybe someone
will do it “for me and others” ? ;).

// *** A short sample.
// Construct the model.
CFreqModIncr model;
// Initialize to keep 11 symbols and 
// set up the increment.
model.Initialize(11);
model.dwIncrement = 1;
// To pass a symbol with index ‘dwIndex’ 
// while encoding, use :
DWORD dwLeft, dwRight, dwTotal = model.GetTotalFrequency();
model.PassByIndex(dwIndex, dwLeft, dwRight);
… // … and encode the (dwLeft/dwTotal,// dwRight/dwTotal) interval
// To decode a symbol, obtain a point 
// ‘dwPoint’ in frequency units, then use :
DWORD dwLeft, dwRight, dwTotal = model.GetTotalFrequency();
DWORD dwIndex = model.PassByPoint(dwPoint, dwLeft, dwRight);
… // dwIndex = index of the symbol // decoded; other ‘dw’s are to be
// used in the rest of decoding process …
model.Uninitialize();

Downloads

Download source – 2 Kb

CodeGuru Logo

CodeGuru covers topics related to Microsoft-related software development, mobile development, database management, and web application programming. In addition to tutorials and how-tos that teach programmers how to code in Microsoft-related languages and frameworks like C# and .Net, we also publish articles on software development tools, the latest in developer news, and advice for project managers. Cloud services such as Microsoft Azure and database options including SQL Server and MSSQL are also frequently covered.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved

Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.