This is an old revision of this page, as edited by Doug Bell (talk | contribs) at 22:20, 20 January 2006 (added discussion of hashCode() to Java section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Revision as of 22:20, 20 January 2006 by Doug Bell (talk | contribs) (added discussion of hashCode() to Java section)(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff)In computing, an associative array, also known as a map, lookup table, or dictionary, is an abstract data type very closely related to the mathematical concept of a function with a finite domain. Conceptually, an associative array is composed of a collection of keys and a collection of values, and each key is associated with one value. The operation of finding the value associated with a key is called a lookup or indexing, and this is the most important operation supported by an associative array. The relationship between a key and its value is sometimes called a mapping or binding. For example, if the value associated with the key "bob"
is 7
, we say that our array maps "bob"
to 7
.
From the perspective of a programmer using an associative array, it can be viewed as a generalization of an array: While a regular array maps integers to arbitrarily typed objects (integers, strings, pointers, and, in an OO sense, objects), an associative array maps arbitrarily typed objects to arbitrarily typed objects. (Implementations of the two data structures, though, may be considerably different.)
The operations that are usually defined for an associative array are:
- Add: Bind a new key to a new value
- Reassign: Bind an old key to a new value
- Remove: Unbind a key from a value and remove it from the key set
- Lookup: Find the value (if any) that is bound to a key
Examples
One can think of a telephone book as an example of an associative array, where names are the keys and phone numbers are the values. Another example would be a dictionary where words are the keys and definitions are the values. A database is a sort of generalized associative array.
Data structures for associative arrays
Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures (such as association lists).
Efficient representations
There are two main efficient data structures used to represent associative arrays, the hash table and the self-balancing binary search tree. Skip lists are also an alternative, though relatively new and not as widely used. Relative advantages and disadvantages include:
- Hash tables have faster average lookup and insertion time (O(1)), while some kinds of balanced binary trees have faster worst-case lookup and insertion time (O(log n) instead of O(n)). Hash tables have seen extensive use in realtime systems, but trees can be useful in high-security realtime systems where untrusted users may deliberately supply information that triggers worst-case performance in a hash table, although careful design can remove that issue. Hash tables shine in very large arrays, where O(1) performance is important. Skip lists have worst-case operation time of O(n), but average-case of O(log n), with much less insertion and deletion overhead than balanced binary trees.
- Hash tables can have more compact storage for small value types, especially when the values are bits.
- There are simple persistent versions of balanced binary trees, which are especially prominent in functional languages.
- Building a hash table requires a reasonable hash function for the key type, which can be difficult to write well, while balanced binary trees and skip lists only require a total ordering on the keys. On the other hand, with hash tables the data may be cyclically or partially ordered without any problems.
- Balanced binary trees and skip lists preserve ordering -- allowing one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently.
- Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys, or to count the number of keys in an ordered range.
Association lists
A simple but generally inefficient type of associative array is an association list, often called an alist for short, which simply stores a linked list of key-value pairs. Each lookup does a linear search through the list looking for a key match.
Strong advantages of association lists include:
- No knowledge is needed about the keys, such as an order or a hash function.
- For small associative arrays, common in some applications, association lists can take less time and space than other data structures.
- Insertions are done in constant time by consing the new association to the head of the list.
Specialized representations
If the keys have a specific type, one can often use specialized data structures to gain performance. For example, integer-keyed maps can be implemented using Patricia trees or Judy arrays, and are useful space-saving replacements for sparse arrays. Because this type of data structure can perform longest-prefix matching, they're particularly useful in applications where a single value is assigned to most of a large range of keys with a common prefix except for a few exceptions, such as in routing tables.
String-keyed maps can avoid extra comparisons during lookups by using tries.
Language support
Associative arrays are known by many names in different programming languages. In Smalltalk and Python they are called dictionaries; in Perl and Ruby they are called hashes; in Java they are called Maps (see Map
) and in Common Lisp they are called hash tables (since Common Lisp typically uses this implementation). In PHP all arrays can be associative, except that the keys are limited to integers and strings and can only be a single level of subscripts. In MUMPS, all arrays are associative, with multiple dimensional arrays (ie: with variable numbers of keys), MUMPS keys are strings with values at any dimensional level allowed to store string values.
In the scripting language Lua, associative arrays, called tables, are used as the primitive building block for all data structures, even arrays. Likewise, in JavaScript, all objects are associative arrays. In MUMPS, the associative arrays are typically stored as B-trees.
Associative arrays can be implemented in any programming language, and one or more implementations of them is typically found either built into the language or the standard library distributed with it (C is a notable exception, as neither the language nor the standard library directly provide one. It is not difficult to write one in C, though).
Awk
Awk has built-in, language-level support for associative arrays.
For example:
phonebook = "555-9999" phonebook = "555-1212" phonebook = "555-1337"
You can also loop through an associated array as follows:
for (name in phonebook) { print name, " ", phonebook }
You can also check if an element is in the associative array, and delete elements from an associative array.
C++
C++ also has a form of associative array called std::map
(see Standard Template Library#Containers). One could create a map with the same information as above using C++ with the following code:
#include <map> #include <string> int main() { std::map<std::string, std::string> phone_book; phone_book = "555-9999"; phone_book = "555-1212"; phone_book = "553-1337"; return 0; }
In C++, std::map
allows keys and values to be different data types, but all of the keys in a particular map must be of the same base type.
The same must be true for all of the values. Although std::map
is typically implemented using a self-balancing binary search tree, the SGI STL also provides a std::hash_map
which has the algorithmic benefits of a hash table.
D
D offers direct support for associative arrays in the core language. The equivalent example would be:
int main() { char] phone_book; phone_book = "555-9999"; phone_book = "555-1212"; phone_book = "553-1337"; return 0; }
Keys and values can be any types, but all the keys in an associative array must be of the same type, and the same for values.
Java
In Java associative arrays are implemented as "maps"; they are part of the Java Collections Framework. Since J2SE 5.0 and the introduction of generics into Java, collections can have a type specified; for example, an associative array mapping strings to strings might be specified as follows:
Map<String, String> phoneBook = new HashMap<String, String>(); phoneBook.put("Sally Smart", "555-9999"); phoneBook.put("John Doe", "555-1212"); phoneBook.put("J. Random Hacker", "555-1337");
The get
method is used to access a key; for example, the value of the expression phoneBook.get("Sally Smart")
is "555-9999"
.
This code uses a hash map to store the associative array, by calling the constructor of the HashMap
class; however, since the code only uses methods common to the interface Map
, one could also use a self-balancing binary tree by calling the constructor of the TreeMap
class, without changing the definition of the phone_book
variable or the rest of the code, or use a number of other underlying data structures that implement the Map
interface.
The hash function in Java is provided by the method Object.hashCode()
. Since every class in Java inherits from Object
, every object has a hash function. A class can override the default implementation of hashCode()
to provide a custom hash function based on the properties of the object.
The Object
class also contains the method equals(Object)
that tests the object for equality with another object. Maps in Java rely on objects maintaining the following contract between their hashCode()
and equals()
methods:
- For two objects a and b
- a.equals(b) == b.equals(a)
- if a.equals(b) then a.hashCode() == b.hashCode()
In order to maintain this contract, a class that overrides equals()
must also override hashCode()
so that hashCode()
is based on the same properties (or a subset of the properties) as equals()
.
A further contract that the map has with the object is that the result of the hashCode()
method will not change once the object has been inserted into the map. For this reason, it is generally a good practice to base the hash function on immutable properties of the object.
JavaScript
JavaScript (and its standardized version: ECMAScript) is a prototype-based object-oriented language. In JavaScript an object is a mapping from property names to values -- that is, an associative array. The only difference is that objects have a prototype link to the object they inherit from. Doing a lookup for a property will forward the lookup to the prototype if the object does not define the property itself.
An object literal is written as { property1 : value1, property2 : value2, ... }
. For example:
var myObject = { "Sally Smart" : "555-9999", "John Doe" : "555-1212", "J. Random Hacker" : "553-1337" };
If the property name is a valid identifier, the quotes can be omitted, e.g.:
var myOtherObject = { foo : 42, bar : false }
Lookup is written using property access notation, either square brackets, which always works, or dot notation, which only works for identifier keys:
myObject myOtherObject.foo
Any object, including built-in objects such as Array, can be dynamically extended with new properties. For example:
Array.prototype.removeAllObjects = function () { /* ... */ }
Lisp
Lisp was originally conceived as a "LISt Processing" language, and one of its most important data types is the linked list, which can be treated as an association list ("alist").
'(("Sally Smart" . "555-9999") ("John Doe" . "555-1212") ("J. Random Hacker" . "553-1337"))
The syntax (x . y)
is used to indicate a pair. Keys and values need not be the same type within an alist. Lisp and Scheme provide operators such as assoc
to manipulate alists in ways similar to associative arrays.
Because of their linear nature, alists are used for relatively small sets of data. Common Lisp also supports a hash table data type, and for Scheme they are implemented in SRFI 69. Hash tables have greater overhead than alists, but provide much faster access when there are many elements.
It is easy to construct composite abstract data types in Lisp, using the object-oriented programming features in conjunction with lists, arrays, and hash tables.
MUMPS
In MUMPS every array is an associative array. The built-in, language-level, direct support for associative arrays applies to private, process-specific arrays stored in memory called "locals" as well as to the permanent, shared arrays stored on disk which are available concurrently by multiple jobs. The name for globals is preceded by the circumflex "^" to distinquish it from local variable names.
SET ^phonebook("Sally Smart")="555-9999" ;; storing permanent data SET phonebook("John Doe")="555-1212" ;; storing temporary data SET phonebook("J. Random Hacker")="553-1337" ;;storing temporary data MERGE ^phonebook=phonebook ;;copying temporary data into permanent data
To access the value of an element, simply requires using the name with the subscript:
WRITE "Phone Number :",^phonebook("Sally Smart"),!
You can also loop through an associated array as follows:
SET NAME="" FOR S NAME=$ORDER(^phonebook(NAME)) QUIT:NAME="" . WRITE NAME," Phone Number :",^phonebook(NAME),!
OCaml
The OCaml programming language provides three different associative containers. The simplest is a list of pairs:
# let m = ["Sally Smart", "555-9999"; "John Doe", "555-1212"; "J. Random Hacker", "553-1337"];; val m : (string * string) list = [("Sally Smart", "555-9999"); ("John Doe", "555-1212"); ("J. Random Hacker", "553-1337")] # List.assoc "John Doe" m;; - : string = "555-1212"
The second is a polymorphic hash table:
# let m = Hashtbl.create 3;; val m : ('_a, '_b) Hashtbl.t = <abstr> # Hashtbl.add m "Sally Smart" "555-9999"; Hashtbl.add m "John Doe" "555-1212"; Hashtbl.add m "J. Random Hacker" "553-1337";; - : unit = () # Hashtbl.find m "John Doe";; - : string = "555-1212"
Finally, functional maps (represented as immutable balanced binary trees):
# include (Map.Make(String));; ... # let m = add "Sally Smart" "555-9999" empty let m = add "John Doe" "555-1212" m let m = add "J. Random Hacker" "553-1337" m;; val m : string t = <abstr> # find "John Doe" m;; - : string = "555-1212"
Lists of pairs and functional maps both provide a purely functional interface. In contrast, hash tables provide an imperative interface. For many operations, hash tables are significantly faster than lists of pairs and functional maps.
Perl
Perl has built-in, language-level support for associative arrays. Modern Perl vernacular refers to associative arrays as hashes; the term associative array is found in older documentation, but is considered somewhat archaic. Perl hashes are flat: keys are strings and values are scalars. However, values may be references to arrays or other hashes.
A hash variable is marked by a %
sigil, to distinguish it from scalar, array and other data types. A hash can be initialized from a key-value list:
%phone_book = ("Sally Smart", "555-9999", "John Doe", "555-1212", "J. Random Hacker", "553-1337");
Perl offers the =>
syntax, semantically equivalent to the comma, to make the key-value association more visible:
%phone_book = ("Sally Smart" => "555-9999", "John Doe" => "555-1212", "J. Random Hacker" => "553-1337");
Accessing a hash element uses the syntax $hash_name{$key}
- the key is surrounded by curly braces and the hash name is prefixed by a $
, indicating that the hash element itself is a scalar value, even though it is part of a hash. The value of $phone_book{"John Doe"}
is "555-1212"
. The %
sigil is only used when referring to the hash as a whole, such as when asking for keys %phone_book
.
PHP
PHP also has built-in support for Associative Arrays. In fact, this is the major kind of array used in PHP programming. An associative array can be formed in one of two ways:
$phonebook = array () ; $phonebook = '555-9999' ; $phonebook = '555-1212' ; $phonebook = '555-1337' ;
$phonebook = array ( 'Sally Smart' => '555-9999', 'John Doe' => '555-1212', 'J. Random Hacker' => '555-1337' ) ;
Python
In Python, associative arrays are called dictionaries. Dictionary literals are marked with curly braces:
phonebook = {'Sally Smart': '555-9999', 'John Doe': '555-1212', 'J. Random Hacker': '553-1337'}
To access an entry in python simply use the array indexing operator. For example, the expression phonebook
would return '555-9999'
.
Ruby
In Ruby a Hash is used:
phonebook = { 'Sally Smart' => '555-9999', 'John Doe' => '555-1212', 'J. Random Hacker' => '553-1337' }
phonebook
produces '555-1212'
Smalltalk
In Smalltalk a dictionary is used:
phonebook := Dictionary new. phonebook at: 'Sally Smart' put: '555-9999'. phonebook at: 'John Doe' put: '555-1212'. phonebook at: 'J. Random Hacker' put: '553-1337'.
To access an entry the message #at:
is sent to the dictionary object.
phonebook at: 'Sally Smart'
gives
'555-9999'
TCL
In Tcl every array is an associative array.
set "phonebook(Sally Smart)" 555-9999 set "phonebook(John Doe)" 555-1212 set "phonebook(J. Random Hacker)" 553-1337
The first argument of the set
command has to be enclosed by double quotes, as it contains a space. In Tcl, space is used to separate arguments. Alternatively, array setting can be grouped into a single command (keys braced because they contain whitespace):
array set phonebook { {Sally Smart} 555-9999 {John Doe} 555-1212 {J. Random Hacker} 553-1337 }
To access an entry and put it on standard output
puts "$phonebook(Sally Smart)"
The result is here
555-9999
External links
- NIST's Dictionary of Algorithms and Data Structures: Associative Array
- NIST's Dictionary of Algorithms and Data Structures: Association List
- SGI STL page on Associative Containers
- SGI STL page on std::map
- SGI STL page on std::hash_map