Misplaced Pages

X86-64: Difference between revisions

Article snapshot taken from[REDACTED] with creative commons attribution-sharealike license.
Revision as of 17:09, 8 July 2008 by Fnagaton (edit summary: WP:MOSNUM Says "The IEC prefixes are not to be used on Misplaced Pages...")
Revision as of 21:02, 9 July 2008 by Headbomb (minor edit; edit summary: fix exponents)
(4 intermediate revisions by the same user not shown)
Line 30: Line 30:
*'''Additional XMM (SSE) registers:''' Similarly, the number of 128-bit XMM<!-- don't confuse this name with MMX; MMX has no hardware registers and is mapped to the FPU stack. Here we talk about SSE (XMM) registers --> registers (used for ] instructions) is also increased from 8 to 16. *'''Additional XMM (SSE) registers:''' Similarly, the number of 128-bit XMM<!-- don't confuse this name with MMX; MMX has no hardware registers and is mapped to the FPU stack. Here we talk about SSE (XMM) registers --> registers (used for ] instructions) is also increased from 8 to 16.


*'''Larger virtual address space:''' Current processor models implementing the AMD64 architecture can address up to 256&nbsp;]s of virtual address space (2<sup>48</sup> bytes). This limit can be raised in future implementations to 16&nbsp;]s (2<sup>64</sup> bytes). This is compared to just 4&nbsp;]s for 32-bit x86. This means that very large files can be operated on by mapping the entire file into the process' address space (which is generally faster than working with file read/write calls), rather than having to map regions of the file into and out of the address space. *'''Larger virtual address space:''' Current processor models implementing the AMD64 architecture can address up to 256&nbsp;]<ref>Physical memory is specified using ] for K (1024<sup>1</sup> instead of 1000<sup>1</sup>), M (1024<sup>2</sup> instead of 1000<sup>2</sup>), G (1024<sup>3</sup> instead of 1000<sup>3</sup>), ... </ref> of virtual address space. This limit can be raised in future implementations to 16&nbsp;]. This is compared to just 4&nbsp;] for 32-bit x86. This means that very large files can be operated on by mapping the entire file into the process' address space (which is generally faster than working with file read/write calls), rather than having to map regions of the file into and out of the address space.


*'''Larger physical address space:''' Current implementations of the AMD64 architecture can address up to 1&nbsp;] of RAM (2<sup>40</sup>&nbsp;bytes); the architecture permits extending this to 4&nbsp;]s (2<sup>52</sup>&nbsp;bytes) in the future (limited by the page table entry format). In ], ] (PAE) is supported, as it is on most current 32-bit x86 processors, allowing access to a maximum of 64&nbsp;gigabytes. *'''Larger physical address space:''' Current implementations of the AMD64 architecture can address up to 1&nbsp;TB of RAM; the architecture permits extending this to 4&nbsp;] in the future (limited by the page table entry format). In ], ] (PAE) is supported, as it is on most current 32-bit x86 processors, allowing access to a maximum of 64&nbsp;GB.


*'''Instruction pointer relative data access:''' Instructions can now reference data relative to the instruction pointer (RIP register). This makes ], as is often used in shared libraries and code loaded at run time, more efficient. *'''Instruction pointer relative data access:''' Instructions can now reference data relative to the instruction pointer (RIP register). This makes ], as is often used in shared libraries and code loaded at run time, more efficient.
Line 38: Line 38:
*'''SSE instructions:''' The original AMD64 architecture adopted Intel's ] and ] as core instructions. ] instructions were added in April 2005. SSE2 replaces the ] instruction set's ], with the choice of either IEEE 32-bit or 64-bit floating-point mathematics. This provides floating-point operations compatible with many other modern CPUs. The SSE and SSE2 instructions have also been extended to support the eight new XMM registers. SSE and SSE2 are available in 32-bit mode in modern x86 processors; however, if they're used in 32-bit programs, those programs will only work on systems with processors that support them. This is not an issue in 64-bit programs, as all processors that support AMD64 support SSE and SSE2, so using SSE and SSE2 instructions instead of x87 instructions does not reduce the set of machines on which the programs will run. Since SSE and SSE2 are generally faster than, and duplicate most of the features of, the traditional x87 instructions, ], and ], the latter are redundant under AMD64. <!-- are there enough examples to say that they are considered deprecated features in "some" operating systems? Most? As for "why this absurd insistence x87 FPU is still allowed under AMD64?", it's because the AMD doc says it is. This section is about the chip, not the way operating systems use it. --> *'''SSE instructions:''' The original AMD64 architecture adopted Intel's ] and ] as core instructions. ] instructions were added in April 2005. SSE2 replaces the ] instruction set's ], with the choice of either IEEE 32-bit or 64-bit floating-point mathematics. This provides floating-point operations compatible with many other modern CPUs. The SSE and SSE2 instructions have also been extended to support the eight new XMM registers. SSE and SSE2 are available in 32-bit mode in modern x86 processors; however, if they're used in 32-bit programs, those programs will only work on systems with processors that support them. 
This is not an issue in 64-bit programs, as all processors that support AMD64 support SSE and SSE2, so using SSE and SSE2 instructions instead of x87 instructions does not reduce the set of machines on which the programs will run. Since SSE and SSE2 are generally faster than, and duplicate most of the features of, the traditional x87 instructions, ], and ], the latter are redundant under AMD64. <!-- are there enough examples to say that they are considered deprecated features in "some" operating systems? Most? As for "why this absurd insistence x87 FPU is still allowed under AMD64?", it's because the AMD doc says it is. This section is about the chip, not the way operating systems use it. -->


*''']:''' The “NX” bit (bit 63 of the page table entry) allows the operating system to specify which pages of virtual address space can contain executable code and which cannot. An attempt to execute code from a page tagged "no execute" will result in a memory access violation, similar to an attempt to write to a read-only page. This should make it more difficult for malicious code to take control of the system via "]" or "unchecked buffer" attacks. A similar feature has been available on x86 processors since the ] as an attribute of segment descriptors; however, this works only on an entire segment at a time. Segmented addressing has long been considered an obsolete mode of operation, and all current PC operating systems in effect bypass it, setting all segments to a base address of 0 and a size of 4&nbsp;]. AMD was the first x86-family vendor to support no-execute in linear addressing mode. The feature is also available in legacy mode on AMD64 processors, and recent Intel x86 processors, when PAE is used. *''']:''' The “NX” bit (bit 63 of the page table entry) allows the operating system to specify which pages of virtual address space can contain executable code and which cannot. An attempt to execute code from a page tagged "no execute" will result in a memory access violation, similar to an attempt to write to a read-only page. This should make it more difficult for malicious code to take control of the system via "]" or "unchecked buffer" attacks. A similar feature has been available on x86 processors since the ] as an attribute of segment descriptors; however, this works only on an entire segment at a time. Segmented addressing has long been considered an obsolete mode of operation, and all current PC operating systems in effect bypass it, setting all segments to a base address of 0 and a size of 4&nbsp;GB. AMD was the first x86-family vendor to support no-execute in linear addressing mode. 
The feature is also available in legacy mode on AMD64 processors, and recent Intel x86 processors, when PAE is used.


*'''Removal of older features:''' A number of "system programming" features of the x86 architecture are not used in modern operating systems and are not available on AMD64 in long (64-bit and compatibility) mode. These include segmented addressing (although the FS and GS segments were retained in vestigial form for compatibility with Windows code)<ref>{{cite web|url=http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf|title=AMD64 Architecture Programmer’s Manual Volume 2: System Programming|accessdate=2007-08-30|format=pdf|pages=p. 70}}</ref>, the task state switch mechanism, and Virtual-8086 mode. These features do of course remain fully implemented in "legacy mode," thus permitting these processors to run 32-bit and 16-bit operating systems without modification. *'''Removal of older features:''' A number of "system programming" features of the x86 architecture are not used in modern operating systems and are not available on AMD64 in long (64-bit and compatibility) mode. These include segmented addressing (although the FS and GS segments were retained in vestigial form for compatibility with Windows code)<ref>{{cite web|url=http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf|title=AMD64 Architecture Programmer’s Manual Volume 2: System Programming|accessdate=2007-08-30|format=pdf|pages=p. 70}}</ref>, the task state switch mechanism, and Virtual-8086 mode. These features do of course remain fully implemented in "legacy mode," thus permitting these processors to run 32-bit and 16-bit operating systems without modification.


===Virtual address space details=== ===Virtual address space details===
Although virtual addresses are 64&nbsp;bits wide in 64-bit mode, current implementations (and any chips known to be in the planning stages) do not allow the entire virtual address space of 2<sup>64</sup>&nbsp;bytes (16&nbsp;exabytes, or about 18&times;10<sup>18</sup>&nbsp;bytes) to be used. Most operating systems and applications will not need such a large address space for the foreseeable future (for example, Windows implementations for AMD64 are only populating 16&nbsp;terabytes, or 44&nbsp;bits' worth), so supporting such wide virtual addresses would simply increase the complexity and cost of address translation with no real benefit. AMD therefore decided that, in the first implementations of the architecture, only the least significant 48&nbsp;bits of a virtual address would actually be used in address translation (page table lookup). However, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to ]), or the processor will raise an exception. Addresses complying with this rule are referred to as "canonical form." Canonical form addresses run from 0 through 00007FFF`FFFFFFFF, and from FFFF8000`00000000 through FFFFFFFF`FFFFFFFF, for a total of 2<sup>48</sup>&nbsp;bytes or 256&nbsp;terabytes of usable virtual address space. Although virtual addresses are 64&nbsp;bits wide in 64-bit mode, current implementations (and any chips known to be in the planning stages) do not allow the entire virtual address space of 16&nbsp;EB to be used. Most operating systems and applications will not need such a large address space for the foreseeable future (for example, Windows implementations for AMD64 are only populating 16&nbsp;TB, or 44&nbsp;]' worth), so supporting such wide virtual addresses would simply increase the complexity and cost of address translation with no real benefit. 
AMD therefore decided that, in the first implementations of the architecture, only the least significant 48&nbsp;bits of a virtual address would actually be used in address translation (page table lookup). However, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to ]), or the processor will raise an exception. Addresses complying with this rule are referred to as "canonical form." Canonical form addresses run from 0 through 00007FFF`FFFFFFFF, and from FFFF8000`00000000 through FFFFFFFF`FFFFFFFF, for a total of 256&nbsp;TB of usable virtual address space.


This "quirk" allows an important feature for later scalability to true 64-bit addressing: This "quirk" allows an important feature for later scalability to true 64-bit addressing:
Line 60: Line 60:
|} |}


The 64-bit addressing mode ("]") is a superset of ]s (PAE); because of this, ] sizes may be either 4&nbsp;KB (2<sup>12</sup>&nbsp;bytes), 2&nbsp;MB (2<sup>21</sup>&nbsp;bytes), or 1&nbsp;GB (2<sup>30</sup>&nbsp;bytes). However, rather than the three-level ] system used by systems in PAE mode, systems running in ] use four levels of page table: PAE's ''Page-Directory Pointer Table'' is extended from 4 entries to 512, and an additional ''Page-Map Level 4 Table'' is added, containing 512 entries in 48-bit implementations. In implementations supporting larger virtual addresses, this latter table would either grow to accommodate sufficient entries to describe the entire address range, up to a theoretical maximum of 33,554,432 entries for a 64-bit implementation, or be over ranked by a new mapping level, such as a PML5. Either way, a full mapping hierarchy of 4&nbsp;KB pages for the whole 48-bit space would take a bit more than 512&nbsp;GB of RAM (about 0.196% of the 256&nbsp;TB virtual space). The 64-bit addressing mode ("]") is a superset of ]s (PAE); because of this, ] sizes may be either 4&nbsp;], 2&nbsp;], or 1&nbsp;GB. However, rather than the three-level ] system used by systems in PAE mode, systems running in ] use four levels of page table: PAE's ''Page-Directory Pointer Table'' is extended from 4 entries to 512, and an additional ''Page-Map Level 4 Table'' is added, containing 512 entries in 48-bit implementations. In implementations supporting larger virtual addresses, this latter table would either grow to accommodate sufficient entries to describe the entire address range, up to a theoretical maximum of 33,554,432 entries for a 64-bit implementation, or be over ranked by a new mapping level, such as a PML5. A full mapping hierarchy of 4&nbsp;KB pages for the whole 48-bit space would take a bit more than 512&nbsp;GB of RAM (about 0.196% of the 256&nbsp;TB virtual space).


===Operating modes=== ===Operating modes===
Line 208: Line 208:
*Early Intel CPUs with Intel 64 also lack the ] (No Execute bit) of the AMD64 architecture. The NX bit marks memory pages as non-executable, allowing protection against many types of malicious code. *Early Intel CPUs with Intel 64 also lack the ] (No Execute bit) of the AMD64 architecture. The NX bit marks memory pages as non-executable, allowing protection against many types of malicious code.


*Original AMD64 implementations allowed access only to 2<sup>40</sup>&nbsp;bytes (1 TB) of physical memory, however, recent AMD64 implementations now provide 2<sup>48</sup>&nbsp;bytes (256 TB) of physical address space (with planned expansion to 2<sup>52</sup>&nbsp;bytes (4 PB)). *Original AMD64 implementations allowed access only to 1&nbsp;TB of physical memory, however, recent AMD64 implementations now provide 256&nbsp;TB of physical address space (with planned expansion to 4&nbsp;PB).


*Original Intel 64 implementations allowed access only to 2<sup>36</sup>&nbsp;bytes (64 GB) of physical memory, however, recent Intel 64 implementations now provide 2<sup>40</sup>&nbsp;bytes (1 TB) of physical address space. *Original Intel 64 implementations allowed access only to 64&nbsp;GB of physical memory, however, recent Intel 64 implementations now provide 1&nbsp;TB of physical address space.


==Operating system support== ==Operating system support==
Line 217: Line 217:
===Windows=== ===Windows===
x64 editions of Microsoft Windows client and server, ] and Windows Server 2003 SP1 x64 Edition, were released in March 2005. Internally they are actually the same build (5.2.3790.3959 SP2), as they share the same source base and operating system binaries, so even system updates are released in unified packages, much in the manner as Windows 2000 Professional and Server editions for x86. ], which also has many different versions, was released in January 2007. Windows for x64 has the following characteristics: x64 editions of Microsoft Windows client and server, ] and Windows Server 2003 SP1 x64 Edition, were released in March 2005. Internally they are actually the same build (5.2.3790.3959 SP2), as they share the same source base and operating system binaries, so even system updates are released in unified packages, much in the manner as Windows 2000 Professional and Server editions for x86. ], which also has many different versions, was released in January 2007. Windows for x64 has the following characteristics:
*8&nbsp;]s (2<sup>43</sup>&nbsp;bytes) of "user mode" virtual memory address space per process. A 64-bit program can use all of this, subject of course to backing store limits on the system. This is a 4096-fold increase over the default 2&nbsp;] user-mode virtual address space offered by 32-bit Windows. *8&nbsp;TB of "user mode" virtual memory address space per process. A 64-bit program can use all of this, subject of course to backing store limits on the system. This is a 4096-fold increase over the default 2&nbsp;GB user-mode virtual address space offered by 32-bit Windows.
*8&nbsp;]s (2<sup>43</sup>&nbsp;bytes) of kernel mode virtual address space for the operating system. Again, this is a 4096-fold increase over 32-bit Windows versions. The increased space is primarily of benefit to the file system cache and kernel mode "heaps" (non-paged pool and paged pool). *8&nbsp;TB of kernel mode virtual address space for the operating system. Again, this is a 4096-fold increase over 32-bit Windows versions. The increased space is primarily of benefit to the file system cache and kernel mode "heaps" (non-paged pool and paged pool).
*Interestingly the total address space is limited to 2<sup>44</sup> bytes due to early AMD64 lacking a CMPXCHG16B instruction.<ref>{{ cite web *Interestingly the total address space is limited to 16&nbsp;TB due to early AMD64 lacking a CMPXCHG16B instruction.<ref>{{ cite web
| title = Behind Windows x64’s 44-bit Virtual Memory Addressing Limit | title = Behind Windows x64’s 44-bit Virtual Memory Addressing Limit
| url = http://www.alex-ionescu.com/?p=50 | url = http://www.alex-ionescu.com/?p=50
Line 226: Line 226:
*] data model: "int" and "long" types are still 32&nbsp;bits wide, while pointers and types derived from pointers are 64&nbsp;bits wide. *] data model: "int" and "long" types are still 32&nbsp;bits wide, while pointers and types derived from pointers are 64&nbsp;bits wide.
*Device drivers must be 64-bit versions; there is no support for running 32-bit kernel-mode executables within the 64-bit operating system. *Device drivers must be 64-bit versions; there is no support for running 32-bit kernel-mode executables within the 64-bit operating system.
*Support for running existing 32-bit applications (.exe's) and dynamic link libraries (.dll's). A 32-bit program, if linked with the "large address aware" option, can use up to 4&nbsp;gigabytes of virtual address space, as compared to the default 2&nbsp;gigabytes (optional 3&nbsp;gigabytes with /3GB boot.ini option and "large address aware" link option) offered by 32-bit Windows. *Support for running existing 32-bit applications (.exe's) and dynamic link libraries (.dll's). A 32-bit program, if linked with the "large address aware" option, can use up to 4&nbsp;GB of virtual address space, as compared to the default 2&nbsp;GB (optional 3&nbsp;GB with /3GB boot.ini option and "large address aware" link option) offered by 32-bit Windows.
*16-bit DOS and Windows (Win16) applications will not run on x64 versions of Windows due to removal of ]. *16-bit DOS and Windows (Win16) applications will not run on x64 versions of Windows due to removal of ].
*Full implementation of the NX (No Execute) page protection feature. This is also implemented on recent 32-bit versions of Windows when they are started in PAE mode. *Full implementation of the NX (No Execute) page protection feature. This is also implemented on recent 32-bit versions of Windows when they are started in PAE mode.
Line 241: Line 241:
] was the first operating system kernel to run the x86-64 architecture in ], starting with the 2.4 version prior to the physical hardware's availability.{{Fact|date=October 2007}} Linux also provides backward compatibility for running 32-bit executables. This permits programs to be recompiled into long mode while retaining the use of 32-bit programs. Several Linux distributions currently ship with x86-64-native kernels and ]. Some, such as ], ] and ] package both 32-bit and 64-bit systems on a single DVD-ROM image to allow automatic selection of the best software during installation. Other distributions, such as ], are available in a version compiled for 32-bit and one compiled for x86-64 architecture. ] was the first operating system kernel to run the x86-64 architecture in ], starting with the 2.4 version prior to the physical hardware's availability.{{Fact|date=October 2007}} Linux also provides backward compatibility for running 32-bit executables. This permits programs to be recompiled into long mode while retaining the use of 32-bit programs. Several Linux distributions currently ship with x86-64-native kernels and ]. Some, such as ], ] and ] package both 32-bit and 64-bit systems on a single DVD-ROM image to allow automatic selection of the best software during installation. Other distributions, such as ], are available in a version compiled for 32-bit and one compiled for x86-64 architecture.


64-bit Linux allows up to 2<sup>47</sup> bytes (128 TB) of address space for individual processes, and can address approximately 2<sup>46</sup> (64 TB) of physical memory, subject to processor and system limitations. 64-bit Linux allows up to 128&nbsp;TB of address space for individual processes, and can address approximately 2<sup>46</sup> (64 TB) of physical memory, subject to processor and system limitations.


===Mac OS X=== ===Mac OS X===

Revision as of 21:02, 9 July 2008

x86-64 is a 64-bit superset of the x86 instruction set architecture. Because every instruction in the x86 instruction set is also part of x86-64, central processing units (CPUs) that implement x86-64 can natively run programs written for x86 processors from Intel, Advanced Micro Devices (AMD), and other vendors.

x86-64 was designed by AMD, who have since renamed it AMD64. It has been cloned by Intel under the name Intel 64 (formerly known as EM64T among other names). This leads to the common use of the names x86-64 or x64 as more vendor-neutral terms to collectively refer to the two nearly identical implementations.

x86-64 should not be confused with the Intel Itanium architecture, also known as IA-64, which is not compatible on the native instruction set level with the x86 or x86-64 architecture.

AMD64

The AMD64 instruction set is currently implemented in AMD's Athlon 64, Athlon 64 FX, Athlon 64 X2, Phenom, Athlon X2, Turion 64, Turion 64 X2, Opteron and later Sempron processors.

History of AMD64

AMD64 was created as an alternative to Intel and Hewlett Packard's radically different IA-64 architecture. Originally announced as "x86-64" in August 2000, the architecture was positioned by AMD from the beginning as an evolutionary way to add 64-bit computing capabilities to the existing x86 architecture, as opposed to Intel's approach of creating an entirely new 64-bit architecture with IA-64.

The first AMD64-based processor, the Opteron, was released in April 2003.

Architectural features

The primary defining characteristic of AMD64 is its support for 64-bit general purpose registers, 64-bit integer arithmetic and logical operations, and 64-bit virtual addresses. The designers took the opportunity to make other improvements as well. The most significant changes include:

  • Full support for 64-bit integers: All general-purpose registers (GPRs) are expanded from 32 bits to 64 bits, and all arithmetic and logical operations, memory-to-register and register-to-memory operations, etc., are now directly supported for 64-bit integers. Pushes and pops on the stack are always in eight-byte strides, and pointers are eight bytes wide.
  • Additional registers: In addition to increasing the size of the general-purpose registers, the number of named general-purpose registers is increased from eight (eax, ebx, ecx, edx, ebp, esp, esi, edi) in x86-32 to 16. It is therefore possible to keep more local variables in registers rather than on the stack, and to let registers hold frequently accessed constants; arguments for small and fast subroutines may also be passed in registers to a greater extent. However, AMD64 still has fewer registers than many common RISC processors (which typically have 32–64 registers) or VLIW-like machines such as the IA-64 (which has 128 registers).
  • Additional XMM (SSE) registers: Similarly, the number of 128-bit XMM registers (used for Streaming SIMD instructions) is also increased from 8 to 16.
  • Larger virtual address space: Current processor models implementing the AMD64 architecture can address up to 256 TB of virtual address space. This limit can be raised in future implementations to 16 EB. This is compared to just 4 GB for 32-bit x86. This means that very large files can be operated on by mapping the entire file into the process' address space (which is generally faster than working with file read/write calls), rather than having to map regions of the file into and out of the address space.
  • Larger physical address space: Current implementations of the AMD64 architecture can address up to 1 TB of RAM; the architecture permits extending this to 4 PB in the future (limited by the page table entry format). In legacy mode, Physical Address Extension (PAE) is supported, as it is on most current 32-bit x86 processors, allowing access to a maximum of 64 GB.
  • Instruction pointer relative data access: Instructions can now reference data relative to the instruction pointer (RIP register). This makes position independent code, as is often used in shared libraries and code loaded at run time, more efficient.
  • SSE instructions: The original AMD64 architecture adopted Intel's SSE and SSE2 as core instructions. SSE3 instructions were added in April 2005. SSE2 provides an alternative to the x87 instruction set's IEEE 80-bit precision, with a choice of either IEEE 32-bit or 64-bit floating-point arithmetic. This provides floating-point operations compatible with many other modern CPUs. The SSE and SSE2 instructions have also been extended to support the eight new XMM registers. SSE and SSE2 are available in 32-bit mode in modern x86 processors; however, 32-bit programs that use them will only work on systems whose processors support them. This is not an issue in 64-bit programs, as all processors that support AMD64 support SSE and SSE2, so using SSE and SSE2 instructions instead of x87 instructions does not reduce the set of machines on which the programs will run. Since SSE and SSE2 are generally faster than, and duplicate most of the features of, the traditional x87 instructions, MMX, and 3DNow!, the latter are redundant under AMD64.
  • No-Execute bit: The “NX” bit (bit 63 of the page table entry) allows the operating system to specify which pages of virtual address space can contain executable code and which cannot. An attempt to execute code from a page tagged "no execute" will result in a memory access violation, similar to an attempt to write to a read-only page. This should make it more difficult for malicious code to take control of the system via "buffer overrun" or "unchecked buffer" attacks. A similar feature has been available on x86 processors since the 80286 as an attribute of segment descriptors; however, this works only on an entire segment at a time. Segmented addressing has long been considered an obsolete mode of operation, and all current PC operating systems in effect bypass it, setting all segments to a base address of 0 and a size of 4 GB. AMD was the first x86-family vendor to support no-execute in linear addressing mode. The feature is also available in legacy mode on AMD64 processors, and recent Intel x86 processors, when PAE is used.
  • Removal of older features: A number of "system programming" features of the x86 architecture are not used in modern operating systems and are not available on AMD64 in long (64-bit and compatibility) mode. These include segmented addressing (although the FS and GS segments were retained in vestigial form for compatibility with Windows code), the task state switch mechanism, and Virtual-8086 mode. These features do of course remain fully implemented in "legacy mode," thus permitting these processors to run 32-bit and 16-bit operating systems without modification.
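The address-space figures quoted in the list above follow directly from the number of address bits. The following is a minimal sketch of that arithmetic, together with a test of the NX bit (bit 63 of a page table entry). The helper names format_size, address_space and is_no_execute are illustrative only and do not come from any AMD documentation; units are powers of 1024, as in the text.

```python
# Binary-prefix units (powers of 1024), matching the usage in the article.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

# The NX bit is bit 63 of a 64-bit page table entry (per the list above).
NX_BIT = 1 << 63


def format_size(num_bytes: int) -> str:
    """Render a byte count using the largest unit that divides it evenly."""
    unit = 0
    while (num_bytes >= 1024 and num_bytes % 1024 == 0
           and unit < len(UNITS) - 1):
        num_bytes //= 1024
        unit += 1
    return f"{num_bytes} {UNITS[unit]}"


def address_space(bits: int) -> str:
    """Size of an address space with the given number of address bits."""
    return format_size(1 << bits)


def is_no_execute(pte: int) -> bool:
    """True if the page table entry value has the NX bit set."""
    return bool(pte & NX_BIT)


if __name__ == "__main__":
    for label, bits in [
        ("32-bit x86 virtual", 32),
        ("PAE physical", 36),
        ("current AMD64 physical", 40),
        ("current AMD64 virtual", 48),
        ("architectural physical limit", 52),
        ("architectural virtual limit", 64),
    ]:
        print(f"{label:30s} 2^{bits} bytes = {address_space(bits)}")
```

Each figure in the list corresponds to one of these powers of two: 2^36 for PAE, 2^40 and 2^52 for physical memory, 2^48 and 2^64 for virtual memory.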

Virtual address space details

Although virtual addresses are 64 bits wide in 64-bit mode, current implementations (and any chips known to be in the planning stages) do not allow the entire virtual address space of 16 EB to be used. Most operating systems and applications will not need such a large address space for the foreseeable future (for example, Windows implementations for AMD64 are only populating 16 TB, or 44 bits' worth), so supporting such wide virtual addresses would simply increase the complexity and cost of address translation with no real benefit. AMD therefore decided that, in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup). However, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception. Addresses complying with this rule are referred to as "canonical form." Canonical form addresses run from 0 through 00007FFF`FFFFFFFF, and from FFFF8000`00000000 through FFFFFFFF`FFFFFFFF, for a total of 256 TB of usable virtual address space.

This "quirk" allows an important feature for later scalability to true 64-bit addressing: many operating systems (including, but not limited to, the Windows NT family) take the higher-addressed half of the address space (named kernel space) for themselves and leave the lower-addressed half (user space) for application code, user mode stacks, heaps, and other data regions. The "canonical address" design ensures that every AMD64-compliant implementation has, in effect, two memory halves: the lower half starts at 00000000`00000000 and "grows upwards" as more virtual address bits become available, while the higher half is "docked" to the top of the address space and grows downwards. Also, fixing the contents of the unused address bits prevents their use by the operating system as flags, privilege markers, etc., which could become problematic when the architecture is indeed extended to 52, 56, 60 and 64 bits.
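The canonical-form rule can be expressed in a few lines. The sketch below is an illustration of the rule as described here, not of how any processor implements the check. The function names is_canonical and canonicalize and the va_bits parameter are assumptions made for this example; va_bits defaults to the 48 implemented bits of current parts.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits va_bits..63 of addr are all copies of bit va_bits-1."""
    if not 0 <= addr < (1 << 64):
        raise ValueError("address must fit in 64 bits")
    # Keep the top implemented bit and every bit above it.
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


def canonicalize(addr: int, va_bits: int = 48) -> int:
    """Sign-extend the low va_bits bits of addr into a 64-bit address."""
    low = addr & ((1 << va_bits) - 1)
    if low >> (va_bits - 1):  # top implemented bit is set: upper half
        low |= ((1 << 64) - 1) ^ ((1 << va_bits) - 1)
    return low


if __name__ == "__main__":
    for addr in (0x00007FFFFFFFFFFF, 0x0000800000000000,
                 0xFFFF7FFFFFFFFFFF, 0xFFFF800000000000):
        print(f"{addr:016X} canonical: {is_canonical(addr)}")
    # With 56 implemented bits the lower half "grows upwards":
    print(is_canonical(0x0000800000000000, va_bits=56))
```

Under these assumptions, an address just past the lower half, such as 00008000`00000000, is rejected with 48 implemented bits but would become a valid lower-half address in a 56-bit implementation, which is the scalability property described above.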

[Figure: canonical address space layout for the current 48-bit implementation, a 56-bit implementation, and a full 64-bit implementation (not drawn to scale).]

The 64-bit addressing mode ("long mode") is a superset of Physical Address Extension (PAE); because of this, page sizes may be 4 KB, 2 MB, or 1 GB. However, rather than the three-level page table system used by systems in PAE mode, systems running in long mode use four levels of page table: PAE's Page-Directory Pointer Table is extended from 4 entries to 512, and an additional Page-Map Level 4 Table is added, containing 512 entries in 48-bit implementations. In implementations supporting larger virtual addresses, this latter table would either grow to hold enough entries to describe the entire address range, up to a theoretical maximum of 33,554,432 entries for a 64-bit implementation, or be placed under a new, higher mapping level, such as a PML5. A full mapping hierarchy of 4 KB pages for the whole 48-bit space would take a bit more than 512 GB of RAM (about 0.196% of the 256 TB virtual space).
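The figures in this paragraph can be checked with a short calculation. The sketch below assumes 8-byte table entries and 512 entries per table, so each level consumes 9 address bits above the 12-bit page offset. The function names and the short labels pt and pd for the two lowest levels are illustrative, not taken from the text.

```python
ENTRY_BYTES = 8          # assumed size of one long-mode table entry
ENTRIES_PER_TABLE = 512  # 4 KB table divided by 8-byte entries
INDEX_BITS = 9           # log2(512)
PAGE_OFFSET_BITS = 12    # 4 KB pages


def split_virtual_address(addr: int) -> dict:
    """Split a 48-bit virtual address into four table indices and an offset."""
    fields = {"offset": addr & ((1 << PAGE_OFFSET_BITS) - 1)}
    index = addr >> PAGE_OFFSET_BITS
    for level in ("pt", "pd", "pdpt", "pml4"):  # lowest level first
        fields[level] = index & (ENTRIES_PER_TABLE - 1)
        index >>= INDEX_BITS
    return fields


def pml4_entries(va_bits: int) -> int:
    """Entries the top-level table needs if it alone absorbs the extra bits."""
    return 1 << (va_bits - PAGE_OFFSET_BITS - 3 * INDEX_BITS)


def full_hierarchy_bytes(va_bits: int = 48) -> int:
    """Bytes of tables needed to map every 4 KB page of the address space."""
    total = 0
    entries = 1 << (va_bits - PAGE_OFFSET_BITS)  # one leaf entry per page
    for _ in range(4):                           # PT, PD, PDPT, PML4
        total += entries * ENTRY_BYTES
        entries //= ENTRIES_PER_TABLE
    return total


if __name__ == "__main__":
    size = full_hierarchy_bytes(48)
    print(f"full 48-bit hierarchy: {size / 2**30:.3f} GB")
    print(f"share of 256 TB space: {100 * size / 2**48:.3f}%")
    print(f"PML4 entries at 48 bits: {pml4_entries(48)}")
    print(f"PML4 entries at 64 bits: {pml4_entries(64)}")
```

The lowest level dominates the total: 2^36 entries of 8 bytes each come to exactly 512 GB, and the three upper levels add roughly 1 GB, which is why the total is "a bit more than" 512 GB.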

Operating modes

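Operating mode | Sub-mode | Operating system required | Compiled-application rebuild required | Default address size (bits) | Default operand size (bits) | Register extensions | Typical GPR width (bits)
Long mode | 64-bit mode | OS with 64-bit support | Yes | 64 | 32 | Yes | 64
Long mode | Compatibility mode | OS with 64-bit support | No | 32 | 32 | No | 32
Long mode | Compatibility mode | OS with 64-bit support | No | 16 | 16 | No | 16
Legacy mode | Protected mode | Legacy 16-bit or 32-bit OS | No | 32 | 32 | No | 32
Legacy mode | Protected mode | Legacy 16-bit or 32-bit OS | No | 16 | 16 | No | 16
Legacy mode | Virtual 8086 mode | Legacy 16-bit or 32-bit OS | No | 16 | 16 | No | 16
Legacy mode | Real mode | Legacy 16-bit OS | No | 16 | 16 | No | 16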

Operating mode explanation

The architecture has two primary modes of operation:

Long mode
The architecture's intended primary mode of operation; it is a combination of the processor's native 64-bit mode and a combined 32-bit and 16-bit compatibility mode. It is used by 64-bit operating systems. Under a 64-bit operating system, 64-bit, 32-bit and 16-bit (or 80286) protected mode applications may be supported.
Since the basic instruction set is the same, there is no major performance penalty for executing x86 code. This is unlike Intel's IA-64, where differences in the underlying ISA mean that running 32-bit code must be done either in emulation of x86, or with a dedicated x86 core, making the process extremely slow and essentially useless for backwards compatibility. However, on AMD64, 32-bit x86 applications may still benefit from a 64-bit recompile, due to the additional registers in 64-bit code, which a high-level compiler can use for optimization.
Legacy mode
The mode used by 16-bit (protected mode or real mode) and 32-bit operating systems. In this mode, the processor acts just like an x86 processor, and only 16-bit or 32-bit code can be executed. 64-bit programs will not run.

Implementations

The following processors implement the AMD64 architecture:

Intel 64

Intel 64 is Intel's implementation of x86-64. It is used in newer versions of Pentium 4, Pentium D, Pentium Extreme Edition, Celeron D, Xeon and Pentium Dual-Core processors, and in all versions of the Core 2 processors.

History of Intel 64

Historically, AMD has developed and produced processors patterned after Intel's original designs, but with x86-64, roles were reversed: Intel found itself in the position of adopting the architecture which AMD had created as an extension to Intel's own x86 processor line.

Intel's project was originally codenamed Yamhill (after the Yamhill River in Oregon's Willamette Valley). After several years of denying its existence, Intel announced at the February 2004 IDF that the project was indeed underway. Intel's chairman at the time, Craig Barrett, admitted that this was one of the company's worst-kept secrets.

Intel's name for this technology has changed several times. The name used at the IDF was CT (presumably for Clackamas Technology, another codename from an Oregon river); within weeks they began referring to it as IA-32e (for IA-32 extensions) and in March 2004 unveiled the "official" name EM64T (Extended Memory 64 Technology). In late 2006 Intel began instead using the name Intel 64 for its implementation, paralleling AMD's use of the name AMD64.

Implementations

Intel 64 was originally implemented on the E revision (Prescott) of the Pentium 4 line of microprocessors, which were supported by the i915P (Grantsdale) and i925X (Alderwood) chipsets in June 2004. This was largely due to the competitive pressure of AMD's AMD64 technology, implemented one year earlier, in 2003, on the Opteron and Athlon 64 lines of microprocessors (otherwise known as the K8 core). The technology was built to be largely compatible with AMD64 and with the then-announced Windows XP Professional x64 Edition, which supported AMD64.

Intel's first processor to activate the Intel 64 technology was the multi-socket Xeon processor code-named Nocona. Since the Nocona Xeon is directly based on Intel's desktop processor, the Pentium 4, the Pentium 4 also has Intel 64 technology built in. As with Hyper-Threading, however, this feature was not initially enabled on the then-new Prescott design, likely because enabling Intel 64 did not coincide with Intel's stance on 64-bit x86 extensions at that time. Intel subsequently began selling Intel 64-enabled Pentium 4s using the E0 revision of the Prescott core, sold on the market as the Pentium 4, model F. However, the revision F core was targeted at workstations. Intel's official launch of Intel 64 (under the name EM64T at that time) in mainstream desktop processors was the N0 stepping Prescott-2M.

The E0 revision also adds eXecute Disable (XD) support (Intel's name for the NX bit) to Intel 64, and has been included in the current Xeon code-named Irwindale. All 9xx, 8xx, 6xx, 5x6, 5x1, 3x6, and 3x1 series CPUs have Intel 64 enabled, as do the Core 2 CPUs, and as will all future Intel CPUs. Intel 64 is also present in the last members of the Celeron D line.

The first Intel mobile processor supporting Intel 64 is the Merom version of the Core 2 processor, which was released on 27 July 2006. None of Intel's earlier notebook CPUs (Core Duo, Pentium M, Celeron M, Mobile Pentium 4) supports Intel 64.

The following processors implement the Intel 64 architecture:

Differences between AMD64 and Intel 64

There are a few differences between the two instruction sets. Compilers generally produce binaries that are compatible with both (that is, with the subset of x86-64 that is common to AMD64 and Intel 64), so these differences are mainly of interest to developers of compilers and operating systems.

Recent implementations

  • Intel 64's BSF and BSR instructions act differently when the source is 0 and the operand size is 32 bits. The processor sets the zero flag and leaves the upper 32 bits of the destination undefined.
  • Intel 64 lacks the ability to save and restore a reduced (and thus faster) version of the floating-point state (involving the FXSAVE and FXRSTOR instructions).
  • Intel 64 lacks some model-specific registers that are considered architectural to AMD64. These include SYSCFG, TOP_MEM, and TOP_MEM2.
  • AMD64 requires a different microcode update format and different control MSRs, while Intel 64 supports microcode update as in 32-bit mode.
  • AMD64 originally lacked the MONITOR and MWAIT instructions, used by operating systems to better deal with Intel's Hyper-threading feature and also to enter specific low power states.
  • AMD64 systems allow the use of the AGP aperture as an IOMMU. Operating systems can take advantage of this to let normal PCI devices DMA to memory above 4 GB. Intel 64 systems require the use of bounce buffers, which are slower.
  • Intel 64 only supports SYSCALL and SYSRET in IA-32e mode (not in compatibility mode). SYSENTER and SYSEXIT are supported in both modes.
  • AMD64 lacks support for SYSENTER and SYSEXIT in both sub-modes of long mode.
  • Near branches with the 66H (operand size) prefix behave differently. Intel 64 clears only the top 32 bits, while AMD64 clears the top 48 bits.
  • AMD64 added support for 1 GB pages in the page table system.
  • Intel CPUs based on the Conroe microarchitecture have two major performance bottlenecks: they lack support for Intel's "Macrofusion" technology, which allows for faster reading from the instruction (L1) cache, and they address the additional registers less efficiently than AMD processors. This leads to a significant performance degradation compared to AMD64 when some Core 2 Duo models are in long mode.

Older implementations

  • Early AMD64 processors lacked the CMPXCHG16B instruction, which is an extension of the CMPXCHG8B instruction present on most post-486 processors. Similar to CMPXCHG8B, CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword) data types. This is useful for parallel algorithms that use compare and swap on data larger than the size of a pointer, common in lock-free and wait-free algorithms. Without CMPXCHG16B one must use workarounds, such as a critical section or alternative lock-free approaches.
  • Early Intel CPUs with Intel 64 lacked the LAHF and SAHF instructions supported by AMD64, until the introduction of the Pentium 4 G1 stepping in December 2005. LAHF and SAHF are load and store instructions, respectively, for certain status flags. These instructions are used for virtualization and floating-point condition handling.
  • Early Intel CPUs with Intel 64 also lacked the NX bit (No Execute bit) of the AMD64 architecture. The NX bit marks memory pages as non-executable, allowing protection against many types of malicious code.
  • Original AMD64 implementations allowed access to only 1 TB of physical memory; recent AMD64 implementations provide 256 TB of physical address space (with planned expansion to 4 PB).
  • Original Intel 64 implementations allowed access to only 64 GB of physical memory; recent Intel 64 implementations provide 1 TB of physical address space.
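The CMPXCHG16B item above mentions falling back to a critical section when a double-width compare-and-swap is unavailable. The following sketch illustrates that workaround only; it is not the hardware instruction, and Python offers no equivalent of it. The class `DoubleWidthCell` and the function `replace_pointer` are hypothetical names, and the (pointer, counter) pair stands in for the kind of data "larger than the size of a pointer" that lock-free algorithms often update as one unit.

```python
import threading


class DoubleWidthCell:
    """A (pointer, counter) pair that is compared and replaced as one unit.

    A lock plays the role of the critical section that software must use
    when no double-width compare-and-swap instruction is available.
    """

    def __init__(self, pointer: int = 0, counter: int = 0):
        self._lock = threading.Lock()
        self._value = (pointer, counter)

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new) -> bool:
        """Store new only if the current pair equals expected."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def replace_pointer(cell: DoubleWidthCell, new_pointer: int):
    """Retry loop: swap in a new pointer and bump the counter together."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, (new_pointer, old[1] + 1)):
            return old


cell = DoubleWidthCell(0x1000, 0)
replace_pointer(cell, 0x2000)
print(cell.load())
```

Pairing the pointer with a counter that changes on every update is a common way to let a stale comparison fail even if the pointer value itself has been reused.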

Operating system support

The following operating systems and releases support the x86-64 architecture in long mode:

Windows

x64 editions of Microsoft Windows client and server, Windows XP Professional x64 Edition and Windows Server 2003 SP1 x64 Edition, were released in March 2005. Internally they are the same build (5.2.3790.3959 SP2), as they share the same source base and operating system binaries, so even system updates are released in unified packages, much in the same manner as the Windows 2000 Professional and Server editions for x86. Windows Vista, which also has many different versions, was released in January 2007. Windows for x64 has the following characteristics:

  • 8 TB of "user mode" virtual memory address space per process. A 64-bit program can use all of this, subject of course to backing store limits on the system. This is a 4096-fold increase over the default 2 GB user-mode virtual address space offered by 32-bit Windows.
  • 8 TB of kernel mode virtual address space for the operating system. Again, this is a 4096-fold increase over 32-bit Windows versions. The increased space is primarily of benefit to the file system cache and kernel mode "heaps" (non-paged pool and paged pool).
  • The total address space is limited to 16 TB because early AMD64 processors lacked the CMPXCHG16B instruction.
  • Support for up to 128 GB (Windows XP) or 1 TB (Windows Server 2003) of random access memory (RAM).
  • LLP64 data model: "int" and "long" types are still 32 bits wide, while pointers and types derived from pointers are 64 bits wide.
  • Device drivers must be 64-bit versions; there is no support for running 32-bit kernel-mode executables within the 64-bit operating system.
  • Support for running existing 32-bit applications (.exe's) and dynamic link libraries (.dll's). A 32-bit program, if linked with the "large address aware" option, can use up to 4 GB of virtual address space, as compared to the default 2 GB (optional 3 GB with /3GB boot.ini option and "large address aware" link option) offered by 32-bit Windows.
  • 16-bit DOS and Windows (Win16) applications will not run on x64 versions of Windows due to removal of NTVDM.
  • Full implementation of the NX (No Execute) page protection feature. This is also implemented on recent 32-bit versions of Windows when they are started in PAE mode.
  • Instead of the FS segment descriptor used on x86 versions of the Windows NT family, the GS segment descriptor is used to point to two operating-system-defined structures: the Thread Information Block (NT_TIB) in user mode and the Processor Control Region (KPCR) in kernel mode. Thus, for example, in user mode GS:0 is the address of the first member of the Thread Information Block. Maintaining this convention made the x64 port easier, but required AMD to retain the function of the FS and GS segments in long mode, even though segmented addressing per se is not really used by any modern operating system.
  • Early reports claimed that the operating system scheduler would not save and restore the x87 FPU machine state across thread context switches. Observed behavior shows that this is not the case: the x87 state is saved and restored, except for kernel-mode-only threads. The most recent documentation available from Microsoft states that the x87/MMX/3DNow! instructions may be used in long mode.
  • Some components like Microsoft Jet Database Engine and Data Access Objects will not be ported to 64-bit architectures such as x86-64 and IA-64.
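The address-space figures in the list above follow from simple powers of two, which the sketch below works through. The constants and the helper `address_bits` are hypothetical names for this example, and sizes use the binary meanings of GB and TB, as the article does.

```python
GB = 1 << 30
TB = 1 << 40


def address_bits(size_bytes: int) -> int:
    """Number of address bits needed to span a power-of-two size."""
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("size must be a positive power of two")
    return size_bytes.bit_length() - 1


user_x64 = 8 * TB     # user-mode space per process on Windows x64
kernel_x64 = 8 * TB   # kernel-mode space on Windows x64
user_x86 = 2 * GB     # default user-mode space on 32-bit Windows

print(user_x64 // user_x86)                  # 4096
print(address_bits(user_x64 + kernel_x64))   # 44
```

The 16 TB total therefore corresponds to 44 bits of virtual address, well short of the 48 bits implemented by the processor.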

Linux

See also: List of 64-bit Linux distributions

Linux was the first operating system kernel to run the x86-64 architecture in long mode, starting with the 2.4 version prior to the physical hardware's availability. Linux also provides backward compatibility for running 32-bit executables. This permits programs to be recompiled into long mode while retaining the use of 32-bit programs. Several Linux distributions currently ship with x86-64-native kernels and userlands. Some, such as SUSE, Mandriva and Debian GNU/Linux package both 32-bit and 64-bit systems on a single DVD-ROM image to allow automatic selection of the best software during installation. Other distributions, such as Ubuntu, are available in a version compiled for 32-bit and one compiled for x86-64 architecture.

64-bit Linux allows up to 128 TB of address space for individual processes, and can address approximately 2^46 bytes (64 TB) of physical memory, subject to processor and system limitations.

Mac OS X

Mac OS X v10.5 supports 64-bit GUI applications using Cocoa, Quartz, OpenGL and X11 on 64-bit Intel-based machines, as well as on 64-bit PowerPC machines. All non-GUI libraries and frameworks also support 64-bit applications on those platforms. The kernel is 32-bit.

Mac OS X v10.4.7 and higher versions of Mac OS X v10.4 support 64-bit command-line tools using the POSIX and math libraries when run on 64-bit Intel-based machines, just as all versions of Mac OS X v10.4 and higher support them on 64-bit PowerPC machines. No other libraries or frameworks support 64-bit applications in Mac OS X v10.4.

BSD

FreeBSD

FreeBSD first added x86-64 support under the name "amd64" as an experimental architecture in 5.1-RELEASE in June 2003. It was included as a standard distribution architecture as of 5.2-RELEASE in January 2004. Since then, FreeBSD has designated it as a Tier 1 platform. The 6.0-RELEASE version cleaned up some quirks with running 32-bit x86 executables under amd64, and most drivers work just as they do on 32-bit x86 architectures. Work is currently being done to integrate more fully the 32-bit x86 application binary interface (ABI), in the same manner as the Linux 32-bit ABI compatibility currently works.

NetBSD

Support for the x86-64 architecture was first committed to the NetBSD source tree on 19 June 2001. As of NetBSD 2.0, released on 9 December 2004, NetBSD/amd64 is a fully integrated and supported port.

OpenBSD

OpenBSD has supported AMD64 since OpenBSD 3.5, released on 1 May 2004. Complete in-tree support for the platform was achieved prior to the hardware's initial release due to AMD's loaning of several machines for the project's hackathon that year. OpenBSD developers have taken to the platform because of its use of the NX bit, which allowed for an easy implementation of the W^X feature.

The code for the AMD64 port of OpenBSD also runs on Intel 64 processors, which contain cloned support for the AMD64 extensions. However, since Intel left out support for the page table NX bit in early Intel 64 processors, there is no W^X support on those Intel CPUs; later Intel 64 processors added support for the NX bit under the name "XD bit". Symmetric multiprocessing (SMP) is supported on OpenBSD's AMD64 port, starting with release 3.6 on 1 November 2004.

MenuetOS

The 64-bit version of MenuetOS (M64) was released in June 2005. Although MenuetOS was originally written for 32-bit x86 architectures and released under the GPL, the 64-bit version is proprietary. It is distributed as freeware with the source code for some components.

Solaris

Solaris 10 and later releases support the x86-64 architecture. Just as with the SPARC architecture, there is only one operating system image for all 32-bit and 64-bit x86 systems; this is labeled as the "x86/x64" DVD-ROM image.

Default behavior is to boot a 64-bit kernel, allowing both 64-bit and existing or new 32-bit executables to be run. A 32-bit kernel can also be manually selected, in which case only 32-bit executables are supported. The isainfo command can be used to determine if a system is running a 64-bit kernel.

DOS

It is possible to enter long mode under DOS with a DOS extender similar to DOS/4GW. DOS itself is not aware of this, and no benefit should be expected unless DOS is running under emulation with an adequate virtualization driver backend (for example, for the mass storage interface).

It is also possible to enter long mode without a DOS extender, but the program must then return to real mode in order to call BIOS or DOS interrupts.

Industry naming conventions

Since AMD64 and Intel 64 are substantially similar, many software and hardware products use one vendor-neutral term to indicate their support for both implementations. AMD's original designation for this processor architecture, "x86-64", is still sometimes used for this purpose, as is the variant "x86_64". Other companies, such as Microsoft and Sun Microsystems, use "x64" (as a contraction of "x86-64") in marketing material.

Many operating systems and products, especially those that introduced x86-64 support prior to Intel's entry into the market, use the term "AMD64" or "amd64" to refer to support for both AMD64 and Intel 64.

  • BSD systems such as FreeBSD, NetBSD and OpenBSD support both AMD64 and Intel 64 under the architecture name "amd64".
  • Microsoft Windows: x64 versions of Windows use the AMD64 moniker to designate various components which use 64-bit technology for IA-32 processors. For example, the system folder on a Windows x64 Edition installation CD-ROM is named "AMD64", in contrast to "i386" in 32-bit versions.
  • Solaris: The isalist command in Sun's Solaris operating system identifies both AMD64- and Intel 64–based systems as "amd64".
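Because so many names refer to the same architecture, software that inspects an architecture string often folds the variants together. The sketch below shows one way to do that; the alias set is taken from the names mentioned in this article, and the helper names `is_x86_64_name` and `normalize_arch` are hypothetical.

```python
# Names this article lists for the same 64-bit x86 architecture (lower-cased).
_X86_64_ALIASES = frozenset({
    "x86-64", "x86_64", "x64",      # vendor-neutral spellings
    "amd64",                        # AMD's name, also used by BSD, Windows, Solaris
    "intel 64", "em64t", "ia-32e",  # Intel's current and earlier names
})


def is_x86_64_name(name: str) -> bool:
    """Return True if name is one of the known spellings, ignoring case."""
    return name.strip().lower() in _X86_64_ALIASES


def normalize_arch(name: str) -> str:
    """Map any known alias to 'x86-64'; leave other names unchanged."""
    return "x86-64" if is_x86_64_name(name) else name


for label in ("AMD64", "x86_64", "EM64T", "i386"):
    print(label, "->", normalize_arch(label))
```

A string obtained at run time, for example from Python's `platform.machine()`, could be passed through `normalize_arch` in the same way; the exact value that call returns depends on the operating system.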

See also

Notes and references

  1. Extending the World's Most Popular Processor Architecture
  2. "AMD Releases x86-64™ Architectural Specification; Enables Market Driven Migration to 64-Bit Computing" (Press release). AMD. August 10, 2000. Retrieved 2007-08-03.
  3. Physical memory is specified using binary meanings for K (1024^1 instead of 1000^1), M (1024^2 instead of 1000^2), G (1024^3 instead of 1000^3), ...
  4. "AMD64 Architecture Programmer's Manual Volume 2: System Programming" (pdf). p. 70. Retrieved 2007-08-30.
  5. "Craig Barrett confirms 64 bit address extensions for Xeon. And Prescott", from The Inquirer
  6. "A Roundup of 64-Bit Computing", from internetnews.com
  7. "Intel® 64 Architecture". Intel. Retrieved 2007-06-29.
  8. "Behind Windows x64's 44-bit Virtual Memory Addressing Limit".
  9. "Everything You Need To Know To Start Programming 64-Bit Windows Systems". On x64 versions of Windows, the FS register has been replaced by the GS register.
  10. Microsoft Developer Network - General Porting Guidelines (64-bit Windows Programming)
  11. Microsoft Developer Network - Data Access Road Map
  12. Apple - Mac OS X Leopard - Technology - 64 bit
  13. Apple - Mac OS X Xcode 2.4 Release Notes: Compiler Tools
  14. Tutorial for entering protected and long mode from DOS
  15. Kevin Van Vechten (August 9, 2006). "re: Intel XNU bug report". Darwin-dev mailing list. Apple Computer. Retrieved 2006-10-05. The kernel and developer tools have standardized on "x86_64" for the name of the Mach-O architecture.
