Misplaced Pages

User:Thunderbird2/The case against deprecation of IEC prefixes


This is an old revision of this page, as edited by Dondervogel 2 (talk | contribs) at 12:45, 30 November 2024 (Background information). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The case against deprecation

Background information

Multiple-byte units

Decimal
Value     Metric
1000      kB   kilobyte
1000²     MB   megabyte
1000³     GB   gigabyte
1000⁴     TB   terabyte
1000⁵     PB   petabyte
1000⁶     EB   exabyte
1000⁷     ZB   zettabyte
1000⁸     YB   yottabyte
1000⁹     RB   ronnabyte
1000¹⁰    QB   quettabyte

Binary
Value     IEC              Memory
1024      KiB  kibibyte    KB  kilobyte
1024²     MiB  mebibyte    MB  megabyte
1024³     GiB  gibibyte    GB  gigabyte
1024⁴     TiB  tebibyte    TB  terabyte
1024⁵     PiB  pebibyte
1024⁶     EiB  exbibyte
1024⁷     ZiB  zebibyte
1024⁸     YiB  yobibyte
Orders of magnitude of data
  • In most contexts the SI prefixes kilo-, mega- and giga- mean 1 thousand, 1 million and 1 (short scale) billion, respectively, as in one kilogram = one thousand grams, one megajoule = one million joules and one gigawatt = one billion watts. In symbols: 1 kg = 1,000 g; 1 MJ = 1,000,000 J; 1 GW = 1,000,000,000 W.
  • In computer science the units kilobyte, megabyte and gigabyte (symbols kB, MB and GB) were originally used in this standard decimal sense to mean 1,000, 1,000,000 and 1,000,000,000 bytes, respectively. In symbols: 1 kB = 1000 B; 1 MB = 1000² B; 1 GB = 1000³ B.
  • However, in modern use (and depending on the context), the same three symbols sometimes have a binary meaning. The binary definitions of these three symbols are 1 KB = 1024 B; 1 MB = 1024² B; 1 GB = 1024³ B. In this context it is customary to use an upper case "K" instead of the SI prefix "k" for kilo.
  • Computers themselves do not count bytes using binary prefixes; reporting memory, file and hard disk drive sizes in this manner is a convention adopted in the 1980s. Because it is only a convention, it could have been altered at any time to agree with the SI prefixes, as was done in Apple's 2009 "Snow Leopard" release and in Ubuntu; however, the binary convention has persisted across much of the computer industry.
  • For many applications (primarily the storage capacity of hard disk drives and data rates for telecommunications), the decimal convention is retained, whereby one kilobit is exactly one thousand bits and one megabyte is exactly one million bytes.
  • There are many WP articles in which the same symbol (e.g. MB) is used with two different meanings, often hopping between them in the same paragraph or section, sometimes even in the same sentence. This dual use creates confusion and a corresponding need to disambiguate.
  • These ambiguous usages are common beyond Wikipedia and have led to litigation.
  • The discrepancy grows successively larger with the higher-value prefixes tera- (1000⁴ vs 1024⁴), peta- (1000⁵ vs 1024⁵), etc. The highest-value SI prefix for which a binary counterpart has been defined is yotta-, meaning 1000⁸. The corresponding binary prefix yobi- means 1024⁸ (≈1.21×10²⁴), which differs by 21 % from the conventional decimal interpretation of yotta-.
  • In December 1998, in an attempt to resolve the ambiguity, the International Electrotechnical Commission (IEC) introduced a new set of prefixes kibi-, mebi- and gibi- for the binary meanings, with symbols Ki-, Mi- and Gi-, so that 1 KiB (one kibibyte) = 1024 B, 1 MiB (one mebibyte) = 1024² B and 1 GiB (one gibibyte) = 1024³ B. In the IEC standard, the prefixes kilo-, mega- etc. are reserved for their original decimal meanings.
  • In March 2005, the IEC prefixes were adopted by the Institute of Electrical and Electronics Engineers (IEEE) after a two-year trial period.
  • The use of IEC prefixes has been approved by national and international standards bodies, including, in addition to IEC and IEEE, the International Bureau of Weights and Measures (the standards body responsible for the SI system of units), the European Committee for Electrotechnical Standardization (CENELEC) and the US National Institute of Standards and Technology.
  • The binary prefixes defined by the IEC are now incorporated in the International System of Quantities (ISQ).
  • The alternative (binary use of SI-like prefixes) is deprecated by the same standards bodies.
  • Use of IEC prefixes in popular literature is rare, making them unfamiliar to many readers. Their use in scientific publications increased from fewer than 15 per year on first introduction to about 200 per year in the early 2010s, and about 600 per year in the mid-2020s: 1999-2001 (ca. 40 hits); 2002-2004 (60 hits); 2005-2007 (190 hits); 2008-2010 (380 hits); 2011-2013 (710 hits); 2014-2016 (1050 hits); 2017-2019 (1330 hits); 2020-2022 (1510 hits); 2023-2025 (1280 hits to date).
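
The growing gap between the decimal and binary interpretations described above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the names PREFIXES and discrepancy_percent are hypothetical and do not come from any standard or library.

```python
# Each entry pairs an SI prefix with its IEC counterpart and the power n,
# so that the decimal value is 1000**n and the binary value is 1024**n.
PREFIXES = [
    ("kilo", "kibi", 1),
    ("mega", "mebi", 2),
    ("giga", "gibi", 3),
    ("tera", "tebi", 4),
    ("peta", "pebi", 5),
    ("exa", "exbi", 6),
    ("zetta", "zebi", 7),
    ("yotta", "yobi", 8),
]


def discrepancy_percent(n: int) -> float:
    """Return the percentage by which 1024**n exceeds 1000**n."""
    return (1024**n / 1000**n - 1) * 100


for si, iec, n in PREFIXES:
    print(f"{si}- vs {iec}-: {discrepancy_percent(n):.1f} %")
```

The difference is about 2.4 % for kilo- versus kibi- and reaches roughly 21 % for yotta- versus yobi-, in line with the figure quoted above.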

Why Wikipedia should not deprecate the use of IEC prefixes

  1. IEC prefixes are unambiguous, succinct, simple to use and simple to understand.
  2. The use of IEC prefixes is endorsed by national and international standards bodies.
  3. The use of one symbol (e.g. GB) to mean two different things in the same article creates confusion and ambiguity. Despite this ambiguity, there are many WP articles in which kilobyte, megabyte and/or gigabyte are used in this way. In this situation, the IEC prefixes provide an ideal disambiguation tool because they are unambiguous and succinct.
  4. Deprecation (of IEC prefixes) increases the difficulty threshold for disambiguation, reducing the rate at which articles can be disambiguated by expert editors.
  5. In turn, this reduces the total number of articles that can be further improved by less expert editors with footnotes etc. (assuming that there is consensus to do so).
  6. Deprecation is interpreted by some editors as a justification for changing unambiguous units into ambiguous ones.
  7. Removing IEC prefixes from articles, even when disambiguated with footnotes, destroys a part of the information that was there before, because it requires an expert to work out which footnote corresponds to which use in the article.
  8. In the long term, the use of IEC prefixes would ultimately avoid the need to use the same symbol (e.g., MB) with two different meanings. This may sound like a pipe dream, but it could be implemented as a user preference, so that readers could choose between familiar (ambiguous) units and unfamiliar (unambiguous) ones.
  9. The main argument for not using IEC prefixes is the unfamiliarity of, for example, the mebibyte (MiB) compared with the megabyte (MB). The unfamiliarity is not disputed, but is not relevant to disambiguation. The point is that disambiguation is rare and therefore all disambiguation methods are unfamiliar.
  10. Alternative disambiguation methods are either cumbersome (i.e., exact numbers of bytes), difficult and time-consuming to implement in a manner that is clear to the reader (i.e., footnotes) or unlikely to be understood (i.e., exponentiation).
In conclusion, disambiguation is not easy, so it would be unwise to discard the simplest disambiguation tool at our disposal just because it is unfamiliar to some readers. The best disambiguation method has yet to be established, so it is premature to deprecate this one.
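
To illustrate how the same byte count reads under each convention, here is a minimal Python sketch. The helper names format_iec and format_si are hypothetical, chosen only for this illustration, and the two-decimal rounding is an arbitrary choice.

```python
IEC_SYMBOLS = ("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB")
SI_SYMBOLS = ("B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")


def _format(n_bytes, base, symbols):
    """Scale n_bytes by repeated division by base and attach the matching symbol."""
    value = float(n_bytes)
    index = 0
    while value >= base and index < len(symbols) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {symbols[index]}"


def format_iec(n_bytes):
    """Format a byte count with binary (IEC) prefixes, base 1024."""
    return _format(n_bytes, 1024, IEC_SYMBOLS)


def format_si(n_bytes):
    """Format a byte count with decimal (SI) prefixes, base 1000."""
    return _format(n_bytes, 1000, SI_SYMBOLS)


print(format_iec(1024**3))  # 1.00 GiB
print(format_si(1024**3))   # 1.07 GB
print(format_iec(10**9))    # 953.67 MiB
print(format_si(10**9))     # 1.00 GB
```

Written as "1 GB", the two quantities 1024³ B and 1000³ B are indistinguishable; with the IEC symbol the first is labelled 1.00 GiB and no footnote or exact byte count is needed.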

Footnotes

  1. MB even has a third meaning, equal to 1000 KiB or 1,024,000 B
  2. Snow Leopard changes how file and drive sizes are calculated
  3. According to the LBA Count for IDE Hard Disk Drives Standard from the website of the International Disk Drive Equipment and Materials Association (IDEMA), there are 1,000,194,048 bytes (1,953,504 logical blocks x 512 bytes/logical block) per nominal gigabyte of hard drive storage.
  4. This problem is illustrated by Address space layout randomization, which includes the confusing disambiguation footnote "Transistorized memory, such as RAM and cache sizes (other than solid state disk devices such as USB drives, CompactFlash cards, and so on) as well as CD-based storage size are specified using binary meanings for K (1024), M (1024²), G (1024³), ..."