
What's the difference between a word and a byte?

I've done some research. A byte is 8 bits and a word is the smallest unit that can be addressed in memory. The exact length of a word varies. What I don't understand is what's the point of having a byte? Why not say 8 bits?

I asked a prof this question and he said most machines these days are byte-addressable, but what would that make a word?

It is best to avoid the term "word" because of its ambiguity. Or make it precise by saying 16-bit word, 32-bit word, ...
Is it advantageous to have a word be larger or smaller?
@quest4knoledge a larger word allows for larger pointers (i.e. more addressable RAM) and allows bigger numbers to be processed quickly. It may also allow some operations like memset to be faster, by working in larger blocks. However, processors with a larger word require more transistors and may consume a bit more energy.
@VoidStar and a larger word would mean smaller address space, or am I confused?
To answer the question "what is the point of having a byte" – it's history. CPUs did not start out being able to handle anything bigger than a "byte" (earlier processors handled only nybbles, 4 bits, but the term never really caught on). The first CPU of any note was the Intel 8086/8088. It was designed to deal with instructions built around "bytes", which is also why we still refer to memory in terms of xBytes, e.g. GigaBytes, because the basic unit of addressable memory was the byte. 'K' is a reference to KiloBytes, of which the first PCs had 16, expandable to 64 – woo hoo!

DarkDust

Byte: Today, a byte is almost always 8 bits. However, that wasn't always the case and there's no "standard" that dictates this. Since 8 bits is a convenient number to work with, it became the de facto standard.

Word: The natural size with which a processor handles data (the register size). The most common word sizes encountered today are 8, 16, 32 and 64 bits, but other sizes are possible. For example, there were a few 36-bit machines, or even 12-bit machines.

The byte is the smallest addressable unit for a CPU. If you want to set/clear single bits, you first need to fetch the corresponding byte from memory, mess with the bits and then write the byte back to memory.
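As a sketch of that read-modify-write cycle in C (the function name and layout here are illustrative, not from any particular API):

#include <stddef.h>
#include <stdio.h>

/* Set bit 'bit' of the byte at byte_index: the CPU cannot address the
   bit directly, so we fetch the whole byte, modify it, and store it back. */
static void set_bit(unsigned char *mem, size_t byte_index, unsigned bit)
{
    unsigned char b = mem[byte_index];   /* fetch the byte      */
    b |= (unsigned char)(1u << bit);     /* flip the one bit    */
    mem[byte_index] = b;                 /* write the byte back */
}

int main(void)
{
    unsigned char buf[4] = {0};
    set_bit(buf, 2, 5);
    printf("%#x\n", (unsigned)buf[2]);   /* prints 0x20 */
    return 0;
}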

By contrast, one definition for word is the biggest chunk of bits with which a processor can do processing (like addition and subtraction) at a time – typically the width of an integer register. That definition is a bit fuzzy, as some processors might have different register sizes for different tasks (integer vs. floating point processing for example) or are able to access fractions of a register. The word size is the maximum register size that the majority of operations work with.

There are also a few processors which have a different pointer size: for example, the 8086 is a 16-bit processor, which means its registers are 16 bits wide. But its pointers (addresses) are 20 bits wide and were calculated by combining two 16-bit registers (segment × 16 + offset).
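A minimal sketch of that calculation (the example values are mine):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* 8086 real mode: two 16-bit registers form a 20-bit physical
       address as segment * 16 + offset. */
    uint16_t segment = 0xB800;
    uint16_t offset  = 0x0010;
    uint32_t physical = ((uint32_t)segment << 4) + offset;
    printf("%05X\n", (unsigned)physical);   /* prints B8010 */
    return 0;
}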

In some manuals and APIs, the term "word" may be "stuck" at a former legacy size and may differ from the actual, current word size of a processor whose platform evolved to support larger register sizes. For example, the Intel and AMD x86 manuals still use "word" to mean 16 bits, with DWORD (double word, 32 bits) and QWORD (quad word, 64 bits) as larger sizes. This is then reflected in some APIs, like Microsoft's WinAPI.


Excellent answer. I'd only quibble with "[t]he word by contrast is biggest chunk of bits with which a processor can do processing ... at a time". It is in fact the most-common chunk of bits etc. Lots of architectures that have evolved over time have a word size that isn't their widest, but they are often limited in what they can do with their widest values.
For extra credit, a "nibble" is a common term for half a byte. It arose during the early microcomputer CPU era (e.g., the Intel 8080), and was always understood to be 4 bits, because by then the byte had settled down to 8 bits.
Today an 8-bit byte is the standard; see IEC 80000-13:2008.
x86 (as usual) makes things complicated: in Intel terminology, a word is 16 bits, even on modern x86 CPUs where the default operand size is 32 bits (dword) and the integer register width is 64 bits (qword). And xmm registers are 128 bits wide (movdqa = move double quadword). The memory bus is at least 64 bits wide (and transfers in bursts of 64 bytes = a cache line), and execution-unit-to-cache paths are at least 128 bits wide, or 256 or even 512 bits wide. Whatever the native machine-word size of modern x86 is, it's not 16 bits, but modern x86 still uses 8086 terminology.
@Crystina: Yes, more specifically: it's usually the size of the general purpose registers (registers for floating point may have a different size, for example).
Stephen C

What I don't understand is what's the point of having a byte? Why not say 8 bits?

Apart from the technical point that a byte isn't necessarily 8 bits, the reasons for having a term are simple human nature:

economy of effort (aka laziness) - it is easier to say "byte" rather than "eight bits"

tribalism - groups of people like to use jargon / a private language to set them apart from others.

Just go with the flow. You are not going to change 50+ years of accumulated IT terminology and cultural baggage by complaining about it.

FWIW - the correct term to use when you mean "8 bits independent of the hardware architecture" is "octet".


I thought "octet" was just the French translation of "byte", thank you ;)
Vaibhav Patle

BYTE

I am trying to answer this question from C++ perspective.

The C++ standard defines ‘byte’ as “Addressable unit of data large enough to hold any member of the basic character set of the execution environment.”

What this means is that the byte consists of at least enough adjacent bits to accommodate the basic character set for the implementation. That is, the number of possible values must equal or exceed the number of distinct characters. In the United States, the basic character sets are usually the ASCII and EBCDIC sets, each of which can be accommodated by 8 bits. Hence it is guaranteed that a byte will have at least 8 bits.

In other words, a byte is the amount of memory required to store a single character.

If you want to verify ‘number of bits’ in your C++ implementation, check the file ‘limits.h’. It should have an entry like below.

#define CHAR_BIT      8         /* number of bits in a char */
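A quick way to verify this on your own implementation (a minimal, self-contained sketch):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_BIT is the number of bits in a char (a byte, in C's sense). */
    printf("bits per byte here: %d\n", CHAR_BIT);
    return 0;
}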

WORD

A word is defined as a specific number of bits which can be processed together (i.e. in one attempt) by the machine/system. Alternatively, we can say that a word defines the amount of data that can be transferred between the CPU and RAM in a single operation.

The hardware registers in a machine are word-sized. The word size also defines the largest possible memory address (each memory address points to a byte-sized unit of memory).

Note – In C++ programs, memory addresses point to a byte of memory, not to a word.
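As a worked example of the address-space point above: with 32-bit addresses there are 2^32 distinct byte addresses, i.e. 4 GiB of addressable memory. A minimal sketch of the arithmetic:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* With an N-bit address, 2^N bytes are addressable. */
    uint64_t addresses = 1ULL << 32;
    printf("32-bit addresses: %llu bytes (%llu GiB)\n",
           (unsigned long long)addresses,
           (unsigned long long)(addresses >> 30));   /* prints 4 GiB */
    return 0;
}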


cnicutar

Why not say 8 bits?

Because not all machines have 8-bit bytes. Since you tagged this C, look up CHAR_BIT in limits.h.


VoidStar

A word is the size of the registers in the processor. This means processor instructions like add, mul, etc. operate on word-sized inputs.

But most modern architectures have memory that is addressable in 8-bit chunks, so it is convenient to use the word "byte".


So in a sense the term "byte" is just used for convenience?
Yes, "byte" was especially convenient when the term was invented. Like many conventions, once they set in they persist. I'm not sure if byte-based terminology really makes computers any easier to understand in the big picture anymore, but it's the dominant convention and isn't like to change any time soon.
Byte is the term used for a unit that was used as a character in text. Historically there were bytes with sizes from 6 to 9 bits.
@starblue how is it possible that a char takes up less room than a word?
@quest4knoledge: because memory is stored in smaller chunks than words. A word is 32 bits (or 64 bits on newer machines). In an algorithm that processes individual chars one by one, they DO take up a whole word, but only while inside the CPU; when placed back in RAM, they are packed more tightly.
johnfound

It seems all the answers assume high-level languages, mainly C/C++.

But the question is tagged "assembly", and in all assemblers I know (for 8-bit, 16-bit, 32-bit and 64-bit CPUs), the definitions are much clearer:

byte  = 8 bits
word  = 2 bytes
dword = 4 bytes = 2 words    ("double word")
qword = 8 bytes = 2 dwords = 4 words    ("quad word")

Nope, these sizes are only valid on a 16-bit machine. You're probably used to Windows programming which still uses these macros as it's a legacy from its 16-bit days and MS hasn't bothered to correct this.
BTW, because the size of a word (and really even a byte) can vary, ISO-C has the int<X>_t and uint<X>_t types (plus more) which should be used if you want a variable/parameter of a specific bit size.
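A minimal sketch of those fixed-width types (the printed sizes are in bytes):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Exact-width types pin the sizes down regardless of the
       machine's native word size. */
    uint8_t  b = 0xFFu;                  /*  8 bits ("byte")              */
    uint16_t w = 0xFFFFu;                /* 16 bits ("word" in x86 terms) */
    uint32_t d = 0xFFFFFFFFu;            /* 32 bits ("dword")             */
    uint64_t q = 0xFFFFFFFFFFFFFFFFu;    /* 64 bits ("qword")             */
    printf("%zu %zu %zu %zu\n", sizeof b, sizeof w, sizeof d, sizeof q);
    return 0;                            /* prints: 1 2 4 8 */
}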
@DarkDust we are talking about assembly language here. C standards are not relevant. BTW, I have been programming assembly since 1980 and the same names were in use. (Well, maybe except qword.)
However, I did find an exception: in GNU as, the .word may be 32 bits (for example on SPARC).
Sorry, AS is not an assembler. It is an ugly, crippled, miserable mutant, created with the sole goal of being a back end for HLL compilers.
Community

In this context, a word is the unit that a machine uses when working with memory. For example, on a 32-bit machine the word is 32 bits long, and on a 64-bit machine it is 64 bits long. The word size determines the address space.

In programming (C/C++), the word is typically represented by the intptr_t type, which has the same length as a pointer, this way abstracting these details.

Some APIs might confuse you though, such as the Win32 API, because it has types such as WORD (16 bits) and DWORD (32 bits). The reason is that the API initially targeted 16-bit machines, was then ported to 32-bit machines, and then to 64-bit machines. To store a pointer, you can use INT_PTR.
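In standard C the corresponding type is spelled intptr_t/uintptr_t (from <stdint.h>); a minimal round-trip sketch:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int x = 42;
    /* A data pointer converts to uintptr_t and back without loss. */
    uintptr_t bits = (uintptr_t)&x;
    int *p = (int *)bits;
    printf("%d, pointers are %zu bytes wide here\n", *p, sizeof p);
    return 0;
}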


BeeOnRope

The exact length of a word varies. What I don't understand is what's the point of having a byte? Why not say 8 bits?

Even though the length of a word varies, on all modern machines and even all older architectures that I'm familiar with, the word size is still a multiple of the byte size. So there is no particular downside to using "byte" over "8 bits" in relation to the variable word size.

Beyond that, here are some reasons to use byte (or octet1) over "8 bits":

Larger units are just convenient for avoiding very large or very small numbers: you might as well ask "why say 3 nanoseconds when you could say 0.000000003 seconds" or "why say 1 kilogram when you could say 1,000 grams", etc. Beyond the convenience, the unit of a byte is somehow as fundamental as 1 bit, since many operations typically work not at the bit level, but at the byte level: addressing memory, allocating dynamic storage, reading from a file or socket, etc.

Even if you were to adopt "8 bits" as a type of unit, so you could say "two 8-bits" instead of "two bytes", it would often be very confusing to have your new unit start with a number. For example, if someone said "one-hundred 8-bits", it could easily be interpreted as "one hundred eight bits" (108 bits), rather than one hundred 8-bit units.

1 Although I'll consider a byte to be 8 bits for this answer, this isn't universally true: on older machines a byte may have a different size (such as 6 bits). Octet always means 8 bits, regardless of the machine (so this term is often used in defining network protocols). In modern usage, byte is overwhelmingly used as a synonym for 8 bits.


Brendan

Whatever the terminology present in datasheets and compilers, a 'Byte' is eight bits. Let's not try to confuse enquirers and generalities with the more obscure exceptions, particularly as the word 'Byte' comes from the expression "By Eight". I've worked in the semiconductor/electronics industry for over thirty years and not once known 'Byte' used to express anything more than eight bits.


Unusual, yes (we know that). An example is the Texas Instruments C54x; Google "texas instruments c54x byte". ti.com/lit/ug/spru393/spru393.pdf: "The 'C55x instructions are variable byte lengths ranging in size from 8 bits to 48 bits." stackoverflow.com/questions/2098149/…
It doesn't come from there at all. The term was actually coined by W. Buchholz at IBM in the late 1950s. Source: bobbemer.com/BYTE.HTM. According to Bob Bemer, the spelling "byte" was chosen in preference to "bite" to avoid confusion (with "bit") due to typos. He would know. He was there!
(Only 30 years? You are a mere whipper-snapper. I learned to program on systems where the natural "byte" size was not 8 bits :-) )
LiLi

Reference: https://www.os-book.com/OS9/slide-dir/PPT-dir/ch1.ppt

The basic unit of computer storage is the bit. A bit can contain one of two values, 0 and 1. All other storage in a computer is based on collections of bits. Given enough bits, it is amazing how many things a computer can represent: numbers, letters, images, movies, sounds, documents, and programs, to name a few.

A byte is 8 bits, and on most computers it is the smallest convenient chunk of storage. For example, most computers don't have an instruction to move a bit but do have one to move a byte. A less common term is word, which is a given computer architecture's native unit of data. A word is made up of one or more bytes. For example, a computer that has 64-bit registers and 64-bit memory addressing typically has 64-bit (8-byte) words. A computer executes many operations in its native word size rather than a byte at a time.

Computer storage, along with most computer throughput, is generally measured and manipulated in bytes and collections of bytes. A kilobyte, or KB, is 1,024 bytes; a megabyte, or MB, is 1,024² bytes; a gigabyte, or GB, is 1,024³ bytes; a terabyte, or TB, is 1,024⁴ bytes; a petabyte, or PB, is 1,024⁵ bytes. Computer manufacturers often round off these numbers and say that a megabyte is 1 million bytes and a gigabyte is 1 billion bytes. Networking measurements are an exception to this general rule; they are given in bits (because networks move data a bit at a time).
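Since each unit above is 1,024 = 2^10 times the previous one, the sizes can be computed with shifts; a minimal sketch:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Each unit is 2^10 = 1,024 times the previous one. */
    uint64_t kb = 1ULL << 10;   /* 1,024 bytes   */
    uint64_t mb = 1ULL << 20;   /* 1,024^2 bytes */
    uint64_t gb = 1ULL << 30;   /* 1,024^3 bytes */
    uint64_t tb = 1ULL << 40;   /* 1,024^4 bytes */
    printf("KB=%llu MB=%llu GB=%llu TB=%llu\n",
           (unsigned long long)kb, (unsigned long long)mb,
           (unsigned long long)gb, (unsigned long long)tb);
    return 0;
}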


tolitius

A group of 8 bits is called a byte (except on certain architectures where it is not :)).

A word is a fixed-sized group of bits that is handled as a unit by the instruction set and/or hardware of the processor. That means the size of a general-purpose register (which is generally more than a byte) is a word.

In C, a word is most often represented by the integer type => int.


A group of 8 bits is called an octet.
correct: The term octet was defined to explicitly denote a sequence of 8 bits because of the ambiguity associated with the term byte. But I like the sound of byte better :)
@tolitius: +1 for "But I like the sound of byte better": I strongly suspect you're not alone in this, and save for a few niche systems, the "confusion" of a byte possibly being a size other than 8 bits is no longer relevant these days.
K-ballo

If a machine is byte-addressable and a word is the smallest unit that can be addressed in memory, then I guess a word would be a byte!


Yep. The minimum addressable unit of memory on the TMS320C54xx (one of Texas Instruments' DSPs) is 16 bits, which is also the smallest size of its general-purpose registers. And the TI C compiler defines char = short = int = 16 bits on it.
No, most RISC machines have 32-bit words, but can address single bytes. On MIPS for example, word definitely means 32 bits, but there's an lb (load byte) instruction which loads 8 bits.
Chris Calley

The terms BYTE and WORD are relative to the size of the processor being referred to. The most common processors are/were 8-bit, 16-bit, 32-bit or 64-bit. These are the WORD lengths of the processor. Actually half of a WORD is a BYTE, whatever the numerical length is. Ready for this? Half of a BYTE is a NIBBLE.


No, in CPUs with 32-bit words and 8-bit bytes (e.g. MIPS or ARM), half a word is 2 bytes.
Brian Knoblauch

In fact, in common usage, word has become synonymous with 16 bits, much like byte has with 8 bits. It can get a little confusing since the "word size" on a 32-bit CPU is 32 bits, but when talking about a word of data, one would mean 16 bits. Microcontrollers with a 32-bit word size have taken to calling their instructions "longs" (supposedly to try and avoid the word/doubleword confusion).


That's entirely dependent on the CPU type. As you point out, on 32-bit non-IA32 machines, a "word" is typically 32 bits.
@RossPatterson That's entirely dependent on whether you're developing software or eating dinner.
ARM / MIPS / other mainstream RISC architectures have 32-bit words. It's the register width (on the 32-bit version of those ISAs) and the instruction width. 16 bits is a half-word, thus ARM instructions like ldrh to load 16 bits and zero-extend it into a 32-bit register. Or ldrsh to load and sign-extend 16 bits.