The file command (usually installed by default on Unix-like systems) determines the type of a given file.
Interestingly, the result is almost always correct, which is impressive given the number of different file types (there are currently 2280 registered MIME types).
By comparison, Windows' file type detection is quite naive: it relies mostly on the file extension, so it is easy to fool, for example by simply renaming a file.
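To see why purely extension-based detection is fragile, here is a small Python illustration using the standard mimetypes module, which, like the naive approach described above, looks only at the file name. This is not how Windows itself is implemented; it only mimics the idea, and the function name is hypothetical.

```python
import mimetypes


def guess_by_extension(filename: str):
    """Naive detection: look only at the file name's extension, never the content."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


# The same PNG bytes on disk, before and after being renamed:
print(guess_by_extension("photo.png"))  # image/png
print(guess_by_extension("photo.txt"))  # text/plain
print(guess_by_extension("photo"))      # None
```

The file's content never changes between the three calls; only the name does, and that is enough to change (or lose) the detected type.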
The complete manual of the file command is available by running man file.
The file command uses multiple techniques to determine a file's type.
The idea behind magic bytes is to store the type information (and sometimes the version) in the first few bytes of the file. They are mostly used for binary files, but can also be used for text files (since characters are also bytes).
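A minimal sketch of magic-byte sniffing in Python. The signature table is an illustrative subset chosen for this example (the real magic database used by file is much larger and also checks bytes at other offsets), and the function names are hypothetical. Note that "%PDF-" is plain ASCII, which shows how readable characters can serve as magic bytes.

```python
from typing import Optional

# Illustrative subset of well-known signatures, all matched at offset 0.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x7fELF", "ELF binary"),
    (b"\x00asm", "WebAssembly binary"),
]

# Only the first few bytes are needed: as many as the longest signature.
HEADER_SIZE = max(len(magic) for magic, _ in SIGNATURES)


def detect_by_magic(header: bytes) -> Optional[str]:
    """Return a description if the header starts with a known signature."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return None


def detect_file(path: str) -> Optional[str]:
    """Read just the beginning of the file and look up its magic bytes."""
    with open(path, "rb") as f:
        return detect_by_magic(f.read(HEADER_SIZE))


print(detect_by_magic(b"%PDF-1.7\n"))  # PDF document
print(detect_by_magic(b"hello world"))  # None
```

Because only the header is read, this check stays cheap even for very large files, and renaming the file has no effect on the result.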
For example, the WebAssembly binary format (wasm) has the following definitions:

    magic   ::= 0x00 0x61 0x73 0x6D
    version ::= 0x01 0x00 0x00 0x00
    module  ::= magic version ...
Here, module is the binary file as a whole.
As you can see, the first 4 bytes represent the file type (\0asm for short) and the next 4 bytes the version (1, stored as a little-endian 32-bit integer).
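The header layout above can be checked with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and it validates only the 8-byte preamble, not the rest of the module.

```python
import struct

WASM_MAGIC = b"\x00asm"  # 0x00 0x61 0x73 0x6D


def parse_wasm_header(data: bytes) -> int:
    """Return the version of a WebAssembly binary, or raise ValueError."""
    if len(data) < 8:
        raise ValueError("too short to be a wasm module")
    if data[:4] != WASM_MAGIC:
        raise ValueError("missing \\0asm magic bytes")
    # The version field is an unsigned 32-bit little-endian integer.
    (version,) = struct.unpack("<I", data[4:8])
    return version


header = bytes([0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00])
print(parse_wasm_header(header))  # 1
```

Reading the version as a little-endian integer is why the bytes 0x01 0x00 0x00 0x00 mean version 1 rather than a very large number.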
Each file type should have its own unique sequence of magic bytes to avoid conflicts with other formats.
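One way to make "unique" precise: a file matches every signature that is a prefix of its first bytes, so two signatures clash whenever one is a prefix of the other (or they are identical). The sketch below checks a signature table for such clashes; the function name and the "my-format" entry are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def find_conflicts(signatures: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """List pairs of formats whose magic bytes could match the same file.

    Two signatures are ambiguous when one is a prefix of the other (or they
    are identical), since any file starting with the longer one matches both.
    """
    conflicts = []
    for (name_a, magic_a), (name_b, magic_b) in combinations(signatures.items(), 2):
        if magic_a.startswith(magic_b) or magic_b.startswith(magic_a):
            conflicts.append((name_a, name_b))
    return conflicts


table = {
    "png": b"\x89PNG\r\n\x1a\n",
    "wasm": b"\x00asm",
    "my-format": b"\x00as",  # hypothetical format that clashes with wasm
}
print(find_conflicts(table))  # [('wasm', 'my-format')]
```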
Text files usually don't carry magic bytes, so the file command uses a collection of regular expressions instead.
For example, to detect an HTML file, the following patterns are used (the backslashes are escapes required by the magic file syntax):

    \<head\>
    \<title\>
    \<a\ href=
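The same idea can be sketched with Python's re module, using the three HTML patterns above without the magic-file escapes. Searching case-insensitively within only the first 4096 bytes is an assumption made for this sketch, meant to resemble file's behavior rather than reproduce it exactly; the function name is hypothetical.

```python
import re

# The three HTML patterns, written as Python byte regexes.
HTML_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (rb"<head>", rb"<title>", rb"<a href=")
]


def looks_like_html(data: bytes, window: int = 4096) -> bool:
    """Return True if any HTML pattern appears near the start of the data."""
    start = data[:window]  # only inspect the beginning, like a header check
    return any(pattern.search(start) for pattern in HTML_PATTERNS)


print(looks_like_html(b"<html><HEAD><title>Hi</title></HEAD></html>"))  # True
print(looks_like_html(b"plain text, no markup"))  # False
```

Limiting the search to a window keeps detection fast on large files, at the cost of missing markup that only appears further down.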
In the case of shell and other scripts, the interpreter is usually named on the first line of the file, the so-called shebang (for example #!/usr/bin/python). The file command also uses it for file type detection.
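A minimal Python sketch of reading the interpreter from a shebang line. The function name is hypothetical, and the special handling of /usr/bin/env (skipping to the first non-option argument) is an assumption added for this example, since many scripts start with #!/usr/bin/env python3 rather than a direct path.

```python
import os
from typing import Optional


def shebang_interpreter(data: bytes) -> Optional[str]:
    """Return the interpreter named on a '#!' first line, or None."""
    if not data.startswith(b"#!"):
        return None
    # Take the first line, drop the "#!" prefix and surrounding whitespace.
    first_line = data.split(b"\n", 1)[0][2:].decode("utf-8", errors="replace").strip()
    parts = first_line.split()
    if not parts:
        return None
    program = os.path.basename(parts[0])
    # "#!/usr/bin/env python3" delegates to env: the real interpreter is the
    # first following word that is not an option such as "-S".
    if program == "env":
        args = [p for p in parts[1:] if not p.startswith("-")]
        return args[0] if args else None
    return program


print(shebang_interpreter(b"#!/usr/bin/python\nprint('hi')\n"))  # python
print(shebang_interpreter(b"#!/usr/bin/env python3\n"))  # python3
print(shebang_interpreter(b"print('hi')\n"))  # None
```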