Introduction to Assembly Language

What Is Assembly?

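Assembly language, often just called ‘assembly’ or ‘assembler’, is a low-level programming language that is specific to a particular computer architecture or family of microprocessors. Strictly speaking, an assembler is the program that translates assembly code into machine code, but the name is also used informally for the language itself.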

Every architecture has its own version of assembly language, but there is a large amount of overlap between them. Knowing one will not automatically make you skilled in another, but it will dramatically shorten the learning curve.

For example, x86 and x64 (also called x86-64) assembly are very similar, while ARM assembly differs from both considerably more.

Assembly is essentially the lowest level of programming that is practical for anyone to use. Underlying assembly is the machine code: the binary 0s and 1s that computers use to perform digital arithmetic and logic.

While most programming today is done in high-level languages like Java or Python, assembly allows programmers to write instructions that are very close to the machine code instructions executed by the computer.

Each assembly language instruction generally corresponds to a single machine language instruction, which brings both advantages and disadvantages when programming directly in assembly. A program written in assembly can be extremely well-designed and efficient, but it also takes a long time to write, far too long for many modern applications. However, where assembly is used, it is often essential.
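To make the one-to-one mapping concrete, here is a minimal sketch of a toy assembler in Python. It assumes a hand-picked table of four single-byte x86-64 encodings; the names OPCODES and assemble are hypothetical, and a real assembler also has to handle operands, prefixes, and variable-length encodings.

```python
# Toy lookup table: a tiny, hand-picked subset of single-byte x86-64 encodings.
# This is an illustration only, not how a real assembler is structured.
OPCODES = {
    "nop": 0x90,
    "push rax": 0x50,
    "pop rax": 0x58,
    "ret": 0xC3,
}


def assemble(lines):
    """Translate each source line into exactly one machine-code byte."""
    out = bytearray()
    for line in lines:
        # Drop ';' comments and normalize case and whitespace.
        text = " ".join(line.split(";")[0].lower().split())
        if not text:
            continue  # blank or comment-only line
        out.append(OPCODES[text])  # one instruction -> one encoding
    return bytes(out)


program = ["push rax", "nop", "pop rax", "ret"]
print(assemble(program).hex(" "))  # 50 90 58 c3
```

Each line of the program becomes one entry in the output, which is the property described above. The cost is also visible: the programmer has to spell out every individual step.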

Furthermore, knowledge of assembly is an excellent skill to acquire for anyone working with or interested in computers. When we learn assembly, we also learn how processors and computers work at a deep level.

Where Is Assembly Language Used?

While most programming today is done in higher-level languages, many people still write code directly in assembly.

Assembly language is often used in situations where performance is critical, such as in embedded systems, device drivers, real-time systems, and certain high-performance applications where every cycle of the CPU matters.

Programming in assembly language both requires and develops a deep understanding of the underlying hardware. It can be challenging compared to higher-level programming languages, but it offers a level of control and efficiency that is essential for certain applications.

In addition to programming directly in assembly, there are two other common situations where assembly is used. First, assembly is sometimes an intermediate step in the compilation process. For example, GCC (the GNU Compiler Collection) first compiles C code to assembly and then to machine code. The other situation is reverse engineering, which we will learn about in the next section.

‘Everything is Open Source if You Can Read Assembly’

Assembly language is a human-readable form of machine code. However, it’s important to remember that many compilers never produce assembly text at all and emit machine code directly, because generating assembly would be an unnecessary step.

Despite this, any program can be converted back into assembly by interpreting its machine code directly. This process is called disassembly; it is performed by a disassembler and is frequently used for reverse engineering.
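As a rough illustration of what a disassembler does, the sketch below walks over raw bytes and prints a mnemonic for each one. It assumes the same kind of tiny, hand-picked table of single-byte x86-64 encodings; MNEMONICS and disassemble are hypothetical names, and real disassemblers must decode variable-length instructions and their operands.

```python
# Toy table mapping a few single-byte x86-64 encodings back to mnemonics.
MNEMONICS = {
    0x90: "nop",
    0x50: "push rax",
    0x58: "pop rax",
    0xC3: "ret",
}


def disassemble(code):
    """Return one listing line per byte: offset, raw byte, mnemonic."""
    listing = []
    for offset, byte in enumerate(code):
        # Bytes this toy table does not know are shown as raw data.
        text = MNEMONICS.get(byte, f"db 0x{byte:02x}")
        listing.append(f"{offset:04x}: {byte:02x}  {text}")
    return listing


for line in disassemble(bytes.fromhex("50 90 58 c3")):
    print(line)
# 0000: 50  push rax
# 0001: 90  nop
# 0002: 58  pop rax
# 0003: c3  ret
```

No source code is needed for this: the bytes of the program are the only input.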

In other words, ‘everything is open source if you can read assembly’.

Key Features of Assembly

Here are some of the primary features of assembly language. This is just an overview; we will see each of these topics in much greater detail in later lessons.

  1. Instructions: Assembly language instructions correspond directly to the binary code understood by a computer’s central processing unit (CPU). Each assembly instruction represents a specific operation the CPU can perform, such as addition, subtraction, or data movement.
  2. Mnemonic Codes: Assembly language uses mnemonic codes, which are human-readable abbreviations for instructions. For example, “MOV” represents a move instruction, while “ADD” represents an addition instruction. Programmers write code using these mnemonics, making it easier to understand than writing in pure binary.
  3. Memory Access: Assembly provides instructions for accessing specific memory locations. Programs can read from and write to different parts of the computer’s memory, which is how they store and retrieve data.
    • The Stack: The stack is a region of a computer’s memory that operates in a Last-In, First-Out (LIFO) manner. This means that the last item placed onto the stack is the first item to be removed.
    • The Heap: The heap is a region of a computer’s memory that is used for dynamic memory allocation. Unlike the stack, which operates in a Last-In, First-Out (LIFO) manner and is used for function calls and local variables, the heap is a more flexible area of memory where data can be allocated and deallocated in any order.
  4. Registers: Registers are small storage locations within the CPU that are used to store data temporarily during program execution. Assembly instructions can efficiently perform operations on data stored in registers, and registers are much faster to use than memory. Assembly language allows direct manipulation of the CPU’s registers.
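
The features above can be sketched with a toy machine in Python. This is only an illustration under simplifying assumptions: two registers named rax and rbx, a Python list standing in for the stack, and a hypothetical run function that understands just four mnemonics. It does not model a real CPU.

```python
def run(program):
    """Execute a list of (mnemonic, operands...) tuples on a toy machine."""
    regs = {"rax": 0, "rbx": 0}  # registers: small, fast storage in the CPU
    stack = []                   # the stack: last in, first out

    for op, *args in program:
        if op == "mov":          # copy a value or register into a register
            dst, src = args
            regs[dst] = regs[src] if src in regs else src
        elif op == "add":        # add a value or register to a register
            dst, src = args
            regs[dst] += regs[src] if src in regs else src
        elif op == "push":       # place a register's value on top of the stack
            (src,) = args
            stack.append(regs[src])
        elif op == "pop":        # remove the top of the stack into a register
            (dst,) = args
            regs[dst] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs, stack


program = [
    ("mov", "rax", 1),
    ("mov", "rbx", 2),
    ("push", "rax"),         # stack: [1]
    ("push", "rbx"),         # stack: [1, 2]
    ("pop", "rax"),          # rax = 2 (the last value pushed comes off first)
    ("pop", "rbx"),          # rbx = 1
    ("add", "rax", "rbx"),   # rax = 2 + 1
]
regs, stack = run(program)
print(regs)   # {'rax': 3, 'rbx': 1}
print(stack)  # []
```

Because the stack is LIFO, pushing rax then rbx and popping in the same order swaps the two values.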

How Much Assembly Do I Need to Know?

The answer to this question really depends on what you want to do. The goal of this course is to get you to the point where you can read, write, and understand the majority of assembly code, and know how to look up anything else.

In other words, we are trying to build a foundation strong enough that you can develop further on your own. This actually doesn’t require learning that many instructions. For example, if you learn just MOV, PUSH, POP, CALL, and ADD, you will already be able to follow roughly 60% of the instructions in typical assembly code!
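
One way to check this kind of claim on code you care about is to count how often each mnemonic appears in a listing. The sketch below is a minimal example; the listing is made up for illustration and CORE and mnemonic_counts are hypothetical names, so the ratio it prints says nothing about real programs.

```python
from collections import Counter

# Made-up listing for illustration; paste a real disassembly here instead.
LISTING = """
push rbp
mov rbp, rsp
mov eax, 1
add eax, 2
call helper
pop rbp
ret
"""

CORE = {"mov", "push", "pop", "call", "add"}


def mnemonic_counts(listing):
    """Count the first word (the mnemonic) of every non-empty line."""
    counts = Counter()
    for line in listing.splitlines():
        line = line.split(";")[0].strip()  # drop ';' comments
        if line:
            counts[line.split()[0].lower()] += 1
    return counts


counts = mnemonic_counts(LISTING)
covered = sum(n for mnemonic, n in counts.items() if mnemonic in CORE)
total = sum(counts.values())
print(f"{covered}/{total} instructions use the core set")
# 6/7 instructions use the core set
```

Running the same count over the output of a disassembler gives a rough sense of which instructions are worth learning first for a given program.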