“Hello, World!” in Assembly
Building a “Hello, World!” program is the traditional way to get started in any programming language. In assembly, however, it is somewhat harder than in higher-level languages.
The reason isn’t only that assembly code is harder to read than code in higher-level languages. To read and understand assembly, we also need to understand what the processor and the operating system are actually doing.
If we wanted to fully understand how “Hello, World!” in Python works at a low level, it would prove just as difficult.
The difference is that assembly exposes how the processor executes a program, while higher-level languages abstract these details away.
In this article, we will cover everything needed to write and understand a “Hello, World!” program in assembly.
We won’t cover every related topic in detail, as that would make this article far too long. However, each of these topics is covered in depth on our website, and links are provided in the text.
Introduction to Hello, World! in Assembly
Let’s start by looking at a basic “Hello, World!” program in assembly.
The following program is written in x86-64 assembly (NASM syntax) for Linux, but it provides a great starting point for other architectures and operating systems as well:
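global _start                   ; export _start so the linker can use it as the entry point

section .data
message db "Hello, World!"      ; the 13 bytes of the string (no trailing newline)

section .text
_start:
    mov rax, 1                  ; syscall number 1: write
    mov rdi, 1                  ; first argument: file descriptor 1 (standard output)
    mov rsi, message            ; second argument: address of the string
    mov rdx, 13                 ; third argument: number of bytes to write
    syscall                     ; ask the kernel to perform the write

    mov rax, 60                 ; syscall number 60: exit
    mov rdi, 0                  ; first argument: exit status 0 (success)
    syscall                     ; ask the kernel to terminate the program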
Overview of the Hello, World! Program in Assembly
Let’s look at this “Hello, World!” program at a high level, line by line.
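Line 1: Line one is ‘global _start’. This directive is aimed at the assembler and linker rather than the processor: it makes the ‘_start’ label visible to the linker, which uses it by default as the program’s entry point. Execution therefore begins at ‘_start’, on line 7.
Line 3: On line three we find ‘section .data’. This begins the data section, which holds the program’s initialized data (its variables).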
There is one variable in the data section:
message db "Hello, World!"
Line 4: This line declares a variable called ‘message’. More precisely, ‘message’ is a label for the address of the first byte of the string. ‘db’ stands for ‘define byte’ and is used to allocate memory and set the initial value of one or more bytes. It’s commonly used to define constants, strings, and other data elements.
Line 6: Line six is ‘section .text’, which declares the start of the text section. The text section is where the executable code is located.
Line 7: The first line of the text section (line 7 of the program) is the label ‘_start:’. Remember the ‘global _start’ directive on line 1 of the program? It makes this label visible to the linker, which records it as the entry point, so program execution begins here.
Lines 8 – 16: The rest of the code is divided into two parts, each of which performs something called a system call (syscall). The first syscall outputs the message variable to the standard output (the console) using a write syscall:
mov rax, 1
mov rdi, 1
mov rsi, message
mov rdx, 13
syscall
The second terminates the program using an exit syscall:
mov rax, 60
mov rdi, 0
syscall
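The hard-coded length 13 in the write syscall has to be kept in sync with the string by hand. As a minimal sketch, assuming the NASM assembler and GNU ld, the variant below gives ‘db’ a second operand (a newline byte) and lets the assembler compute the length. The name ‘message_len’ and the file name ‘hello.asm’ are illustrative choices, not part of the original program.

```nasm
; hello.asm (hypothetical file name)
; Assemble, link, and run (assuming nasm and GNU ld are installed):
;   nasm -f elf64 hello.asm -o hello.o
;   ld hello.o -o hello
;   ./hello
global _start

section .data
message     db "Hello, World!", 10  ; 13 text bytes followed by a newline (ASCII 10)
message_len equ $ - message         ; current address minus start of message = 14

section .text
_start:
    mov rax, 1                      ; syscall number 1: write
    mov rdi, 1                      ; file descriptor 1: standard output
    mov rsi, message                ; address of the first byte to write
    mov rdx, message_len            ; byte count, computed by the assembler
    syscall

    mov rax, 60                     ; syscall number 60: exit
    mov rdi, 0                      ; exit status 0
    syscall
```

Because ‘equ’ defines an assemble-time constant rather than reserving memory, changing the string updates the length automatically.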
Review of System Calls
Let’s examine system calls a bit more closely so that we can understand what is happening in the code above.
First, we can see that both system calls consist of a series of instructions – specifically, a series of mov instructions followed by a syscall instruction. Further, each mov instruction involves a register (rax, rdi, rsi, rdx).
Altogether, there are a number of topics that we need to understand in order to write a “Hello, World!” program in assembly:
- Registers: Registers are small storage locations inside the processor itself and are therefore extremely fast to access. The size of a processor’s general-purpose registers is determined by the architecture. For example, a 32-bit processor has 32-bit general-purpose registers, while a 64-bit processor has 64-bit ones.
- Instructions: An assembly instruction consists of an operation mnemonic and (zero or more) operands.
- The mov instruction: Copies data from one location to another. ‘mov rax, 1’ copies the value 1 into the rax register.
- System calls: Requests for services provided by the operating system kernel. Syscalls are how a program performs tasks such as writing to the console or exiting.
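- Write syscall: Used to write output, most commonly to the standard output (console). On x86-64 Linux it is syscall number 1, which is why the program puts 1 in rax.
- Exit syscall: Used to exit the program. On x86-64 Linux it is syscall number 60.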
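To see registers, mov, and the two syscalls working together, here is a small sketch, assuming x86-64 Linux and NASM syntax as in the program above. It relies on two details not covered so far: the kernel leaves a system call’s return value in rax, and write returns the number of bytes it wrote. The program copies that value into rdi with a register-to-register mov and uses it as the exit status. The name ‘message_len’ is illustrative.

```nasm
; Sketch: use the return value of write as the program's exit status.
global _start

section .data
message     db "Hello, World!", 10  ; string plus a newline byte
message_len equ $ - message         ; length computed by the assembler

section .text
_start:
    mov rax, 1                      ; syscall number 1: write
    mov rdi, 1                      ; file descriptor 1: standard output
    mov rsi, message                ; address of the string
    mov rdx, message_len            ; number of bytes to write
    syscall                         ; on return, rax holds the byte count (or a negative error)

    mov rdi, rax                    ; copy write's return value into the exit-status argument
    mov rax, 60                     ; syscall number 60: exit
    syscall
```

After running the program, ‘echo $?’ in the shell should print 14 if the whole message was written. Note that the copy into rdi has to happen before rax is overwritten with the exit syscall number.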