Linux File System Hunting: What I Found When I Stopped Running Commands and Started Reading the OS

A web dev student goes on a system investigation — and discovers that Linux's entire brain lives in plain text files.


I've been using Linux terminals for months now — mostly just ls, cd, npm install, running Node.js servers. I knew the basics. I knew it worked.

But I never really looked at it.

This assignment changed that. Instead of running commands to do things, I started reading what the system had to say about itself. I opened files. I traced how one file referenced another. I followed the chain from a browser DNS request all the way to the physical network interface. And what I found was genuinely surprising — Linux isn't hiding anything. Its entire nervous system is right there in plain text, waiting to be read.

Here are the 10 most interesting things I discovered.


Finding 1: /etc/hostname — Your Machine's Identity in One Line

File: /etc/hostname

$ cat /etc/hostname
devbox

That's it. One word. The entire hostname of the machine — the name every other machine on the network uses to identify this server.

Why it exists: When multiple machines talk to each other, they need names. The hostname is this machine's name. The OS reads this file at boot time and sets the system identity.

What's interesting: I always assumed hostnames were configured deep inside some binary configuration. But they're stored in the most minimal file possible. One string, one file.

What blew my mind more: this file is just the seed. The kernel reads it once at boot and keeps the live value in memory — you can see (and even change) it at /proc/sys/kernel/hostname. Everything that shows your machine's name — the shell prompt, the hostname command, the devbox that appears in log lines — is reading that kernel value. The chain from /etc/hostname → kernel → every tool on the system is automatic, invisible, and starts from one plain-text file. (One thing it does not do, despite what I first assumed: it isn't sent in outgoing HTTP headers — the Host header carries the name of the server you're contacting, not your machine's.)

Why it matters for a web developer: In Docker containers, every container has its own /etc/hostname — that's how services in a Docker network identify each other. Understanding that it's just a file makes debugging container networking much less scary.


Finding 2: /etc/hosts — The Original Internet Directory

File: /etc/hosts

$ cat /etc/hosts
127.0.0.1   localhost
127.0.1.1   devbox
::1         localhost ip6-localhost

Before DNS servers existed, this file was the internet. Every computer on ARPAnet had a copy of a central HOSTS.TXT file that mapped every hostname to an IP address. You can still feel that history here.

Why it exists: It provides a local override for DNS resolution. When your system needs to resolve a hostname, it checks /etc/hosts first before ever asking a DNS server.

What I discovered: You can add any entry here, and it will resolve system-wide instantly, without touching any network service. I added:

127.0.0.1   api.local

And immediately, http://api.local:3000 worked in my browser — pointing to my local Node.js server. No DNS propagation wait. No registrar. Instant, local, and powerful.

The security insight: This is also a common malware tactic. Malware injects entries like 0.0.0.0 updates.microsoft.com to block update servers. Or <attacker-ip> paypal.com to redirect users to phishing sites. The file that makes development convenient is also a fundamental attack surface.
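The format is simple enough that the "files" step of name resolution fits in a few lines. Here's a minimal sketch in Python — my own illustration of the lookup logic, not how glibc actually implements it — parsing /etc/hosts-style text where the first entry for a name wins:

```python
def parse_hosts(text):
    """Parse /etc/hosts-style text into a {hostname: ip} mapping."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()  # one IP, then one or more names
        for name in names:
            table.setdefault(name, ip)  # first entry for a name wins
    return table

sample = """
127.0.0.1   localhost
127.0.1.1   devbox
127.0.0.1   api.local   # local dev override
"""
print(parse_hosts(sample)["api.local"])  # 127.0.0.1
```

On a real Linux box you could feed it `open("/etc/hosts").read()` and get the same system-wide overrides your browser sees.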


Finding 3: /etc/resolv.conf — Where Your DNS Requests Actually Go

File: /etc/resolv.conf

$ cat /etc/resolv.conf
nameserver 127.0.0.53
options edns0 trust-ad
search .

This file determines which DNS server your machine asks when it needs to resolve a domain name.

Why it exists: When your Node.js app calls fetch("https://api.example.com"), the OS needs to resolve api.example.com to an IP address. /etc/resolv.conf tells the system where to send that DNS query.

What I discovered: On modern Ubuntu, this doesn't point to 8.8.8.8 (Google) directly. It points to 127.0.0.53 — which is systemd-resolved, a local caching DNS stub resolver running as a service. The actual upstream servers are configured inside systemd-resolved.

This means DNS resolution involves:

Your app → /etc/resolv.conf → systemd-resolved (127.0.0.53) → Real DNS server

There's an entire caching layer between your code and the internet, and it lives at that 127.0.0.53 loopback address. I had no idea.

The discovery that surprised me: You can see the full chain by checking /run/systemd/resolve/resolv.conf — this is the actual upstream server list that systemd-resolved uses. On my machine: 8.8.8.8 and 8.8.4.4 (Google's DNS). On these systems /etc/resolv.conf typically isn't even a regular file — it's a symlink into /run/systemd/resolve/, literally a pointer to the proxy.


Finding 4: /etc/nsswitch.conf — The Decision Engine for Name Resolution

File: /etc/nsswitch.conf

$ cat /etc/nsswitch.conf
hosts:   files mdns4_minimal [NOTFOUND=return] dns myhostname

This file is the one that actually decides the order in which hostname lookups happen. And it's the answer to a question I always wondered: how does the OS know to check /etc/hosts before asking DNS?

Why it exists: Linux systems have multiple ways to look up information — files, DNS, LDAP, NIS. nsswitch.conf defines the lookup priority for each type of query. The hosts line says: check files (i.e., /etc/hosts) first, then mdns4_minimal (local network discovery), then fall back to dns.

The insight: The entire resolution chain is configurable from this one file. A system administrator could reverse the priority — checking DNS before local files — just by changing the order of words on this line. The Linux networking stack has an entire plugin system for name resolution, and it's all controlled by human-readable text.


Finding 5: /proc/net/route — Your Machine's Routing Table as a File

File: /proc/net/route

$ cat /proc/net/route
Iface    Destination  Gateway   Flags  ...  Mask        Metric
eth0     00000000     0101A8C0  0003        00000000    100
eth0     0001A8C0     00000000  0001        00FFFFFF    100

The values are in hexadecimal little-endian format. After converting:

Destination: 0.0.0.0    → Default route (everything not matched)
Gateway:     192.168.1.1 → My router

Why it exists: The kernel maintains a routing table that decides where to send every outgoing network packet. Every time your Node.js server makes an outbound HTTP request, the kernel checks this table to determine which network interface and which gateway to use.

What blew my mind: This isn't a config file. This is a live window into the kernel's memory. The /proc filesystem doesn't exist on disk at all — it's generated dynamically by the kernel in real time. When you cat /proc/net/route, you're not reading a file. You're querying the kernel for its current routing state.

/proc is a virtual filesystem. The kernel pretends it has files there, but the "files" are kernel data structures made to look like files so you can read them with standard Unix tools.
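The hex conversion I did by hand above is easy to automate. A small sketch in Python — my own helper, assuming the little-endian byte order the kernel uses on x86 machines:

```python
import socket
import struct

def hex_to_ip(hex_field):
    """Decode a little-endian hex IPv4 field from /proc/net/route."""
    # The kernel prints the 32-bit address in host byte order (little-endian
    # on x86), so pack it back into bytes in that order before formatting.
    return socket.inet_ntoa(struct.pack("<I", int(hex_field, 16)))

print(hex_to_ip("00000000"))  # 0.0.0.0  (the default route)
print(hex_to_ip("0101A8C0"))  # 192.168.1.1  (my router)
```

With this one function you can decode the Destination, Gateway, and Mask columns of the whole table.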


Finding 6: /proc/[pid]/ — Every Process Exposes Itself as a Folder

Directory: /proc/[pid]/

$ ls /proc/$(pgrep node)
cmdline  cwd  environ  exe  fd  maps  net/  ...

Every single running process has its own directory inside /proc named after its Process ID. And inside that directory is a complete picture of everything about that process.

What I found inside:

/proc/<pid>/cmdline   → The exact command that started the process
/proc/<pid>/environ   → ALL environment variables the process can see
/proc/<pid>/cwd       → A symlink to the process's current working directory
/proc/<pid>/exe       → A symlink to the binary being executed
/proc/<pid>/fd/       → Every file descriptor the process has open
/proc/<pid>/maps      → Memory layout: code, stack, heap, shared libraries
/proc/<pid>/net/      → This SPECIFIC process's network view (containers!)

The discovery that changed how I think: I ran my Node.js server and then checked /proc/<pid>/environ. There were all my environment variables — NODE_ENV, PORT, my database URL — completely visible to anyone with enough permissions to read that file.

$ cat /proc/$(pgrep node)/environ | tr '\0' '\n' | grep DB
DATABASE_URL=mongodb://localhost:27017/myapp

This is why process isolation and file permissions are critical. On a shared server, if another process can read your /proc/<pid>/environ, it can steal your database credentials.
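The reason the tr '\0' '\n' trick is needed is that environ is NUL-separated, not newline-separated. A minimal Python equivalent of that parsing step (my own sketch, demonstrated on sample bytes so it runs anywhere; on Linux you'd feed it the real file):

```python
def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE format of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:  # skip the trailing empty chunk
            key, _, value = entry.partition(b"=")
            env[key.decode()] = value.decode()
    return env

# In real use: raw = open("/proc/self/environ", "rb").read()
sample = b"NODE_ENV=production\0PORT=3000\0DATABASE_URL=mongodb://localhost:27017/myapp\0"
print(parse_environ(sample)["PORT"])  # 3000
```

Note that /proc/<pid>/environ shows the environment the process started with — changes made after exec don't appear there.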


Finding 7: /etc/passwd and /etc/shadow — The Split Permission Model

Files: /etc/passwd and /etc/shadow

$ cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
pratham:x:1000:1000:Pratham Bhardwaj:/home/pratham:/bin/bash
$ cat /etc/shadow
pratham:$6$rounds=5000$salt$hashed_password_here:19478:0:99999:7:::

Why they're split: Originally, Unix systems stored hashed passwords directly in /etc/passwd. The problem: /etc/passwd must be world-readable (tools like ls -l read it to map numeric user IDs to usernames). So passwords — even hashed — were exposed to every user on the system.

The fix was to introduce /etc/shadow, which stores the actual hashes and is readable only by root. /etc/passwd now has x where the hash used to be, meaning "look in shadow."

What I learned: The $6$ prefix in shadow means SHA-512. $1$ means MD5. $2y$ means bcrypt. You can tell exactly which algorithm was used just from the prefix of the hash. This is how modern password systems allow algorithm upgrades — old hashes have the old prefix, new passwords get the new prefix, and the system handles both.

The web dev angle: This exact pattern — separate readable metadata from secret data — is the right way to design any system that handles credentials. The concept of /etc/passwd + /etc/shadow is architecturally similar to why you store the username publicly in a database but the password hash in a separate access-controlled column.
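The prefix-to-algorithm mapping is small enough to write down. Here's a sketch — the prefixes are real crypt(3) identifiers, but the helper function is my own illustration:

```python
# Real crypt(3) prefix identifiers mapped to their hash algorithms.
CRYPT_PREFIXES = {
    "$1$": "MD5",
    "$2y$": "bcrypt",
    "$5$": "SHA-256",
    "$6$": "SHA-512",
}

def hash_algorithm(shadow_hash: str) -> str:
    """Identify the crypt(3) algorithm from a shadow hash's prefix."""
    for prefix, name in CRYPT_PREFIXES.items():
        if shadow_hash.startswith(prefix):
            return name
    return "unknown"

print(hash_algorithm("$6$rounds=5000$salt$hashed_password_here"))  # SHA-512
```

This self-describing format is exactly what lets a system upgrade algorithms gradually: each stored hash carries its own algorithm tag.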


Finding 8: /etc/systemd/system/ — Every Service is a File

Directory: /etc/systemd/system/

$ ls /etc/systemd/system/
multi-user.target.wants/  nginx.service  node-app.service  ...

$ cat /etc/systemd/system/nginx.service
[Unit]
Description=A high performance web server
After=network.target

[Service]
Type=forking
ExecStartPre=/usr/sbin/nginx -t
ExecStart=/usr/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Every system service — Nginx, SSH, your Node.js app, the DNS resolver — is defined by a plain-text "unit file" inside /etc/systemd/system/.

What the sections mean:

  • [Unit] — What this service is, what it depends on
  • [Service] — How to start, stop, and reload it
  • [Install] — When in the boot sequence to start it

The discovery: WantedBy=multi-user.target means "start this service when the system reaches multi-user mode" — which is normal operating mode. Linux boot is essentially a chain of targets, and multi-user.target is just another file that lists what should be running.

Why this matters: You can deploy your own Node.js app as a system service by creating a file in this directory. No Docker needed. Just a text file that says "start node server.js on boot, restart it if it crashes, run it as user pratham." The entire deployment config is a 15-line text file.
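As an illustration, a unit file for the node-app.service shown in the directory listing above might look something like this — the user, paths, and environment values are placeholders I made up, not from a real deployment:

```ini
[Unit]
Description=My Node.js app
After=network.target

[Service]
Type=simple
User=pratham
WorkingDirectory=/home/pratham/myapp
ExecStart=/usr/bin/node server.js
Restart=on-failure
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target
```

Save it as /etc/systemd/system/node-app.service, then run systemctl daemon-reload and systemctl enable --now node-app.service — your app now starts on boot and restarts on crashes.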


Finding 9: /var/log/ — The Server's Diary

Directory: /var/log/

$ ls /var/log/
auth.log   kern.log   syslog   nginx/   apt/   ...
$ tail -f /var/log/auth.log
May 10 16:42:01 devbox sshd[1234]: Failed password for pratham from 45.33.32.156 port 48652 ssh2
May 10 16:42:03 devbox sshd[1234]: Failed password for pratham from 45.33.32.156 port 48654 ssh2
May 10 16:42:05 devbox sshd[1234]: Failed password for pratham from 45.33.32.156 port 48656 ssh2

When I ran tail -f /var/log/auth.log, I was watching failed SSH login attempts in real time. My dev server — a basic cloud VM — was being scanned and attacked within minutes of being created.

What the logs revealed:

  • auth.log — every login attempt (successful and failed), sudo usage, SSH activity
  • syslog — general system messages, kernel events, service starts/stops
  • kern.log — kernel-specific messages (hardware, driver events)
  • nginx/access.log — every single HTTP request to the web server

The insight that stuck with me: The failed login attempts weren't targeted attacks. They're automated bots constantly scanning the internet for open SSH ports with default credentials. The internet is hostile by default, and your server is being probed constantly. Logs make this visible.

$ grep "Failed password" /var/log/auth.log | wc -l
847

847 failed password attempts in one day on a fresh server. With password authentication disabled and SSH keys only, all of those attempts are just noise. But without that security config, one of those 847 attempts could have succeeded.
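Once you treat the log as data, counting attackers per source IP is a few lines. A small Python sketch — demonstrated on the sample lines from above, and handling only the simple "Failed password for <user> from <ip>" form:

```python
import re
from collections import Counter

def failed_ips(log_text: str) -> Counter:
    """Count 'Failed password' attempts per source IP in auth.log text."""
    pattern = re.compile(r"Failed password for \S+ from (\S+)")
    return Counter(m.group(1) for m in pattern.finditer(log_text))

sample = """\
May 10 16:42:01 devbox sshd[1234]: Failed password for pratham from 45.33.32.156 port 48652 ssh2
May 10 16:42:03 devbox sshd[1234]: Failed password for pratham from 45.33.32.156 port 48654 ssh2
May 10 16:42:05 devbox sshd[1234]: Failed password for pratham from 45.33.32.156 port 48656 ssh2
"""
print(failed_ips(sample).most_common(1))  # [('45.33.32.156', 3)]
```

Point it at `open("/var/log/auth.log").read()` on a real server and you get a ranked list of the noisiest scanners.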


Finding 10: /proc/net/tcp — Open Sockets as a File

File: /proc/net/tcp

$ cat /proc/net/tcp
sl  local_address  rem_address    st  ...
0:  00000000:0016  00000000:0000  0A        ← port 22 (SSH), listening
1:  0100007F:0035  00000000:0000  0A        ← port 53 (DNS)
2:  00000000:1F40  00000000:0000  0A        ← port 8000 (my Node.js)

The addresses are in hex little-endian: 0016 = port 22, 0035 = port 53, 1F40 = port 8000.

Why it exists: This is the kernel's live list of every open TCP socket. What's listening. What's connected. What's in various states (listening = 0A, established = 01).

The insight: When you run netstat -tlnp to see open ports, that tool is literally reading and formatting /proc/net/tcp. (The newer ss asks the kernel over a netlink socket instead — a faster interface to the same data.) The actual source of truth isn't a network daemon — it's the kernel's own socket table, exposed as a virtual file in /proc.

I could write my own network monitoring tool just by reading and parsing /proc/net/tcp. No special privileges. No special libraries. Just reading a file.
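Here's what that tool could look like — a minimal Python parser for /proc/net/tcp, fed a sample table here so it runs anywhere (on Linux, point it at the real file instead):

```python
def listening_ports(proc_net_tcp: str):
    """Extract locally listening TCP ports (state 0A) from /proc/net/tcp text."""
    ports = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == "0A":  # 0A = TCP LISTEN state
            local = fields[1]                      # e.g. "00000000:0016"
            port_hex = local.split(":")[1]         # port is plain big-endian hex
            ports.append(int(port_hex, 16))
    return ports

sample = """\
  sl  local_address rem_address   st
   0: 00000000:0016 00000000:0000 0A
   1: 0100007F:0035 00000000:0000 0A
   2: 00000000:1F40 00000000:0000 0A
"""
print(listening_ports(sample))  # [22, 53, 8000]
```

On a real box: listening_ports(open("/proc/net/tcp").read()) — a one-file network monitor with no dependencies.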


The Pattern Behind Everything

After exploring all of this, I noticed one design philosophy that runs through the entire Linux file system:

"Everything is a file."

It's not just a slogan. The routing table is a file. Open processes are directories. Network sockets are files. DNS configuration is a file. Service definitions are files. System identity is a file.

This means every piece of knowledge about the system — its configuration, its current state, its history — is accessible with the same set of tools: cat, grep, tail, find. You don't need special admin dashboards. You don't need to know proprietary configuration formats. You need to know where to look and how to read.

For a web developer, this is deeply practical. When your Node.js app fails in production:

  • Check its logs in /var/log/ or journalctl -u your-service
  • Inspect its environment in /proc/<pid>/environ
  • Verify it's listening on the right port via /proc/net/tcp
  • Confirm DNS resolution is working via /etc/resolv.conf and /run/systemd/resolve/resolv.conf

Linux isn't a black box. It's a system that documents itself, constantly, in plain text — you just have to know where to look.


Wrapping Up

This assignment was supposed to be a file system exploration. It turned into understanding how the entire operating system works.

I used to think the OS was a magic layer under my code. Now I think of it as a collection of text files that a very smart kernel reads and acts on. The more you understand those files, the more control you have over your production environment.

I'm going through all of this as part of the ChaiCode Web Dev Cohort 2026 under Hitesh Chaudhary and Piyush Garg. The moment "Linux knowledge" stopped being about memorizing commands and started being about understanding why these files exist is when it finally became interesting.

Connect with me on LinkedIn or visit PrathamDEV.in.

Happy hunting! 🐧


Written by Pratham Bhardwaj | Web Dev Cohort 2026, ChaiCode
