How to Code in Python: A Step-by-Step Guide for Beginners

2026-06-05·SaaS Setup

The 3,000 PDF Problem

My first real Python project wasn't a calculator or a to-do list. It was 3,000 PDF invoices sitting in a folder, each one named something useless like `INV_20240311_88472.pdf`, and my boss wanted the totals extracted and put into a spreadsheet by Friday.

I had never written a line of Python. But I had 72 hours and a strong desire to not spend them copy-pasting numbers from PDFs.

Here's exactly what I did, what worked, what didn't, and what I'd skip if I had to do it again.

Hour 1: Getting Python to Run

Downloaded Python from python.org. Version 3.11 at the time. The installer has a checkbox that says "Add Python to PATH" and I almost skipped it. Don't skip it. That checkbox is the difference between Python working and you spending 45 minutes googling "python not recognized as command."

Opened a terminal. Typed `python --version`. Got `Python 3.11.4`. Small victory but it felt huge.

Then I opened Notepad, wrote `print(f"hi {input('your name: ')}")` and saved it as `test.py`. Ran it. It asked my name and printed it back. I'd written a program. Useless, sure, but it ran.

The Real Work: Reading PDFs

Googled "python read pdf" and found about 15 different libraries. PyPDF2, pdfplumber, reportlab, pikepdf, you name it. I had no idea which one was good. Picked pdfplumber because the examples looked simplest. Installed it with `pip install pdfplumber`.

First try failed with a permissions error. Turns out on Mac you sometimes need `pip3` instead of `pip`. Also turns out I should've used a virtual environment but I didn't know that yet. If you're reading this before you install anything: `python -m venv myproject` first. Saves you a headache later, trust me.

The script that actually read a PDF:

```python
import pdfplumber

with pdfplumber.open("INV_20240311_88472.pdf") as pdf:
    page = pdf.pages[0]
    text = page.extract_text()
    print(text)
```

It printed the invoice text to the terminal. Messy, but the numbers were there. The invoice total was always on a line starting with "Grand Total:" so I added a few lines to grab just that number.
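Those few lines might look something like the sketch below. The helper name `find_grand_total` and the sample text are made up for illustration; the only detail taken from the real invoices is the "Grand Total:" label.

```python
def find_grand_total(text):
    """Return the text after 'Grand Total:' on the first matching line, or None."""
    # A page with no extractable text can give back nothing, so guard first.
    if not text:
        return None
    for line in text.split("\n"):
        if line.strip().startswith("Grand Total:"):
            # Keep whatever follows the first colon, minus surrounding spaces.
            return line.split(":", 1)[1].strip()
    return None


# Hypothetical invoice text, standing in for page.extract_text() output.
sample = "Invoice 88472\nSubtotal: $1,100.00\nGrand Total: $1,234.50"
print(find_grand_total(sample))  # prints $1,234.50
```

Returning `None` when the label is missing lets the caller decide whether to skip the file or flag it for a manual look.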

The Loop That Changed Everything

This is where Python's real power hit me. Instead of opening files one by one:

```python
import os

import pdfplumber

folder = "/Users/me/Desktop/invoices"

for filename in os.listdir(folder):
    # Skip anything in the folder that isn't a PDF.
    if not filename.endswith('.pdf'):
        continue
    filepath = os.path.join(folder, filename)
    with pdfplumber.open(filepath) as pdf:
        # The total was always on the first page.
        page = pdf.pages[0]
        text = page.extract_text()
        for line in text.split('\n'):
            if 'Grand Total:' in line:
                # Keep only what comes after the last colon.
                amount = line.split(':')[-1].strip()
                print(f"{filename}: {amount}")
```

Fourteen lines. It looped through every PDF in the folder and pulled the total from each one. A full run over all 3,000 files took about 30 seconds. I sat there watching the terminal fill with numbers and honestly felt like some kind of wizard. A very tired wizard who didn't understand virtual environments, but still.

Where It Went Wrong (And What I Actually Learned)

About 200 invoices in, the script crashed. `IndexError: list index out of range`. One of the PDFs was effectively blank: pdfplumber found no pages in it, so `pdf.pages[0]` had nothing to return.

This is where error handling stops being a textbook concept and becomes real. I added:

```python
try:
    page = pdf.pages[0]
    text = page.extract_text()
except Exception as e:
    print(f"skipping {filename}: {e}")
    continue
```

Crude approach. `except Exception` catches almost any error, including real bugs you would want to see, which is why experienced devs will tell you it's bad practice. (It does not catch keyboard interrupts; a bare `except:` with no exception type does that, which is worse.) But I didn't know better and honestly it got the job done. The script finished, I had my numbers, and I dumped them into a CSV by adding 5 more lines. Opened it in Excel. Report ready by Thursday afternoon. Boss thought I'd worked through the night.
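The CSV step could be sketched like this, using the standard library's `csv` module. The `write_totals` helper, the `results` list, the column names, and the `totals.csv` filename are placeholders of mine, not the original script.

```python
import csv


def write_totals(rows, out_path):
    """Write (filename, amount) pairs to a CSV file with a header row."""
    # newline="" stops the csv module from emitting blank rows on Windows.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "grand_total"])
        writer.writerows(rows)


# Made-up rows standing in for what the loop collected.
results = [("INV_20240311_88472.pdf", "$1,234.50")]
write_totals(results, "totals.csv")
```

In the real loop you would append to `results` instead of printing, then call the writer once at the end.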

What I'd Recommend If You're Starting Today

Looking back, I stumbled through this because I didn't know a few basic things that would've saved me hours. Learn these in roughly this order:

Variables and types. Strings, ints, floats, bools. And the fact that `input()` always returns a string, so you need `int()` or `float()` to do math with user input. This trips up literally everyone at first.
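A quick illustration of the string trap. The `parse_quantity` helper is a hypothetical name; the strings passed to it stand in for what `input()` would return.

```python
def parse_quantity(raw):
    """Convert text typed by a user into an int, or None if it isn't a whole number."""
    try:
        return int(raw.strip())
    except ValueError:
        return None


# input() would hand back strings like these.
print("3" + "4")                                  # prints 34 (string concatenation)
print(parse_quantity("3") + parse_quantity("4"))  # prints 7 (integer addition)
```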

Lists and dictionaries. Once you can loop through a list and look up values in a dict, you can solve maybe 70% of real-world data problems. List comprehensions in particular. `[x*2 for x in numbers if x > 0]` replaces 4 lines of for-loop with one readable line.
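Here is the comprehension next to the loop it replaces, plus a dictionary lookup. The numbers and invoice names are invented for the example.

```python
numbers = [3, -1, 4, 0, -5]

# Loop version: four lines.
doubled_loop = []
for x in numbers:
    if x > 0:
        doubled_loop.append(x * 2)

# Comprehension version: one line, same result.
doubled = [x * 2 for x in numbers if x > 0]

# Dictionary lookup, with a default for keys that are missing.
totals = {"INV_001.pdf": 120.0, "INV_002.pdf": 80.5}  # made-up data
print(doubled, totals.get("INV_003.pdf", 0.0))  # prints [6, 8] 0.0
```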

The `with` statement. It automatically closes files after you're done with them. You will forget to close files otherwise. I still do sometimes, which is exactly why `with` exists.
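You can watch `with` do its job by checking the file's `closed` attribute. The filename `notes.txt` is just a throwaway for this sketch.

```python
with open("notes.txt", "w", encoding="utf-8") as f:
    f.write("hello\n")
    print(f.closed)  # prints False: still inside the block

print(f.closed)      # prints True: closed automatically on exit
```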

`try/except` but catching specific exceptions. Not the lazy `except Exception` I used. Catch `FileNotFoundError` when files might be missing, `ValueError` when converting strings to numbers, `KeyError` when a dictionary key might not exist. The last line of the traceback names the exception type, so Python literally tells you which one to catch.
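A small sketch of what "specific" looks like in practice. Both helper names are hypothetical, and the dollar-amount format is assumed to match the "$1,234.50" style used elsewhere in this post's examples.

```python
def to_amount(raw):
    """Turn a string like '$1,234.50' into a float, or None if it isn't a number."""
    try:
        return float(raw.replace("$", "").replace(",", ""))
    except ValueError:
        # float() raises ValueError for text such as 'N/A'.
        return None


def read_text(path):
    """Return a file's contents, or None if the file does not exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return None


print(to_amount("$1,234.50"), to_amount("N/A"))  # prints 1234.5 None
```

Anything you did not anticipate, like a typo in a variable name, still crashes loudly, which is what you want while learning.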

`f-strings`. The `f"total: ${amount:.2f}"` syntax. Cleaner than concatenation and you'll use it every 5 minutes once you start.
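A few format specs worth knowing, shown on invented values. Note that `:.2f` needs a number, so a total pulled out of a PDF as text has to go through `float()` first.

```python
amount = 1234.5
filename = "INV_20240311_88472.pdf"

print(f"total: ${amount:.2f}")         # prints total: $1234.50
print(f"total: ${amount:,.2f}")        # prints total: $1,234.50
print(f"{filename}: {amount:>10.2f}")  # amount right-aligned in 10 characters
```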

Functions and scope. The LEGB rule (Local, Enclosing, Global, Built-in) sounds academic, but it is the order in which Python looks up a name. Ignore it and you'll spend hours wondering why assigning to a variable inside a function never changed the one outside it.
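A minimal sketch of the lookup order in action. The names `rate`, `with_tax`, and `make_counter` are invented for the example.

```python
rate = 0.25  # global


def with_tax(amount):
    # No local 'rate' here, so Python falls through to the global one.
    return amount * (1 + rate)


def make_counter():
    count = 0  # lives in the enclosing scope of bump()

    def bump():
        nonlocal count  # without this, 'count += 1' raises UnboundLocalError
        count += 1
        return count

    return bump


counter = make_counter()
print(with_tax(100), counter(), counter())  # prints 125.0 1 2
```

Reading an outer variable just works; rebinding one needs `nonlocal` (or `global`), and that is where most of the confusion comes from.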

Honestly the rest you can google when you need it. Classes, decorators, generators, context managers you write yourself. Those matter later. For your first real project, loops and conditionals and basic file I/O will carry you surprisingly far.

One more thing about debugging that I wish I'd known earlier. When Python prints a traceback, it shows you the call stack from top to bottom, but you should read it from the bottom up. The last line is the actual error. The lines above it show you how the program got there. Most of the time, the last two lines are all you need. The rest is just noise while you're learning.
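To see the bottom-up idea, this sketch triggers an error on purpose and prints only the last line of the traceback using the standard `traceback` module. The function names are made up.

```python
import traceback


def parse_total(raw):
    return float(raw)  # fails for text like 'N/A'


def process(raw):
    return parse_total(raw)


try:
    process("N/A")
except ValueError:
    lines = traceback.format_exc().strip().split("\n")
    print(lines[-1])  # the actual error: read this first
```

The last line names the exception and the bad value; the lines above it trace the path through `process` and `parse_total` that led there.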

Also, `print()` is a perfectly valid debugging tool. Senior developers use it constantly. Don't let anyone tell you that you need to learn a proper debugger before you can fix bugs. Print the variable, see what it contains, figure out why it's wrong. That pattern solves maybe 80% of bugs and you can learn it in 10 seconds.

I never did learn pdfplumber properly. Still use it maybe once a month. But I learned the loop pattern and the try/except pattern and those I use every single day. You learn Python by needing it, not by reading about it. Pick something annoying in your life and make Python deal with it. That's the whole trick.