Writing Pseudocode for Data Analysis: A Beginner's Guide

Most beginners open their editor, stare at a blank file, and then spend two hours writing code that doesn't do what they thought it would. Often they jump straight into coding a data task and only later realize they solved the wrong problem. The issue isn't coding skill; it's skipping the step of thinking through the logic first. Writing pseudocode clarifies the approach before any code exists, and that is the faster way.

So what actually is pseudocode?

Pseudocode is a plain-English description of what you want your code to do, written in a structured way: half sentence, half recipe. It's not actual code, so it doesn't need to be perfect. Think of it like sketching a floor plan before building a house.

Why bother?

Clarify your thinking: spot gaps in your logic before writing any code.
Easy to change: revising a few sentences beats rewriting 50 lines of Python.
Easier to share: a colleague can review your plan without knowing Python.
Becomes comments: your pseudocode lines turn directly into code comments.

How to actually write it, in 5 steps.

1. State what the procedure receives as input and returns as output. For example: "takes a spreadsheet of sales, returns monthly totals".
2. Write each major action on its own numbered line, in the order it happens. Use plain verbs: get, compute, filter, group, display, return.
3. Indent steps that happen inside a loop or condition, like a bullet-point outline. Everything indented under "for each row…" happens once per row.
4. Name your variables in plain English, with no cryptic letters. Use "monthly total", not "mt" or "x".
5. Stop before it becomes code: avoid Python syntax, brackets, colons, etc. If it looks like code, you've gone too far.

The do's and don'ts

✓ Use everyday English verbs: "filter", "compute", "group by", "add to list".
✓ Include enough detail that someone can trace through the logic. A reader should be able to follow it step by step.
✓ Keep it brief and cut anything that doesn't add meaning. Shorter is better, as long as nothing important is missing.
✗ Don't use Python syntax: no colons, brackets, def, import, print(), etc.
✗ Don't use cryptic variable names such as "x", "tmp", or "d2". Use "discount amount" or "cleaned dataset" instead.
✗ Don't over-explain standard operations. You don't need to describe how a sum works; just say "compute the average".

How detailed should it be?

There's no single right answer. Aim for the level of detail where someone else could follow your plan without guessing. A useful test: could a colleague read this and check whether it will produce the right result? If yes, you're done. If they'd have to make assumptions, add a bit more.

For most data analysis tasks, 5–10 numbered lines is plenty.

Example 1: computing a discount price

too code-like
def disc(p, r):
  d = p * r
  fp = p - d
  return fp
good pseudocode
PROCEDURE compute_discount(price, rate):
1. discount = price × rate
2. final price = price − discount
3. return final price
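
Once the pseudocode has been checked, each line can become a comment above the code that implements it. The following is a minimal Python sketch of that idea, not the only possible translation. It assumes "rate" is a fraction (0.25 meaning 25%), which the pseudocode does not actually specify; writing the code is often what exposes that kind of unstated assumption.

```python
def compute_discount(price, rate):
    # Assumption: rate is a fraction such as 0.25, not a percentage such as 25.
    # 1. discount = price × rate
    discount = price * rate
    # 2. final price = price − discount
    final_price = price - discount
    # 3. return final price
    return final_price


# Illustrative values only
print(compute_discount(80.0, 0.25))  # 60.0
```

Compared with the "too code-like" version, the logic is identical; the difference is that the names and comments still read like the plan.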

Example 2: finding rows above a threshold

good pseudocode
PROCEDURE flag_high_sales(sales data, threshold):
1. Create an empty list called "flagged rows"
2. For each row in sales data:
   2.1 If the "revenue" value is greater than threshold:
       2.1.1 Add that row to "flagged rows"
3. Return "flagged rows"
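
To show how the indentation in the pseudocode maps onto real loops and conditions, here is a hedged Python sketch of the same procedure. It assumes each row is a dictionary with a numeric "revenue" value; the sample rows are invented purely for illustration.

```python
def flag_high_sales(sales_data, threshold):
    # 1. Create an empty list called "flagged rows"
    flagged_rows = []
    # 2. For each row in sales data
    for row in sales_data:
        # 2.1 If the "revenue" value is greater than threshold
        if row["revenue"] > threshold:
            # 2.1.1 Add that row to "flagged rows"
            flagged_rows.append(row)
    # 3. Return "flagged rows"
    return flagged_rows


# Hypothetical sample data, not taken from any real dataset
sample_sales = [
    {"region": "North", "revenue": 1200.0},
    {"region": "South", "revenue": 450.0},
    {"region": "East", "revenue": 980.0},
]

for row in flag_high_sales(sample_sales, threshold=900.0):
    print(row["region"], row["revenue"])
```

Each level of numbering (2, 2.1, 2.1.1) lines up with one level of indentation in the code, which is why indenting the pseudocode carefully pays off later.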

Example 3: computing a column average

good pseudocode
PROCEDURE average_column(data, column name):
1. Extract all values from the column called "column name"
2. Remove any missing or blank values
3. Compute the sum of the remaining values
4. Divide sum by the count of remaining values
5. Return the result
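
As a rough illustration, the five steps above could be translated into Python like this. The sketch assumes the data is a list of dictionaries and that "missing or blank" means None or an empty string. It also returns None when no values remain, which is a choice made here; the pseudocode does not say what should happen in that case.

```python
def average_column(data, column_name):
    # 1. Extract all values from the column called "column name"
    values = [row.get(column_name) for row in data]
    # 2. Remove any missing or blank values
    remaining = [v for v in values if v is not None and v != ""]
    # Assumption: with nothing left to average, return None instead of dividing by zero
    if not remaining:
        return None
    # 3. Compute the sum of the remaining values
    total = sum(remaining)
    # 4. Divide sum by the count of remaining values
    average = total / len(remaining)
    # 5. Return the result
    return average


# Hypothetical sample rows, invented for illustration
sample_rows = [{"units": 10}, {"units": None}, {"units": ""}, {"units": 20}]
print(average_column(sample_rows, "units"))  # 15.0
```

The empty-column case is a gap a reviewer could have spotted in the pseudocode itself, before any code was written.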

Example 4: monthly sales report from a CSV

Scenario: You have a CSV with columns: date, product, region, units sold, unit price. Your manager wants total revenue per region per month, with any months where revenue dropped more than 20% from the previous month flagged for review.
good pseudocode
PROCEDURE monthly_revenue_report(csv file):

1. Load the CSV file into a table called "sales data"
2. Remove any rows where "units sold" or "unit price" is missing
3. Add a new column "revenue" = units sold × unit price
4. Add a new column "month" by extracting year and month from "date"

5. Group "sales data" by region and month
   5.1 For each group, compute total revenue → call it "monthly revenue"

6. For each region:
   6.1 Sort that region's rows by month (oldest first)
   6.2 For each month after the first:
       6.2.1 Compute the % change from the previous month's revenue
       6.2.2 If the change is less than −20%, mark row as "needs review"

7. Return the summary table, sorted by region then month
Steps 1–2 handle loading and cleaning first: a separate, explicit phase before any computation.
Steps 3–4 derive new columns before grouping, so the logic is easy to follow in sequence.
Indentation at step 6.2 makes clear that the flag check happens inside the loop over months, not once at the end.
No Python syntax. A non-coder can read this and catch a logic error before any code is written.
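
Once the plan has been reviewed, it can be turned into code with the pseudocode lines kept as comments. The sketch below is one possible translation using only Python's standard library, not a definitive implementation. Several details are assumptions the pseudocode leaves open: dates are ISO strings such as 2024-01-15, "previous month" means the previous month that has data for that region, a region with zero revenue in the previous month is not flagged, and the output keys `monthly_revenue` and `needs_review` are names chosen here. The sample CSV rows are invented.

```python
import csv
import io
from collections import defaultdict


def monthly_revenue_report(csv_file):
    # 1. Load the CSV file into a table called "sales data"
    sales_data = list(csv.DictReader(csv_file))

    # 2. Remove any rows where "units sold" or "unit price" is missing
    sales_data = [r for r in sales_data if r["units sold"] and r["unit price"]]

    # 3-5. Derive revenue and month, then total revenue per (region, month)
    monthly_revenue = defaultdict(float)
    for row in sales_data:
        revenue = float(row["units sold"]) * float(row["unit price"])
        month = row["date"][:7]  # assumes ISO dates, e.g. "2024-01-15" -> "2024-01"
        monthly_revenue[(row["region"], month)] += revenue

    # 6. For each region, compare each month with the previous one.
    # 7. Sorting the (region, month) keys also gives the final output order.
    summary = []
    previous_revenue = {}  # region -> revenue of the last month seen
    for region, month in sorted(monthly_revenue):
        revenue = monthly_revenue[(region, month)]
        needs_review = False
        # 6.2 Only months after the first have something to compare against
        if previous_revenue.get(region, 0) > 0:
            # 6.2.1 Compute the % change from the previous month's revenue
            change = (revenue - previous_revenue[region]) / previous_revenue[region] * 100
            # 6.2.2 If the change is less than -20%, mark row as "needs review"
            needs_review = change < -20
        previous_revenue[region] = revenue
        summary.append({
            "region": region,
            "month": month,
            "monthly_revenue": revenue,
            "needs_review": needs_review,
        })
    return summary


# Hypothetical sample data, invented for illustration
SAMPLE_CSV = """date,product,region,units sold,unit price
2024-01-05,Widget,North,10,20
2024-01-20,Gadget,North,5,40
2024-02-11,Widget,North,10,25
2024-01-09,Widget,South,8,50
2024-02-14,Widget,South,,50
2024-02-18,Gadget,South,7,50
"""

for line in monthly_revenue_report(io.StringIO(SAMPLE_CSV)):
    print(line)
```

With this sample, North drops from 400 to 250 (a 37.5% fall) and is flagged, while South drops from 400 to 350 (12.5%) and is not. The South row with a blank "units sold" is removed in step 2 and never counted.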

Quick reference template

# Use this skeleton for any data analysis task

PROCEDURE name(inputs):
1. First action
2. Second action
3. For each item in a list:
   3.1 Do something with that item
   3.2 If some condition is true:
       3.2.1 Do something else
4. Return result

# Rules: plain English · indent loops · name things clearly · no code syntax
Pseudocode guide for data analysis · Based on CPSC 320 pseudocode best practices