5 Things You Can Do with Excel + Python

Excel is a powerful data management and analysis tool. It is perfect for quick reporting and calculations, but Python helps when work gets messy, repetitive, or too large. Python opens up automation, advanced analysis, and integration possibilities that go beyond Excel’s built-in features. Python libraries like pandas for data manipulation and openpyxl for direct Excel file handling make this seamless.

In this tutorial, we will show five things you can do with Excel + Python.

Consider sample sales data to explore five things you can do with Excel and Python.

1. Clean and Standardize Messy Excel Data (Repeatably)

It is common to have messy data in Excel, as real-world data is rarely clean. Often, data contains extra spaces, mixed capitalization, numbers stored as text, inconsistent formatting, missing values, duplicate entries, or data that needs restructuring before analysis. This issue ruins formulas and analyses.

Python excels at data-cleaning tasks. You can write scripts that standardize data formats across different files, fill in missing values using intelligent methods, remove duplicates, split or combine columns based on patterns, and validate data against business rules. These steps may require hours of manual find-and-replace operations in Excel. Using Python, you can create a repeatable script that processes thousands of rows in seconds.

Suppose you received messy sales data. Let’s use a Python script that reads the messy data, cleans and standardizes columns, and adds calculated fields.

Revenue = Units * UnitPrice
NetRevenue = Revenue * (1 – DiscountPct)

import pandas as pd

file_path = "SalesData.xlsx"

df = pd.read_excel(file_path, sheet_name="Sales Data")

# Clean types
df["Units"] = pd.to_numeric(df["Units"], errors="coerce").fillna(0).astype(int)
df["UnitPrice"] = pd.to_numeric(df["UnitPrice"], errors="coerce").fillna(0.0)
df["DiscountPct"] = pd.to_numeric(df["DiscountPct"], errors="coerce").fillna(0.0)
df["Returned"] = (
    df["Returned"].astype(str).str.strip().str.lower()
    .map({"yes": True, "no": False})
    .fillna(False)
)

# Add calculated fields
df["Revenue"] = df["Units"] * df["UnitPrice"]
df["NetRevenue"] = df["Revenue"] * (1 - df["DiscountPct"])

# Write back into the SAME file as a NEW sheet
with pd.ExcelWriter(file_path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="CleanData", index=False)

print("Saved CleanData sheet inside:", file_path)

You will get a new sheet with a clean dataset that won’t break pivots, charts, or lookups. In Excel, you can keep using pivots/charts from cleaned data, knowing it is consistent every run.

2. Create Summaries Automatically (Repeatable Reports)

Excel has row limits and can slow down with complex calculations. Python’s pandas library handles large datasets efficiently and performs calculations much faster.

With pandas, you can work with datasets containing millions of records, perform complex grouping and aggregation operations, and execute statistical analyses that might be impractical in Excel. Python can generate pivot-style summaries and export them to Excel. Suppose you want a quick summary by Region and Category, but you don’t want to rebuild pivots every time.

import pandas as pd

file_path = "SalesData.xlsx"
clean_sheet = "CleanData"
out_sheet = "Summary"

df = pd.read_excel(file_path, sheet_name=clean_sheet)

summary = (
    df.groupby(["Region", "Category"], as_index=False)
      .agg(
          Orders=("OrderID", "count"),
          Units=("Units", "sum"),
          NetRevenue=("NetRevenue", "sum"),
          Returns=("Returned", "sum")
      )
)

with pd.ExcelWriter(file_path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    summary.to_excel(writer, sheet_name=out_sheet, index=False)

print(f"✅ Saved '{out_sheet}' sheet inside: {file_path}")

You will get a revenue-by-region summary—a ready-to-share pivot-style sheet that updates whenever you rerun the script.

3. Generate Charts from Excel Data (Without Manual Formatting)

Charts are often the most time-consuming part of reporting. Excel offers standard charts, but Python’s visualization libraries like Matplotlib, Seaborn, and Plotly provide far more flexibility and sophistication. You can generate custom visualizations that automatically update when your data changes, create interactive dashboards users can explore, or produce publication-quality graphics with precise control over every element.

Let’s visualize performance by region (NetRevenue by Region).

import pandas as pd
from openpyxl import load_workbook
from openpyxl.chart import BarChart, Reference

file_path = "SalesData.xlsx"
source_sheet = "CleanData"
output_sheet = "RegionChart"   # data + chart in this one sheet

# Prepare chart data (NetRevenue by Region)
df = pd.read_excel(file_path, sheet_name=source_sheet)

chart_data = (
    df.groupby("Region", as_index=False)["NetRevenue"]
      .sum()
      .sort_values("NetRevenue", ascending=False)
)

# Write chart data into output_sheet (same workbook)
with pd.ExcelWriter(file_path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    chart_data.to_excel(writer, sheet_name=output_sheet, index=False)

# Add the native Excel chart on the same sheet
wb = load_workbook(file_path)
ws = wb[output_sheet]

chart = BarChart()
chart.title = "Net Revenue by Region"
chart.y_axis.title = "Net Revenue"
chart.x_axis.title = "Region"

values = Reference(ws, min_col=2, min_row=1, max_row=ws.max_row)
labels = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)

chart.add_data(values, titles_from_data=True)
chart.set_categories(labels)

ws.add_chart(chart, "D2")   # place chart to the right of the data table

wb.save(file_path)
print(f"✅ Chart Created: {output_sheet}")

Now you have a summary of revenue by region along with a bar chart.

4. Merge Many Excel Files into One Master Table

It is common to merge weekly, monthly, or quarterly data from different sources. Merging Excel files from different people or teams is slow and error-prone. Python can combine them in seconds and track the source file.

Let’s merge weekly files into a folder named Weekly Reports/ (all with the same columns).

import pandas as pd
from pathlib import Path

base_folder = Path(__file__).resolve().parent
folder = base_folder / "Weekly Reports"

files = sorted(folder.glob("*.xlsx"))
files = [f for f in files if not f.name.startswith("~$")]  # ignore Excel lock files

print("Looking in:", folder)
print("Files found:", [f.name for f in files])

frames = []
for f in files:
    temp = pd.read_excel(f)
    temp["SourceFile"] = f.name
    frames.append(temp)

master = pd.concat(frames, ignore_index=True)
master.to_excel(base_folder / "master_report.xlsx", index=False)

print("Saved: master_report.xlsx")

You will get one consolidated table with a SourceFile column for auditing. Every week, you just need to run the script.

5. Predict Something Excel Can’t Do Easily (Machine Learning Example)

You can estimate return risk using patterns (discount, category, units, price), then write probabilities back so Excel users can filter and sort. Python can perform such machine learning operations easily.

Ours is a small dataset; still, it shows the workflow.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
 
file_path = "SalesData.xlsx"
df = pd.read_excel(file_path, sheet_name="CleanData")
 
X = df[["Region", "SalesRep", "Category", "Units", "UnitPrice", "DiscountPct"]]
y = df["Returned"].astype(int)
 
cat_cols = ["Region", "SalesRep", "Category"]
num_cols = ["Units", "UnitPrice", "DiscountPct"]
 
preprocess = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
        ("num", "passthrough", num_cols),
    ]
)
 
model = Pipeline(steps=[
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000))
])
 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model.fit(X_train, y_train)
 
# Predict probability of return for all rows
df["ReturnProb"] = model.predict_proba(X)[:, 1]
 
with pd.ExcelWriter(file_path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="WithReturnRisk", index=False)

In Excel, explore WithReturnRisk and filter ReturnProb from high to low to see which orders look risky.

Python in Excel (If Available in Your Excel)

If Excel on your computer has Python (Preview), you can run Python directly in a cell and return the result to the sheet. (See Microsoft’s overview of Python in Excel.) Here’s one simple example that reads a small range, cleans text, calculates Revenue, and returns a clean table.

In Excel, enter your dataset
Click an empty cell
Go to the Formulas tab >> select Insert Python
Paste the Python script

import pandas as pd

# Read the Excel range A1:J21 (including headers)
df = xl("A1:J21", headers=True)
 
# Clean text columns
for col in ["Region", "SalesRep"]:
    df[col] = df[col].astype(str).str.strip().str.title()
 
# Fix data types
df["OrderDate"] = pd.to_datetime(df["OrderDate"], errors="coerce")
df["Units"] = pd.to_numeric(df["Units"], errors="coerce").fillna(0).astype(int)
df["UnitPrice"] = pd.to_numeric(df["UnitPrice"], errors="coerce").fillna(0.0)
 
# Add a calculated column
df["Revenue"] = df["Units"] * df["UnitPrice"]
 
df

It will return the DataFrame, which is Python’s table object. Excel shows it as a table preview (and a card).

Now “spill” the output into cells as a clean table.

Click on Insert Data from DataFrame >> select Show DataType Card to preview the table

Select arrayPreview to bring the table into Excel
You now have standardized text and a new Revenue column

Conclusion

This article shows five things you can do with Excel + Python. Excel becomes more powerful with Python; it becomes easier to clean messy datasets, generate pivot-style summaries, automate charts, merge many Excel files, and add simple machine learning insights. Combining Excel and Python streamlines workflows, from data import/export to automation and visualization. Start with small scripts, and experiment with more libraries.

Get FREE Advanced Excel Exercises with Solutions!