Streamlining Family Finances: Using Beancount and AI to Simplify Amazon Transactions
For several years now, I’ve been using Beancount (https://github.com/beancount/beancount) to manage our family’s finances. Beancount is a plain-text, double-entry bookkeeping system. Its companion web interface, Fava, provides insightful analyses of our financial flows. This setup is particularly useful for tracking price changes over time and verifying the accuracy of bookings, including returns of purchases.
Our typical process involves downloading CSV files from our bank and credit card statements and consolidating them into a database via a JavaScript application. From there, we trigger a secondary process using Python and machine learning to suggest categories for each transaction and format them according to Beancount’s requirements. This approach has served us well.
However, we’ve always faced challenges with transactions from Amazon.de, because the statement line does not reveal what each purchase was for (e.g., household items, clothing, etc.). As frequent Amazon users, we have numerous transactions each month, so we needed a consolidated view of our orders. Ideally, we’d like to use this data for AI-assisted account categorization. However, obtaining this information in a machine-readable format, such as JSON or CSV, is not straightforward for a regular consumer. Paid services in the US offer such solutions, but they are not available in Europe. After extensive research, I found that existing methods, including certain Chrome extensions, didn’t meet our expectations, so I looked for an alternative.
I realized that Amazon sends a summary of each order via email (bestellbestaetigung@amazon.de). Parsing these emails could provide the necessary information.
As a Gmail user, I leveraged Gmail’s powerful API to extract the text from these emails using a Python script. With the assistance of a large language model (LLM), I was then able to generate a JSON file with the following prompt: “You are a parser expert. I’m going to pass you text containing the body of emails sent by Amazon after each order. Extract the order number, order date, the articles ordered, their prices, and the total price of this order. Return this as a JSON. The text will be provided between dollar ($) signs. Here it starts $<text>$.”
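The two steps described above could look roughly like the following sketch. It is not the original script: the function names are hypothetical, and it assumes the message has already been fetched with the Gmail API (for example via users().messages().get with format="full"), whose payload carries the body as base64url-encoded text. Only the decoding and prompt assembly are shown, so no credentials are needed to try it.

```python
import base64

AMAZON_SENDER = "bestellbestaetigung@amazon.de"
# Search string that could be passed as q= to users().messages().list()
GMAIL_QUERY = f"from:{AMAZON_SENDER}"

# Prompt text taken from the article; <text> is replaced by the email body.
PROMPT_TEMPLATE = (
    "You are a parser expert. I’m going to pass you text containing the body "
    "of emails sent by Amazon after each order. Extract the order number, "
    "order date, the articles ordered, their prices, and the total price of "
    "this order. Return this as a JSON. The text will be provided between "
    "dollar ($) signs. Here it starts $<text>$."
)


def extract_plain_text(payload: dict) -> str:
    """Collect all text/plain parts of a Gmail message payload (hypothetical helper)."""
    chunks = []

    def walk(part: dict) -> None:
        data = part.get("body", {}).get("data")
        if part.get("mimeType") == "text/plain" and data:
            # Body data is base64url-encoded; restore padding if it was stripped.
            padded = data + "=" * (-len(data) % 4)
            raw = base64.urlsafe_b64decode(padded)
            chunks.append(raw.decode("utf-8", errors="replace"))
        # Multipart messages nest their content in "parts".
        for child in part.get("parts", []):
            walk(child)

    walk(payload)
    return "\n".join(chunks)


def build_prompt(email_text: str) -> str:
    """Insert the email body between the dollar signs of the prompt."""
    return PROMPT_TEMPLATE.replace("<text>", email_text)
```

The resulting string from build_prompt can then be sent to whichever LLM is in use. Emails that contain only an HTML part would need an extra HTML-to-text step, which this sketch leaves out.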
For each email, the model returned a JSON object such as:
{
  "order_number": "302-3116391-0993119",
  "order_date": "Donnerstag, 29 August",
  "articles": [
    {
      "name": "Fahrradgabel Sternmutter Setter Steuersätze Star Nut Einbau Mutter Einstellwerkzeug für 22.2 25.4 28.6mm",
      "price": "EUR 13,99"
    },
    {
      "name": "Geschmackspulver MIX PACK mit 8x30g Proben, 8 unglaublich leckere Flavours, nur 8-14 kcal pro Portion, Aroma & angenehme Süße, vielseitig einsetzbar für Lebensmittel & Getränke, FLAVOUR UP",
      "price": "EUR 18,65"
    },
    {
      "name": "Gakago Rohrschneider Set 3-50mm mit Entgrater - Vielseitiger Rohrabschneider für alle gängigen Metalle wie Edelstahl, Kupfer, Aluminium, Stahl oder Kunststoff (bspw. PVC & PE) auch als Verbundrohr",
      "price": "EUR 24,95"
    },
    {
      "name": "Gakago Rorschneider Ersatzklingen für 5-50mm und 6-70mm (2er Pack) - Für alle gängigen Metalle wie Edelstahl, Kupfer, Aluminium, Stahl oder Kunststoff (bspw. PVC & PE) aus langlebigem Werkzeugstahl",
      "price": "EUR 9,95"
    }
  ],
  "total_price": "EUR 67,54"
}
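Because an LLM can misread a price, it may be worth checking each reply before using it. The sketch below is one possible sanity check, with hypothetical function names: it pulls the JSON object out of the model's reply, converts the German-formatted "EUR 13,99" strings to Decimal values, and compares the sum of the articles with the stated total. It assumes the total contains no shipping costs or vouchers; orders that include them would need extra handling.

```python
import json
from decimal import Decimal


def parse_eur(value: str) -> Decimal:
    """Convert 'EUR 13,99' or 'EUR 1.234,56' (German notation) to a Decimal."""
    number = value.replace("EUR", "").strip()
    # Drop thousands separators, then turn the decimal comma into a point.
    number = number.replace(".", "").replace(",", ".")
    return Decimal(number)


def load_order(reply: str) -> dict:
    """Parse the JSON object embedded in a model reply, ignoring text around it."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])


def total_matches(order: dict) -> bool:
    """Return True if the article prices add up to the stated total."""
    item_sum = sum(
        (parse_eur(article["price"]) for article in order["articles"]),
        Decimal("0"),
    )
    return item_sum == parse_eur(order["total_price"])
```

For the sample order above, the four article prices (13,99 + 18,65 + 24,95 + 9,95) add up to the stated total of 67,54, so the check passes.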
Having these orders in such a format is a significant improvement over manually reviewing order history on Amazon’s website. I plan to use this data in another process to categorize each article in the Beancount format.
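As a rough idea of what that later step might produce, the sketch below renders one order as a Beancount transaction with one posting per article. Everything here is an assumption for illustration: the function name, the account names (Expenses:Uncategorized, Liabilities:CreditCard), and the way categories are passed in. The booking date is supplied by the caller, since the order date in the email ("Donnerstag, 29 August") carries no year.

```python
from datetime import date
from decimal import Decimal


def to_beancount(
    order_number: str,
    booked_on: date,
    articles: list,  # list of (name, price as Decimal, category or None)
    payment_account: str = "Liabilities:CreditCard",    # assumed account name
    default_category: str = "Expenses:Uncategorized",   # assumed account name
) -> str:
    """Render one order as a Beancount transaction (hypothetical helper)."""
    lines = [f'{booked_on.isoformat()} * "Amazon.de" "Order {order_number}"']
    total = Decimal("0")
    for name, price, category in articles:
        account = category or default_category
        # The shortened article name goes into a ';' comment for readability.
        lines.append(f"  {account}  {price} EUR ; {name[:60]}")
        total += price
    # The balancing posting carries the negated sum of all articles.
    lines.append(f"  {payment_account}  {-total} EUR")
    return "\n".join(lines)
```

Keeping one posting per article means a single Amazon charge can be split across several expense accounts, which is the information the bank statement alone does not provide.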
This is yet another example of how LLMs can be leveraged for everyday tasks. If there is interest, I may refine the code and make it publicly available. Let me know if you’d like to see that happen.