OCR-Powered Mobile App for Expense Tracking
Managing receipts and expenses efficiently is often a challenge, particularly when it comes to digitizing physical receipts. In this project, I developed a mobile application that scans paper receipts, processes them using Optical Character Recognition (OCR), and extracts relevant data points such as the total amount, store name, and date of purchase. The result is a streamlined workflow for tracking expenses with minimal manual input.
Technology Stack
- Frontend Framework: Apache Cordova
- Initial OCR Approach: Tesseract.js (Client-side)
- Final OCR Solution: PaddleOCR (Server-side)
- Backend Languages: PHP (API endpoints) and Python (OCR script)
- Data Format: JSON for communication between the frontend and backend
Frontend: Cross-Platform Mobile App with Cordova
The mobile app was built using Apache Cordova, allowing for cross-platform deployment using a single codebase. The app enables users to scan physical receipts using the device’s camera. The scanned image is then sent to the backend for processing.
Initially, the plan was to handle OCR directly on the device using Tesseract.js. After testing, it became clear that this approach suffered from low recognition accuracy and unacceptable performance, particularly when processing multiple receipts in succession. This led to a shift towards server-side processing.
Backend: Server-Side OCR with PaddleOCR
After exploring several OCR libraries (including Tesseract and GNU Ocrad), I integrated PaddleOCR, an OCR system developed by the PaddlePaddle team. It provides highly accurate text detection, angle classification, and recognition with minimal configuration. I exposed a set of API endpoints via PHP, which let the app submit scanned images to the server, perform CRUD operations on expense entries, and invoke the Python OCR script.
Here’s a simplified example of how PaddleOCR was used:
from paddleocr import PaddleOCR

# Initialize once; angle classification handles rotated text regions
ocr = PaddleOCR(use_angle_cls=True, lang="en")

# Run detection, angle classification, and recognition on the preprocessed image
result = ocr.ocr(processed_image_path, cls=True)

The result is a structured list containing:
- The recognized text
- Bounding box coordinates
- Confidence scores
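In the classic PaddleOCR API, the result comes back as one list per page, with each line represented as a `[box, (text, score)]` pair. The sketch below (the dictionary field names are my own) flattens that nested structure into plain records:

```python
def flatten_result(result):
    """Turn PaddleOCR output into a list of {text, box, confidence} dicts."""
    lines = []
    for page in result:                     # one entry per page/image
        for box, (text, score) in page:     # one entry per detected text line
            lines.append({"text": text, "box": box, "confidence": score})
    return lines

# Example with a faked result in the same shape as PaddleOCR's output:
fake = [[
    [[[10, 10], [120, 10], [120, 30], [10, 30]], ("TOTAL  12.99", 0.97)],
]]
records = flatten_result(fake)
print(records[0]["text"])
```

Flattening early keeps the downstream parsing code independent of PaddleOCR's nesting.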
This structured output made it straightforward to extract key information. I parsed the output to identify data points such as:
- Total amount (by detecting terms like “total,” “sum,” etc.)
- Purchase date
- Store name
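The extraction itself boils down to simple heuristics over the recognized lines. A minimal sketch follows; the keyword list, the date regex, and the first-line-is-store-name rule are all illustrative assumptions rather than the app's exact logic:

```python
import re

# Illustrative heuristics, not the app's exact rules
TOTAL_KEYWORDS = ("total", "sum", "amount due")
DATE_RE = re.compile(r"\b(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})\b")

def extract_fields(lines):
    """lines: recognized text strings in reading order."""
    fields = {"total": None, "date": None, "store": None}
    if lines:
        fields["store"] = lines[0]  # heuristic: store name is printed first
    for line in lines:
        lower = line.lower()
        if fields["total"] is None and any(k in lower for k in TOTAL_KEYWORDS):
            m = re.search(r"(\d+[.,]\d{2})", line)  # first money-like number
            if m:
                fields["total"] = m.group(1)
        if fields["date"] is None:
            m = DATE_RE.search(line)
            if m:
                fields["date"] = m.group(1)
    return fields

print(extract_fields(["ACME Market", "Date: 12.03.2024", "TOTAL 42.50"]))
```

Because the OCR output includes confidence scores, low-confidence matches can additionally be flagged for manual review.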
Data Flow and User Interaction
Once the server finishes processing, the extracted data is returned to the mobile app as a JSON object. The app then:
- Overlays the bounding boxes on the receipt image
- Prefills a form with the extracted information
- Allows the user to verify and edit the data before saving
This hybrid approach balances automation with manual verification, enhancing both usability and data accuracy.
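For illustration, the server's JSON payload could be assembled roughly as follows; the field names here are assumptions, not the app's actual schema:

```python
import json

# Hypothetical response shape: extracted fields plus the bounding
# boxes the app overlays on the receipt image
response = {
    "store": "ACME Market",
    "date": "2024-03-12",
    "total": "42.50",
    "boxes": [
        {"label": "total", "points": [[10, 10], [120, 10], [120, 30], [10, 30]]},
    ],
}

payload = json.dumps(response)   # sent to the app over the PHP endpoint
parsed = json.loads(payload)     # the app decodes it to prefill the form
print(parsed["total"])
```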

Expense Tracking and Reporting
Each expense item in the app has an associated status field, which indicates its current state in the workflow:
- uploaded_unprocessed
- auto_processed
- manually_checked
- approved
This status tracking enables flexible querying and reporting. For instance, users can filter by items needing review or generate monthly reports of approved expenses.
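A sketch of such queries, using an in-memory list in place of the app's real data store:

```python
STATUSES = ("uploaded_unprocessed", "auto_processed",
            "manually_checked", "approved")

# Illustrative sample data
expenses = [
    {"store": "ACME Market", "total": 42.50, "month": "2024-03", "status": "approved"},
    {"store": "Corner Cafe", "total": 7.80,  "month": "2024-03", "status": "auto_processed"},
]

def needs_review(items):
    """Items that were auto-processed but not yet verified by the user."""
    return [e for e in items if e["status"] == "auto_processed"]

def monthly_total(items, month):
    """Sum of approved expenses for a given month."""
    return sum(e["total"] for e in items
               if e["month"] == month and e["status"] == "approved")

print(len(needs_review(expenses)))         # 1
print(monthly_total(expenses, "2024-03"))  # 42.5
```

In the app these filters map onto simple WHERE clauses against the status and date columns.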

Conclusion
This project demonstrates a full-stack solution that combines mobile development, image processing, OCR, and backend integration. By leveraging PaddleOCR’s capabilities and streamlining the data extraction process, the app significantly reduces the overhead of manual expense tracking while remaining transparent and user-controllable.