pst_to_eml.py
pst_to_eml.py converts an Outlook PST archive into a folder tree of individual .eml files with attachments intact, using the external readpst tool for PST parsing and Python for MBOX-to-EML conversion.
Outlook PST files use a proprietary format. They are difficult to open without Outlook, cannot be searched at scale without specialist tooling, and are awkward to produce in legal or regulatory proceedings without first converting them into a standard format. When an organisation receives a data subject access request, faces an eDiscovery obligation, or needs to migrate away from Outlook, the PST becomes a blocker. Manual export through the Outlook GUI is slow, inconsistent across large archives, and can drop metadata.
An employee has left the organisation under disputed circumstances and their mailbox PST has been preserved as a legal hold item. The organisation's legal team needs the mailbox contents in a reviewable format for disclosure. You run this script against the PST archive on a controlled machine. The output is a date-stamped EML folder tree, with each message as an individual file with headers and attachments intact, ready to be ingested into a legal review tool or searched by the legal team directly.
pst_to_eml.py first calls the system readpst utility to extract the PST into MBOX format, then processes each MBOX file to write individual .eml messages into a mirrored folder tree. Attachments are preserved within each EML file. The script requires readpst to be installed (available on Linux/macOS via package manager; Windows via WSL). Always run on a copy of the PST, never on the original. Large archives are disk-intensive.
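The MBOX-to-EML stage described above can be sketched with Python's standard `mailbox` module. This is a minimal illustration, not the script's actual code: the function name `mbox_to_eml` and the sequential file-naming scheme are assumptions made for the example.

```python
import mailbox
from pathlib import Path


def mbox_to_eml(mbox_path: Path, out_dir: Path) -> int:
    """Write every message in one MBOX file as a separate .eml file.

    Hypothetical helper: the name and the 000001.eml numbering are
    illustrative assumptions, not taken from pst_to_eml.py.
    Returns the number of messages written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # create=False: raise an error instead of silently creating an empty MBOX
    box = mailbox.mbox(str(mbox_path), create=False)
    count = 0
    try:
        for count, key in enumerate(box.iterkeys(), start=1):
            # get_bytes() returns the stored message without the MBOX
            # "From " separator line; MIME parts (attachments) remain
            # embedded in the message body, so nothing is detached.
            raw = box.get_bytes(key)
            (out_dir / f"{count:06d}.eml").write_bytes(raw)
    finally:
        box.close()
    return count


# Example call (paths are placeholders):
# written = mbox_to_eml(Path("/evidence/mbox/Inbox/mbox"),
#                       Path("/evidence/eml_output/Inbox"))
```

Writing the stored bytes directly, rather than parsing and re-serialising each message, avoids rewriting headers, which matters when the output may be used as evidence.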
# Install readpst (Ubuntu/Debian)
$ sudo apt-get install pst-utils
# macOS
$ brew install libpst
# Run the conversion
$ python3 pst_to_eml.py
# Follow prompts: enter PST path and output directory
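Once readpst is installed, the script has to locate the binary and invoke it. The sketch below shows one way this could look. `READPST_PATH` is the setting named in the script's error message; the fallback to a `PATH` lookup, the helper names, and the choice of the `-r` (recreate the folder structure) and `-o` (output directory) options are assumptions for illustration and may differ from what pst_to_eml.py actually does.

```python
import shutil
import subprocess
from pathlib import Path

READPST_PATH = "/usr/bin/readpst"  # default location shown in the error message


def find_readpst(configured: str = READPST_PATH) -> str:
    """Return a usable readpst path or raise FileNotFoundError.

    Assumption: fall back to searching PATH if the configured path is missing.
    """
    if Path(configured).is_file():
        return configured
    found = shutil.which("readpst")
    if found:
        return found
    raise FileNotFoundError(
        f"readpst not found at {configured}; install pst-utils (Debian/Ubuntu) "
        "or libpst (Homebrew), or set READPST_PATH"
    )


def build_readpst_command(readpst: str, pst_path: Path, mbox_dir: Path) -> list:
    """Build the argument list: -r recurses into folders, -o sets the output dir."""
    return [readpst, "-r", "-o", str(mbox_dir), str(pst_path)]


def extract_pst(pst_path: Path, mbox_dir: Path) -> int:
    """Run readpst and return its exit code (hypothetical helper).

    check=False is deliberate: a non-zero exit does not abort the run,
    because partial output from a damaged PST may still be usable.
    """
    mbox_dir.mkdir(parents=True, exist_ok=True)
    cmd = build_readpst_command(find_readpst(), pst_path, mbox_dir)
    return subprocess.run(cmd, check=False).returncode
```

Passing the command as a list rather than a shell string means PST paths containing spaces or shell metacharacters are handed to readpst unchanged.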
PST path : /evidence/mailbox_john_doe.pst
Output dir : /evidence/eml_output/
Extracting : readpst → MBOX ... done (3 folders, 4,821 messages)
Converting : MBOX → EML ... done
Output tree:
/evidence/eml_output/
  Inbox/ (2,104 .eml files)
  Sent Items/ (1,893 .eml files)
  Deleted Items/ ( 824 .eml files)
Conversion complete — 4,821 messages written.

PST path : /evidence/mailbox_archive.pst
Output dir : /evidence/eml_output/
ERROR: readpst not found at /usr/bin/readpst
Install with: sudo apt-get install pst-utils
Or set READPST_PATH in the script to the correct binary location.
If PST is password-protected or corrupt, readpst will report
errors per folder. Partial output may still be usable.

Regulation map
| Framework | Control / Clause | Obligation |
|---|---|---|
| GDPR / UK GDPR | Article 15 — Subject Access Request | Data subject access requests require the ability to produce personal data held in any format, including email archives. |
| eDiscovery (US) | FRCP Rule 34 | Electronically stored information must be producible in a reasonably usable form. Converting PST archives to EML can help meet this requirement. |
| ISO 27001:2022 | A.5.28 (A.16.1.7 in the 2013 edition) | Evidence relating to information security incidents must be collected and preserved. PST-to-EML supports forensic preservation. |
| DORA (EU) | Article 17 | ICT incident investigations may require retrieval of email evidence from legacy or preserved mailboxes. |
| SOX | Records Retention | Email records relevant to financial reporting and audits must be retrievable for the defined retention period regardless of storage format. |
| SEBI (India) | LODR / Surveillance | Regulatory investigations may require email evidence production from employee mailboxes in accessible formats. |
Feedback welcome: Corrections, ideas, and requests — grcguy@rtapulse.com.