pptx-to-md is a library to parse Microsoft PowerPoint (.pptx) slides and convert them into structured Markdown content and data, making it easy to process, use, or integrate slide data programmatically.
- 📄 Extract Slide Text: Parses and extracts text elements from slides.
- 📋 Lists & Tables: Recognizes and formats lists (ordered/unordered) and tables into Markdown.
- 🖼️ Embedded Images: Extracts embedded images as base64-encoded inline images.
- 💾 Memory Efficient: Use the streaming API to iterate over one slide at a time, never overloading memory.
- ⏱️ Multithreading: Optional support for multithreaded parsing of PowerPoint slides, with a significant performance increase for larger presentations.
- ⚙️ Robust & Safe APIs: Designed according to Rust best practices with explicit error handling.
- 🪄 Embedding: Provides pptx content and metadata in a form suitable for generating embeddings.
Here's a simple example that converts a PowerPoint presentation into Markdown*:

    // `ImageHandlingMode` and `SlideElement` are assumed to be re-exported at the crate root
    use pptx_to_md::{ImageHandlingMode, ParserConfig, PptxContainer, SlideElement};
    use std::path::Path;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Create a config instance with the `ParserConfigBuilder`;
        // this example is equivalent to `ParserConfig::default()`
        let config = ParserConfig::builder()
            .extract_images(true)
            .compress_images(true)
            .quality(80)
            .image_handling_mode(ImageHandlingMode::InMarkdown)
            .image_output_path(None)
            .include_slide_comment(true)
            .build();
        // Alternatively, use `let config = ParserConfig::default();`

        // Open the container with the path to your .pptx file
        // (`mut` assumes the parse methods borrow the container mutably)
        let mut container = PptxContainer::open(Path::new("path/to/your/presentation.pptx"), config)?;

        // Parse all slides' XML at once, single- or multithreaded
        let slides = container.parse_all()?; // or `parse_all_multi_threaded()?`

        for slide in slides {
            // Convert each slide into Markdown
            if let Some(md_content) = slide.convert_to_md() {
                println!("{}", md_content);
            }

            // Or iterate over each slide element and match on it to add custom logic
            for element in &slide.elements {
                match element {
                    SlideElement::Text(text) => println!("{:?}\n", text),
                    SlideElement::Table(table) => println!("{:?}\n", table),
                    SlideElement::Image(image_reference) => println!("{:?}\n", image_reference),
                    SlideElement::List(list) => println!("{:?}\n", list),
                    SlideElement::Unknown => println!("An unknown element was found.\n"),
                }
            }
        }

        Ok(())
    }

*For more usage examples, refer to the examples directory.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `extract_images` | `bool` | `true` | Whether images are extracted from slides. If `false`, images cannot be extracted manually either. |
| `compress_images` | `bool` | `true` | Whether images are compressed before encoding. Affects manually extracted images too. |
| `image_quality` | `u8` | `80` | Defines the image compression quality (0-100). Higher values mean better quality but larger file sizes. |
| `image_handling_mode` | `ImageHandlingMode` | `InMarkdown` | Determines how images are handled during content export. |
| `image_output_path` | `Option<PathBuf>` | `None` | Output directory path for `ImageHandlingMode::Save` (mandatory in saving mode). |
| `include_slide_comment` | `bool` | `true` | Whether the slide number comment (`<!-- Slide [n] -->`) is included. |
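
With `include_slide_comment` enabled, the `<!-- Slide [n] -->` comments can serve as markers for splitting a combined Markdown document back into per-slide chunks, for example before computing embeddings. The following is a minimal, standard-library-only sketch. `split_by_slide_comment` is a hypothetical helper that is not part of the crate, and it assumes each comment sits on its own line, directly before the slide's content, in exactly the form `<!-- Slide n -->`.

```rust
/// Hypothetical helper (not part of pptx-to-md): splits combined Markdown
/// into (slide_number, content) pairs using `<!-- Slide n -->` marker lines.
fn split_by_slide_comment(markdown: &str) -> Vec<(u32, String)> {
    let mut chunks: Vec<(u32, String)> = Vec::new();
    for line in markdown.lines() {
        // Try to read the line as a marker of the form `<!-- Slide n -->`.
        let number = line
            .trim()
            .strip_prefix("<!-- Slide ")
            .and_then(|rest| rest.strip_suffix(" -->"))
            .and_then(|n| n.trim().parse::<u32>().ok());
        match number {
            // A marker line starts a new chunk.
            Some(n) => chunks.push((n, String::new())),
            // Any other line is appended to the current chunk;
            // text before the first marker is dropped.
            None => {
                if let Some((_, content)) = chunks.last_mut() {
                    content.push_str(line);
                    content.push('\n');
                }
            }
        }
    }
    chunks
}

fn main() {
    let markdown = "<!-- Slide 1 -->\n# Title\n\n<!-- Slide 2 -->\n- item\n";
    for (number, content) in split_by_slide_comment(markdown) {
        println!("slide {}: {:?}", number, content.trim());
    }
}
```

Keeping the slide number next to each chunk lets downstream code store it as metadata alongside the text.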

`ImageHandlingMode` members:

| Member | Description |
|---|---|
| `InMarkdown` | Images are embedded directly in the Markdown output using standard syntax as base64 data (`![]()`). |
| `Manually` | Image handling is delegated to the user, requiring manual copying or referencing (as base64). |
| `Save` | Images are saved in a provided output directory and integrated using `<a>` tag syntax (`<a href="file:///<abs_path>"></a>`). |
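
To make the difference between the three modes concrete, here is a rough, self-contained sketch of the Markdown each one would produce for a single image. It does not use the crate: `Mode` is a local stand-in for `ImageHandlingMode`, `render_image` is a hypothetical function, and the `data:image/jpeg;base64,` prefix is an assumption about how the inline data is encoded.

```rust
/// Local stand-in mirroring the three members described above; the real
/// `ImageHandlingMode` lives in the crate and may differ in detail.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    InMarkdown,
    Manually,
    Save,
}

/// Hypothetical renderer: returns the Markdown emitted for one image.
/// `base64_data` and `abs_path` are illustrative inputs.
fn render_image(mode: Mode, base64_data: &str, abs_path: &str) -> Option<String> {
    match mode {
        // Inline image; the `data:image/jpeg` URI prefix is an assumption.
        Mode::InMarkdown => Some(format!("![](data:image/jpeg;base64,{})", base64_data)),
        // Nothing is emitted; the caller places the image itself.
        Mode::Manually => None,
        // Link to the saved file in the `<a href="file:///<abs_path>"></a>` shape.
        // A leading slash is trimmed so the URL keeps exactly three slashes.
        Mode::Save => Some(format!(
            "<a href=\"file:///{}\"></a>",
            abs_path.trim_start_matches('/')
        )),
    }
}

fn main() {
    for mode in [Mode::InMarkdown, Mode::Manually, Mode::Save] {
        println!("{:?}: {:?}", mode, render_image(mode, "AAAA", "/tmp/slide1.jpg"));
    }
}
```

In `Manually` mode the sketch returns `None`, reflecting that placing or referencing the image is left entirely to the caller.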
pptx-to-md/
├── Cargo.toml
├── README.md
├── CHANGELOG.md
├── LICENSE-MIT
├── LICENSE-APACHE
├── examples/ # Simple examples to present the usage of this crate
│ ├── basic_usage.rs
│ ├── manual_image_extraction.rs
│ ├── memory_efficient_streaming.rs
│ ├── performance_tests.rs
│ ├── save_images.rs
│ └── slide_elements.rs
├── src/
│ ├── lib.rs # Public API
│ ├── container.rs # Pptx container handling
│ ├── parser_config.rs # Config and config builder
│ ├── slide.rs # Individual slide representation & markdown conversion
│ ├── parse_xml.rs # XML parsing logic
│ ├── parse_rels.rs # Relationship parsing logic
│ └── types.rs # Common data types used
└── tests/
  ├── test_data/ # XML & MD test data files
  └── slide_tests.rs # Tests for md conversion logic

Add the following to the dependencies section of your Cargo.toml:

    [dependencies]
    pptx-to-md = "0.4.0"

This project is licensed under the MIT License and the Apache License 2.0.
Feel free to contribute or suggest improvements!