pptx-to-md

pptx-to-md is a library to parse Microsoft PowerPoint (.pptx) slides and convert them into structured Markdown content and data, making it easy to process, use, or integrate slide data programmatically.

🚀 Features

📄 Extract Slide Text: Parses and extracts text elements from slides.
📋 Lists & Tables: Recognizes and formats lists (ordered/unordered) and tables into Markdown.
🖼️ Embedded Images: Extracts embedded images as base64-encoded inline images.
💾 Memory Efficient: Use the streaming API to iterate over one slide at a time, so the whole presentation never has to be held in memory.
⏱️ Multithreading: Optional multithreaded parsing of PowerPoint slides, which can speed up parsing of larger presentations considerably.
⚙️ Robust & Safe APIs: Designed according to Rust best practices with explicit error handling.
🪄 Embedding: Provides pptx content and meta information in a form that is suitable for generating embeddings.
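
To illustrate the idea behind the multithreading feature, here is a minimal sketch that splits a set of slide XML strings into chunks and processes each chunk on its own scoped thread. It is not the crate's implementation: `count_text_runs` and `parse_all_parallel` are hypothetical names, and counting `<a:t>` tags merely stands in for real slide parsing.

```rust
use std::thread;

/// Hypothetical stand-in for per-slide parsing: counts `<a:t>` text runs in the slide XML.
pub fn count_text_runs(slide_xml: &str) -> usize {
    slide_xml.matches("<a:t>").count()
}

/// Processes the slides on up to `workers` scoped threads and keeps slide order.
pub fn parse_all_parallel(slides: &[String], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_size = ((slides.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker first, then join them in spawn order.
        let handles: Vec<_> = slides
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter().map(|xml| count_text_runs(xml)).collect::<Vec<usize>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because the handles are joined in the order they were spawned, the results line up with the slide order regardless of which thread finishes first.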

👨‍💻 Example Usage

Here is a simple example that converts a PowerPoint presentation into Markdown*:

// assumes these types are re-exported at the crate root
use pptx_to_md::{ImageHandlingMode, ParserConfig, PptxContainer, SlideElement};
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a config instance with the `ParserConfigBuilder`;
    // this example is equivalent to `ParserConfig::default()`
    let config = ParserConfig::builder()
        .extract_images(true)
        .compress_images(true)
        .quality(80)
        .image_handling_mode(ImageHandlingMode::InMarkdown)
        .image_output_path(None)
        .include_slide_comment(true)
        .build();
    // alternatively use `let config = ParserConfig::default();`
    
    // Open the container with the path to your .pptx file
    let mut container = PptxContainer::open(Path::new("path/to/your/presentation.pptx"), config)?;
    
    // Parse the XML of all slides at once, single- or multithreaded
    let slides = container.parse_all()?; // or `parse_all_multi_threaded()?`
    
    for slide in slides {
        // Convert each slide into Markdown
        if let Some(md_content) = slide.convert_to_md() {
            println!("{}", md_content);
        }

        // Or iterate over each slide element and match them to add custom logic
        for element in &slide.elements {
            match element {
                SlideElement::Text(text) => { println!("{:?}\n", text) }
                SlideElement::Table(table) => { println!("{:?}\n", table) }
                SlideElement::Image(image_reference) => { println!("{:?}\n", image_reference) }
                SlideElement::List(list) => { println!("{:?}\n", list) }
                SlideElement::Unknown => { println!("An Unknown element was found.\n") }
            }
        }
    }

    Ok(())
}

*For more usage examples, refer to the examples directory.
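
The element loop in the example is where custom logic would go. As a rough, self-contained sketch of that pattern, the following uses a hypothetical `Element` enum (a simplified stand-in, not the crate's `SlideElement`, whose payload types are not shown here) and renders text, lists, and tables while skipping everything else.

```rust
/// Simplified, hypothetical stand-in for the crate's slide element type.
pub enum Element {
    Text(String),
    List(Vec<String>),
    Table(Vec<Vec<String>>),
    Image(String),
    Unknown,
}

/// Renders rows as a Markdown pipe table, treating the first row as the header.
fn render_table(rows: &[Vec<String>]) -> String {
    let mut lines = Vec::new();
    for (index, row) in rows.iter().enumerate() {
        lines.push(format!("| {} |", row.join(" | ")));
        if index == 0 {
            let separator = vec!["---"; row.len()].join(" | ");
            lines.push(format!("| {separator} |"));
        }
    }
    lines.join("\n")
}

/// Renders the elements as Markdown blocks separated by blank lines.
pub fn render(elements: &[Element]) -> String {
    let mut blocks = Vec::new();
    for element in elements {
        match element {
            Element::Text(text) => blocks.push(text.clone()),
            Element::List(items) => {
                let bullets: Vec<String> = items.iter().map(|item| format!("- {item}")).collect();
                blocks.push(bullets.join("\n"));
            }
            Element::Table(rows) if !rows.is_empty() => blocks.push(render_table(rows)),
            // Images, unknown elements, and empty tables are skipped in this sketch.
            _ => {}
        }
    }
    blocks.join("\n\n")
}
```

Matching on an enum this way keeps the handling of each element kind in one place, and a catch-all arm makes it explicit which kinds are deliberately ignored.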

Config Parameters

Parameter	Type	Default	Description
`extract_images`	`bool`	`true`	Whether images are extracted from slides. If `false`, images cannot be extracted manually either.
`compress_images`	`bool`	`true`	Whether images are compressed before encoding. Also affects manually extracted images.
`image_quality`	`u8`	`80`	Defines the image compression quality `(0-100)`. Higher values mean better quality but larger file sizes.
`image_handling_mode`	`ImageHandlingMode`	`InMarkdown`	Determines how images are handled during content export.
`image_output_path`	`Option<PathBuf>`	`None`	Output directory path for `ImageHandlingMode::Save` (mandatory in `Save` mode).
`include_slide_comment`	`bool`	`true`	Whether the slide number comment (`<!-- Slide [n] -->`) is included.
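
To make the defaults and constraints in the table concrete, here is a minimal sketch built on a hypothetical `SketchConfig` struct. It only mirrors the documented parameters; it is not the crate's `ParserConfig`, and whether the crate itself performs the checks in `validate` is an assumption.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the crate's `ImageHandlingMode`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    InMarkdown,
    Manually,
    Save,
}

/// Hypothetical mirror of the documented parameters and their defaults.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchConfig {
    pub extract_images: bool,
    pub compress_images: bool,
    pub image_quality: u8,
    pub image_handling_mode: Mode,
    pub image_output_path: Option<PathBuf>,
    pub include_slide_comment: bool,
}

impl Default for SketchConfig {
    fn default() -> Self {
        Self {
            extract_images: true,
            compress_images: true,
            image_quality: 80,
            image_handling_mode: Mode::InMarkdown,
            image_output_path: None,
            include_slide_comment: true,
        }
    }
}

/// Checks the two constraints the table states: quality range and the Save-mode path.
pub fn validate(config: &SketchConfig) -> Result<(), String> {
    if config.image_quality > 100 {
        return Err(format!("image_quality must be 0-100, got {}", config.image_quality));
    }
    if config.image_handling_mode == Mode::Save && config.image_output_path.is_none() {
        return Err("image_output_path is mandatory for Save mode".to_string());
    }
    Ok(())
}
```

With this sketch, `validate(&SketchConfig::default())` succeeds, while a config that selects `Mode::Save` without an output path is rejected.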

Members of `ImageHandlingMode`

Member	Description
`InMarkdown`	Images are embedded directly in the Markdown output as `base64` data, using the standard image syntax (`![]()`)
`Manually`	Image handling is delegated to the user, requiring manual copying or referencing (as `base64`)
`Save`	Images will be saved in a provided output directory and integrated using `<a>` tag syntax (`<a href="file:///<abs_path>"></a>`)
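
The three modes can be pictured as three ways of turning one image into output. The sketch below assumes a hypothetical `ImageMode` enum and `render_image` function (neither is part of the crate) and a Unix-style absolute output directory; the exact strings the crate emits may differ.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the crate's `ImageHandlingMode`; `Save` carries the output directory.
#[derive(Debug, Clone, PartialEq)]
pub enum ImageMode {
    InMarkdown,
    Manually,
    Save(PathBuf),
}

/// Returns the fragment to place in the Markdown output,
/// or `None` when the caller handles the image manually.
pub fn render_image(
    mode: &ImageMode,
    file_name: &str,
    mime: &str,
    base64_data: &str,
) -> Option<String> {
    match mode {
        // Inline data URI inside the standard image syntax.
        ImageMode::InMarkdown => Some(format!("![](data:{mime};base64,{base64_data})")),
        // Nothing is emitted; the caller decides what to do with the data.
        ImageMode::Manually => None,
        // Link to the saved file; assumes `dir` is an absolute Unix path.
        ImageMode::Save(dir) => {
            let path = dir.join(file_name);
            Some(format!("<a href=\"file://{}\"></a>", path.display()))
        }
    }
}
```

For example, `ImageMode::Save(PathBuf::from("/tmp/out"))` with the file name `image1.png` yields a link to `file:///tmp/out/image1.png`, which matches the `file:///<abs_path>` shape described in the table.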

🏗 Project Structure

pptx-to-md/
├── Cargo.toml
├── README.md
├── CHANGELOG.md
├── LICENSE-MIT
├── LICENSE-APACHE
├── examples/           # Simple examples to present the usage of this crate
│   ├── basic_usage.rs
│   ├── manual_image_extraction.rs
│   ├── memory_efficient_streaming.rs
│   ├── performance_tests.rs
│   ├── save_images.rs
│   └── slide_elements.rs
├── src/
│   ├── lib.rs            # Public API
│   ├── container.rs      # Pptx container handling
│   ├── parser_config.rs  # Config and config builder
│   ├── slide.rs          # Individual slide representation & markdown conversion
│   ├── parse_xml.rs      # XML parsing logic
│   ├── parse_rels.rs     # Relationship parsing logic
│   └── types.rs          # Common data types used
└── tests/
    ├── test_data/      # XML & MD test data files
    └── slide_tests.rs  # Tests for the Markdown conversion logic

📦 Installation

Add the following to the dependencies section of your Cargo.toml:

[dependencies]
pptx-to-md = "0.4.0"

📜 License

This project is licensed under the MIT License and the Apache License 2.0.

Feel free to contribute or suggest improvements!

Repository: nilskruthoff/pptx-parser
