How to Build a Conversion Service in Python with the Apryse Conversion SDK

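Summary: PDF and Office formats are each useful in different situations, but converting files between them can be tedious. This article provides sample code showing how to convert PDF to .docx, .xlsx, and .pptx in a Python application using the Apryse PDF conversion workflow.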

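Microsoft Office is an essential suite of apps for creating and editing spreadsheets, documents, and slides. It is the standard toolset for working with business information, and for many users it matters that they can work in these familiar formats.

PDF documents, on the other hand, have many advantages over the .docx, .pptx, and .xlsx files associated with Microsoft Office. PDF is designed to display a document consistently across operating systems and applications, with its formatting preserved. In addition to this fixed presentation, PDFs can be compressed and support security features such as encryption, redaction, and digital signatures.

Converting from Office to PDF is therefore often as simple as clicking the Save button in an Office app. Going the other way is harder, and a PDF-to-Word conversion workflow helps bridge the gap when a PDF file needs to return to a familiar app.

Because PDF documents are not designed to be machine-readable, it can be hard to find a PDF-to-Office conversion tool that preserves formatting accurately. With the Apryse PDF conversion SDK, you can build a PDF conversion workflow in a Python application.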

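PDF to Office Conversion
Setup
Python PDF to Word Conversion
Python PDF to PowerPoint Conversion
Python PDF to Excel Conversion
Complete Sample Code
Benefits of the PDF to Office Conversion SDK
A Complete Office and Document SDK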

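PDF to Office Conversion

This section walks through using the PDF conversion SDK from Python, on a server or desktop, to convert PDF to Microsoft Office formats such as Word, Excel, and PowerPoint.

This capability is provided by an add-on to the Apryse Server SDK called the Structured Output module.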

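Setup

Download the Structured Output module, which enables PDF-to-Office conversion.
Place it in a folder named lib in your project directory. The SDK must be told where to find it: the complete sample later in this article registers its module folder with PDFNet.AddResourceSearchPath(), so adjust that path to point to your lib folder.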
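The setup step above can be sketched as follows. This is a minimal sketch, assuming the module was unpacked into a lib folder inside the project directory and that the SDK is importable as PDFNetPython, as in the complete sample later in this article. The helper names find_module_dir and register_structured_output are hypothetical, and license_key stands in for your own key.

```python
import os
from pathlib import Path


def find_module_dir(project_dir, folder_name="lib"):
    """Return the absolute path of the module folder, or raise if it is missing."""
    module_dir = Path(project_dir).resolve() / folder_name
    if not module_dir.is_dir():
        raise FileNotFoundError(
            f"Structured Output module folder not found: {module_dir}"
        )
    return module_dir


def register_structured_output(project_dir, license_key):
    """Initialize the SDK and point it at the module folder.

    Returns True when the SDK reports that the Structured Output module
    is available.
    """
    # Imported lazily so find_module_dir works without the SDK installed.
    from PDFNetPython import PDFNet, StructuredOutputModule

    PDFNet.Initialize(license_key)
    # The complete sample passes the path with a trailing separator,
    # so this sketch does the same.
    PDFNet.AddResourceSearchPath(str(find_module_dir(project_dir)) + os.sep)
    return StructuredOutputModule.IsModuleAvailable()
```

Failing early with a clear error when the folder is missing makes a misplaced download easier to diagnose than a later "module not available" message.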

Python PDF to Word Conversion

This sample shows how to convert a PDF to a DOCX file.

wordOutputOptions = WordOutputOptions()
# Optionally convert only the first page
wordOutputOptions.SetPages(1, 1)
# Requires the Structured Output module
Convert.ToWord(filename, output_filename, wordOutputOptions)

Python PDF to PowerPoint Conversion

powerPointOutputOptions = PowerPointOutputOptions()
# Optionally convert only the first page
powerPointOutputOptions.SetPages(1, 1)
# Requires the Structured Output module
Convert.ToPowerPoint(filename, output_filename, powerPointOutputOptions)

Python PDF to Excel Conversion

excelOutputOptions = ExcelOutputOptions()
# Optionally convert only the first page
excelOutputOptions.SetPages(1, 1)
# Requires the Structured Output module
Convert.ToExcel(filename, output_filename, excelOutputOptions)
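Since the three snippets above differ only in the Convert method and the options class, a conversion service could pick both from the requested file extension. The following is a minimal sketch under that assumption. TARGETS, plan_conversion, and convert_pdf are hypothetical names, the SDK is assumed to be importable as PDFNetPython, and PDFNet.Initialize() is assumed to have been called already.

```python
from pathlib import Path

# Target extension -> (Convert method name, options class name),
# taken from the three snippets above.
TARGETS = {
    ".docx": ("ToWord", "WordOutputOptions"),
    ".xlsx": ("ToExcel", "ExcelOutputOptions"),
    ".pptx": ("ToPowerPoint", "PowerPointOutputOptions"),
}


def plan_conversion(pdf_path, target_ext, output_dir):
    """Return (method name, options class name, output path) for one job."""
    ext = target_ext.lower()
    if not ext.startswith("."):
        ext = "." + ext
    if ext not in TARGETS:
        raise ValueError(f"Unsupported target format: {target_ext}")
    method_name, options_name = TARGETS[ext]
    output_path = Path(output_dir) / (Path(pdf_path).stem + ext)
    return method_name, options_name, str(output_path)


def convert_pdf(pdf_path, target_ext, output_dir, pages=None):
    """Convert one PDF. `pages` is an optional (first, last) tuple, 1-based."""
    # Imported lazily so plan_conversion can be used without the SDK.
    import PDFNetPython as sdk

    method_name, options_name, output_path = plan_conversion(
        pdf_path, target_ext, output_dir
    )
    convert = getattr(sdk.Convert, method_name)
    if pages is None:
        convert(pdf_path, output_path)
    else:
        # Build the matching options object and restrict the page range.
        options = getattr(sdk, options_name)()
        options.SetPages(pages[0], pages[1])
        convert(pdf_path, output_path, options)
    return output_path
```

For example, convert_pdf("report.pdf", "docx", "out", pages=(1, 1)) would convert only the first page to out/report.docx. Keeping the planning step separate from the SDK call lets the mapping be checked without a license or the module installed.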

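Complete Sample Code

The complete sample below shows how to use the Apryse SDK from Python to programmatically convert a PDF document to Word, Excel, and PowerPoint, both for the whole document and for a selected page range.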

#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2023 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
#---------------------------------------------------------------------------------------

import site
site.addsitedir("../../../PDFNetC/Lib")
import sys
from PDFNetPython import *

import platform

sys.path.append("../../LicenseKey/PYTHON")
from LicenseKey import *

#---------------------------------------------------------------------------------------
# The following sample illustrates how to use the PDF.Convert utility class to convert 
# documents and files to Word, Excel and PowerPoint.
#
# The Structured Output module is an optional PDFNet Add-on that can be used to convert PDF
# and other documents into Word, Excel, PowerPoint and HTML format.
#
# The PDFTron SDK Structured Output module can be downloaded from
# https://docs.apryse.com/core/info/modules/
#
# Please contact us if you have any questions.
#---------------------------------------------------------------------------------------

# Relative path to the folder containing the test files.
inputPath = "../../TestFiles/"
outputPath = "../../TestFiles/Output/"

def main():
    # The first step in every application using PDFNet is to initialize the 
    # library. The library is usually initialized only once, but calling 
    # Initialize() multiple times is also fine.
    PDFNet.Initialize(LicenseKey)
    
    PDFNet.AddResourceSearchPath("../../../PDFNetC/Lib/")

    if not StructuredOutputModule.IsModuleAvailable():
        print("")
        print("Unable to run the sample: PDFTron SDK Structured Output module not available.")
        print("-----------------------------------------------------------------------------")
        print("The Structured Output module is an optional add-on, available for download")
        print("at https://docs.apryse.com/core/info/modules/. If you have already")
        print("downloaded this module, ensure that the SDK is able to find the required files")
        print("using the PDFNet::AddResourceSearchPath() function.")
        print("")
        return

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Word
        print("Converting PDF to Word")

        outputFile = outputPath + "paragraphs_and_tables.docx"

        Convert.ToWord(inputPath + "paragraphs_and_tables.pdf", outputFile)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Word, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Word with options
        print("Converting PDF to Word with options")

        outputFile = outputPath + "paragraphs_and_tables_first_page.docx"

        wordOutputOptions = WordOutputOptions()

        # Convert only the first page
        wordOutputOptions.SetPages(1, 1)

        Convert.ToWord(inputPath + "paragraphs_and_tables.pdf", outputFile, wordOutputOptions)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Word, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Excel
        print("Converting PDF to Excel")

        outputFile = outputPath + "paragraphs_and_tables.xlsx"

        Convert.ToExcel(inputPath + "paragraphs_and_tables.pdf", outputFile)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Excel, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Excel with options
        print("Converting PDF to Excel with options")

        outputFile = outputPath + "paragraphs_and_tables_second_page.xlsx"

        excelOutputOptions = ExcelOutputOptions()

        # Convert only the second page
        excelOutputOptions.SetPages(2, 2)

        Convert.ToExcel(inputPath + "paragraphs_and_tables.pdf", outputFile, excelOutputOptions)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Excel, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to PowerPoint
        print("Converting PDF to PowerPoint")

        outputFile = outputPath + "paragraphs_and_tables.pptx"

        Convert.ToPowerPoint(inputPath + "paragraphs_and_tables.pdf", outputFile)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to PowerPoint, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to PowerPoint with options
        print("Converting PDF to PowerPoint with options")

        outputFile = outputPath + "paragraphs_and_tables_first_page.pptx"

        powerPointOutputOptions = PowerPointOutputOptions()

        # Convert only the first page
        powerPointOutputOptions.SetPages(1, 1)

        Convert.ToPowerPoint(inputPath + "paragraphs_and_tables.pdf", outputFile, powerPointOutputOptions)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to PowerPoint, error: " + str(e))

    #-----------------------------------------------------------------------------------

    PDFNet.Terminate()
    print("Done.")
    
if __name__ == '__main__':
    main()
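The six try/except blocks in the sample follow the same pattern, so in a service they could be folded into one loop that isolates failures per job. The following is a minimal sketch of that idea and not part of the Apryse sample. run_jobs is a hypothetical helper, and each job carries the callable to run, which with the SDK could be Convert.ToWord, Convert.ToExcel, or Convert.ToPowerPoint.

```python
def run_jobs(jobs):
    """Run each (label, func, args) job and collect the outcome.

    Returns a list of (label, succeeded, error message) tuples, where the
    error message is an empty string on success.
    """
    results = []
    for label, func, args in jobs:
        try:
            func(*args)
            results.append((label, True, ""))
        except Exception as e:
            # One failed conversion should not stop the remaining jobs.
            results.append((label, False, str(e)))
    return results


# Example job list using the sample's paths (requires the SDK):
# jobs = [
#     ("Word", Convert.ToWord,
#      (inputPath + "paragraphs_and_tables.pdf",
#       outputPath + "paragraphs_and_tables.docx")),
#     ("Excel", Convert.ToExcel,
#      (inputPath + "paragraphs_and_tables.pdf",
#       outputPath + "paragraphs_and_tables.xlsx")),
# ]
```

Returning the outcomes instead of printing them lets the caller decide how to log or report failures.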

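Benefits of the PDF to Office Conversion SDK

As noted at the start of this article, not every conversion tool can parse PDF files accurately and preserve formatting during conversion.

The Apryse SDK delivers better results through the following advantages.

Client-Side Processing

Scale easily to render, convert, and edit PDF, Microsoft Office, image, video, and HTML content without server-side dependencies such as Microsoft Office or LibreOffice.

Unmatched Rendering Quality

Bring fast rendering of Office documents and high-accuracy conversion to any web, mobile, or desktop application.

Security by Design

With no external dependencies, you can deploy on your own infrastructure so that data never leaves your platform, which avoids the vulnerabilities that external dependencies introduce.

Trusted Expert Support

A team of experienced SDK developers supports you from an unlimited trial through go-live and beyond, helping to accelerate your project.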

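A Complete Office and Document SDK

If you need to convert PDFs quickly and reliably into the familiar Microsoft Office formats your users need to get their work done, this is the solution.

Beyond conversion, the Apryse Server SDK is designed to scale with your needs. You can easily add ready-to-use components for client-side document viewing, annotation, and many other document features, covering more than 160 file formats on any platform.

Apryse also offers a free trial, so please give it a try.

For more about Apryse products, see our website.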