PDF Page Extraction/Selection in Python Using PyPDF

March 24, 2012, 4:03 pm

Site Section: Program and Scripts

Keywords: Python, PDF

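The pyPdf package provides convenient facilities for reading, splitting, and writing PDF documents. Here is a simple command-line script that extracts a specified range of pages from a PDF file. The first page is given as a 1-based page number; the last page is given either as a page number or, with a leading '+', as a count of pages following the first.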

#! /usr/bin/env python
 
###############################################################################
##
##  Copyright 2012 Jeet Sukumaran.
##
##  This program is free software; you can redistribute it and/or modify
##  it under the terms of the GNU General Public License as published by
##  the Free Software Foundation; either version 3 of the License, or
##  (at your option) any later version.
##
##  This program is distributed in the hope that it will be useful,
##  but WITHOUT ANY WARRANTY; without even the implied warranty of
##  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
##  GNU General Public License for more details.
##
##  You should have received a copy of the GNU General Public License along
##  with this program. If not, see <http://www.gnu.org/licenses/>.
##
###############################################################################
 
"""
Extract specified pages from source PDF.
"""
 
import sys
import os
import argparse
import pyPdf
 
__prog__ = os.path.basename(__file__)
__version__ = "1.0.0"
__description__ = __doc__
__author__ = 'Jeet Sukumaran'
__copyright__ = 'Copyright (C) 2012 Jeet Sukumaran.'
 
def main():
    """
    Main CLI handler.
    """
 
    parser = argparse.ArgumentParser(description=__description__)
    parser.add_argument("--version", action="version", version="%(prog)s " + __version__)
    parser.add_argument("src_pdf",
            metavar="SOURCE-PDF",
            type=argparse.FileType('rb'),
            help="path to input pdf file")
    parser.add_argument("first_page",
            metavar="FIRST-PAGE",
            type=int,
            help="number of first page (1-based index: first page is '1')")
    parser.add_argument("last_page",
            metavar="LAST-PAGE",
            type=str,
            help="number of last page; if preceded by '+' (e.g., '+30'), specifies number of pages following first page to extract")
    parser.add_argument("-o", "--output-filepath",
            type=str,
            default=None,
            help="path to output file (if not given, will write to standard output)")
 
    args = parser.parse_args()
    first_page = args.first_page - 1
    if args.last_page.startswith("+"):
        last_page = args.last_page[1:].replace(" ", "")
        if not last_page:
            sys.exit("Need to specify number of pages")
        last_page = first_page + int(last_page)
    else:
        last_page = int(args.last_page) - 1
 
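    pdf_in = pyPdf.PdfFileReader(args.src_pdf)
    # Validate the zero-based, inclusive page range before extracting.
    num_pages = pdf_in.getNumPages()
    if first_page < 0 or last_page < first_page or last_page >= num_pages:
        sys.exit("Invalid page range: source has %d pages" % num_pages)
    pdf_out = pyPdf.PdfFileWriter()
    for pg_num in range(first_page, last_page + 1):
        pdf_out.addPage(pdf_in.getPage(pg_num))
    if args.output_filepath:
        out_stream = open(os.path.expandvars(os.path.expanduser(args.output_filepath)), "wb")
    else:
        out_stream = sys.stdout
    pdf_out.write(out_stream)
    # Close only a file we opened ourselves; leave standard output open.
    if out_stream is not sys.stdout:
        out_stream.close()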
 
if __name__ == '__main__':
    main()
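
The page-range arithmetic in the script is easy to get wrong by one, so it may help to see it in isolation. The following is a minimal sketch, not part of the original script: the helper name resolve_page_range is hypothetical, and it assumes the same conventions as above (a 1-based FIRST-PAGE, and a LAST-PAGE that is either a 1-based page number or a '+N' count of pages following the first). It returns zero-based, inclusive indices of the kind passed to getPage().

```python
def resolve_page_range(first_page, last_page_spec):
    """Return (first_index, last_index), zero-based and inclusive.

    Hypothetical helper mirroring the script's argument handling.
    """
    first_index = first_page - 1
    spec = last_page_spec.replace(" ", "")
    if spec.startswith("+"):
        # '+N' means N pages following the first page.
        count = spec[1:]
        if not count:
            raise ValueError("need to specify number of pages after '+'")
        last_index = first_index + int(count)
    else:
        # A plain number is a 1-based page number.
        last_index = int(spec) - 1
    if first_index < 0 or last_index < first_index:
        raise ValueError("invalid page range")
    return first_index, last_index


print(resolve_page_range(5, "10"))  # pages 5 through 10 -> (4, 9)
print(resolve_page_range(5, "+3"))  # page 5 plus the 3 after it -> (4, 7)
```

Note that '+3' selects four pages in total: the first page and the three that follow it. Unlike the script, which exits with a message, this sketch raises ValueError so that the logic can be checked without a PDF file.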
