html2tex
文件大小: unknow
源码售价: 5 个金币 积分规则     积分充值
资源说明:HTML to LaTeX converter
HTML2TeX
========

This is a simple library and command-line tool to convert HTML documents into
LaTeX. The intended use is for preparing ebooks, but it might also be useful
for other purposes.

Command-line use
----------------

    Usage: html2tex [options] input.html [output.tex]
        -t, --title TITLE                Set (or override) the title of the document
        -a, --author AUTHOR              Set (or override) the author of the document
        -c, --document-class CLASS       Set the LaTeX document class to be used; default is book
        -h, --help                       Display this screen

If the output file is not specified, the program will create an output file
based on the input filename, replacing the extension with `.tex` – or, if there
is no extension, by appending `.tex`.

The title and author will, by default, be extracted from the HTML document,
using either Dublin Core metadata or the HTML title and an author META element.

Library use
-----------

    require "html2tex"
    
    # Return output as a string
    tex = HTML2TeX.new(html).to_tex
    
    # Write directly to a stream
    File.open("output.tex", "w") do |f|
      HTML2TeX.new(html).to_tex(f)
    end

A hash of options can be supplied as the second argument to `to_tex`; the
available keys are:

* `:author`
* `:title`
* `:class`

See the command-line section above for an explanation of these.

Dependencies
------------

* htmlentities, to decode character entities
* rubypants, to convert `'` and `"` into something more appealing

Limitations
-----------

The HTML recognised is currently limited to headings, paragraphs, and bold and
italic mark-up. This covers the majority of novels, but it's far from
comprehensive.

StringScanner is used to process the HTML, but cannot read from a stream
directly, so the entire input document must be read into memory as a string
first.

UTF-8 is assumed everywhere; other character encodings will produce odd
results. If the HTML file to be processed is not in UTF-8 encoding with unix
line endings (at least, on Linux/OS X/etc.), _fix that first_. The usual
suspects will help here:

    iconv -f windows-1252 -t utf-8 < somefile-win1252.html > somefile-utf8.html
    dos2unix somefile-utf8.html

Next steps
----------

If you have XeLaTex, you can easily turn the generated `.tex` file into a PDF:

    xelatex my-book.tex

For better results, tweak the font settings or use a custom class like [this][ebook.cls].

[ebook.cls]: http://github.com/threedaymonk/gutenberg2pdf/blob/master/ebook.cls

本源码包内暂不包含可直接显示的源代码文件,请下载源码包。