分类目录归档:Tools

WSL 服务自启动脚本

支持在Windows启动时启动WSL中的Linux服务.

安装

  • 使用 git clone 到任意目录 (e.g C:\wsl-autostart)
git clone https://github.com/troytse/wsl-autostart
  • 在注册表中加入启动项 run-regedit

  • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run下新增字符串项目 (e.g WSLAutostart) regedit-new-item

  • 设定脚本的路径 (e.g C:\wsl-autostart\start.vbs) regedit-set-path

使用

  • 修改在WSL中/etc/sudoers文件,为需要自启动的服务指定为免密码. 如:
%sudo ALL=NOPASSWD: /etc/init.d/cron
%sudo ALL=NOPASSWD: /etc/init.d/ssh
%sudo ALL=NOPASSWD: /etc/init.d/mysql
%sudo ALL=NOPASSWD: /etc/init.d/apache2
  • 修改commands.txt文件指定需要自启动的服务. 如:
/etc/init.d/cron
/etc/init.d/ssh
/etc/init.d/mysql
/etc/init.d/apache2

来源:WSL服务自启动脚本

Windows docker快速上手(含镜像设置)

下载并安装Docker的Windows桌面端

登录桌面端

  • 安装完成后,点击桌面右下角docker小图标,然后点击sign in,或注册帐号

登录命令行

  • 按Win键盘输入cmd,打开windows命令提示符,输入docker login,输入密码帐号,登录docker命令行工具

设置镜像

  • 点击桌面右下角docker小图标,然后点击setting,然后点击Daemon,然后在Rigisty mirrors里面输入https://docker.mirrors.ustc.edu.cn/,然后点击Apply

启动一次操作容器

docker run ubuntu echo 'hello world'

启动交互式容器

docker run -i -t ubuntu /bin/bash

查看容器

docker ps -a # 不带参数表示正在运行的容器,-a所有,-l最近

查看指定容器:

docker inspect name | id

重新启动停止的容器:

docker start [-i] 容器名

删除停止的容器:

docker rm name | id

启动守护式容器

docker run -d IMAGE_NAME

使用pdfbox给pdf去背景图片

前面采用了Python写的pdfrw做的,发现用acrobat不能编辑。

用pdfbox工具查看发现missing xobject。

java -jar pdfbox-app-2.0.13.jar PDFDebugger out.pdf

所以改用java的pdfbox库来写

package com.c4ys;

import org.apache.pdfbox.contentstream.PDContentStream;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDAbstractPattern;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDTilingPattern;

import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class Main {

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            usage();
        } else {
            PDDocument doc = PDDocument.load(new File(args[0]));
            if (doc.isEncrypted()) {
                System.err.println(
                        "Error: Encrypted documents are not supported .");
                System.exit(1);
            }

            for (PDPage page : doc.getPages()) {
                List<Object> newTokens = createTokensWithoutImage(page, args[2]);
                PDStream newContents = new PDStream(doc);
                writeTokensToStream(newContents, newTokens);
                page.setContents(newContents);
                processResources(page.getResources(), args[2]);
            }

            doc.save(args[1]);
            doc.close();
        }
    }

    private static List<Object> createTokensWithoutImage(PDContentStream contentStream, String im) throws IOException {
        PDFStreamParser parser = new PDFStreamParser(contentStream);
        Object token = parser.parseNextToken();
        List<Object> newTokens = new ArrayList<Object>();
        while (token != null) {
            if (token instanceof Operator) {
                Operator op = (Operator) token;
                if (op.getName().equalsIgnoreCase("do")) {
                    COSName previous = (COSName) newTokens.get(newTokens.size() - 1);
                    System.out.println(previous.getName());
                    if (previous.getName().equalsIgnoreCase(im)) {
                        // remove the argument to this operator
                        newTokens.remove(newTokens.size() - 1);
                        token = parser.parseNextToken();
                        continue;
                    }
                }
            }
            newTokens.add(token);
            token = parser.parseNextToken();
        }
        return newTokens;
    }


    private static void processResources(PDResources resources, String im) throws IOException {
        for (COSName name : resources.getXObjectNames()) {
            PDXObject xobject = resources.getXObject(name);
            if (xobject instanceof PDFormXObject) {
                PDFormXObject formXObject = (PDFormXObject) xobject;
                writeTokensToStream(formXObject.getContentStream(),
                        createTokensWithoutImage(formXObject, im));
                processResources(formXObject.getResources(), im);
            }
        }
        for (COSName name : resources.getPatternNames()) {
            PDAbstractPattern pattern = resources.getPattern(name);
            if (pattern instanceof PDTilingPattern) {
                PDTilingPattern tilingPattern = (PDTilingPattern) pattern;
                writeTokensToStream(tilingPattern.getContentStream(),
                        createTokensWithoutImage(tilingPattern, im));
                processResources(tilingPattern.getResources(), im);
            }
        }
    }

    private static void writeTokensToStream(PDStream newContents, List<Object> newTokens) throws IOException {
        OutputStream out = newContents.createOutputStream(COSName.FLATE_DECODE);
        ContentStreamWriter writer = new ContentStreamWriter(out);
        writer.writeTokens(newTokens);
        out.close();
    }


    /**
     * This will print the usage for this document.
     */
    private static void usage() {
        System.err.println("Usage: java " + Main.class.getName() + " <input-pdf> <output-pdf> <image-object-name>");
    }
}

HTML转为PDF的两种方案(含nodejs、PHP以及Python三种实现代码)

采用chrome headless方案

为什么要采用Chrome headless

因为wkhtmltopdf内置的为qt的webkit,已经很久不更新了,很多css3以及html5都支持不友好。

Chrome官方提供的页面转换为PDF的接口

https://chromedevtools.github.io/devtools-protocol/tot/Page#method-printToPDF

命令行方式

chrome --headless --print-to-pdf=path/to/file.pdf https://example.com

参考:HTML to PDF conversion using Chrome pdfium?

NodeJS扩展

html-pdf-chrome HTML to PDF converter via Chrome/Chromium.

PHP扩展

chrome-html-to-pdf Converts HTML to PDF using Google Chrome

Chrome命令行参数列表

List of Chromium Command Line Switches

采用Qt的Webkit(PyQt5)

由于当前的chrome转换存在BUG,转换大文件时内存消耗特别大,生成的文件也比较大,测试了10多种方法后,最后决定采用PyQt5来做

#!/usr/bin/env python3

import sys
import argparse

from PyQt5.QtCore import QUrl, QMarginsF
from PyQt5.QtGui import QPageLayout, QPageSize
from PyQt5.QtWebEngineWidgets import QWebEngineView
from PyQt5.QtWidgets import QApplication


class PrinterView(QWebEngineView):
    def __init__(self, url, filename, do_preview, parent=None):
        super(PrinterView, self).__init__(parent)
        self.do_preview = do_preview
        self.setUrl(QUrl(url))
        self.setZoomFactor(1)
        self.loadFinished.connect(self.load_finished)
        self.filename = filename

    def load_finished(self):
        if self.do_preview:
            self.show()
        else:
            pageLayout = QPageLayout(QPageSize(QPageSize.A5), QPageLayout.Portrait,
                                     QMarginsF(0, 0, 0, 0))
            self.page().printToPdf(self.filename, pageLayout)
            self.page().pdfPrintingFinished.connect(on_pdf_finished)


def on_pdf_finished(result):
    if result:
        print(result)
        QApplication.exit()
    else:
        QApplication.exit(1)


if __name__ == '__main__':
    app = QApplication(sys.argv)
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", "-i", help="Input URL (http://example.com, file:///home/user/example.html, ...)",
                        required=True)
    parser.add_argument("--output", "-o", help="Write pdf to this file", required=True)
    parser.add_argument("--preview", "-p", help="Open preview", action="store_true")
    args = parser.parse_args()
    a = PrinterView(args.url, args.output, args.preview)
    sys.exit(app.exec_())

 采用qt打印

import sys
import argparse

from PyQt5.QtCore import QUrl, QMarginsF
from PyQt5.QtGui import QPageLayout, QPageSize
from PyQt5.QtWebEngineWidgets import QWebEngineView, QWebEnginePage, QWebEngineProfile
from PyQt5.QtWidgets import QApplication
from PyQt5.QtPrintSupport import QPrinter, QPrintDialog


class PrinterView(QWebEngineView):
    def __init__(self, url, filename, do_preview, parent=None):
        self.printer = QPrinter()
        self.printer.setPageSize(QPrinter.A5)
        self.printer.setOrientation(QPrinter.Portrait)
        self.printer.setOutputFormat(QPrinter.PdfFormat)
        self.printer.setOutputFileName(filename)
        self.printer.setPageMargins(0, 0, 0, 0, QPrinter.Millimeter)
        super(PrinterView, self).__init__(parent)
        self.do_preview = do_preview
        self.page().profile().setHttpCacheMaximumSize(5 * 1024 * 1024 * 1024)
        self.page().profile().setHttpCacheType(QWebEngineProfile.MemoryHttpCache)
        self.setUrl(QUrl(url))
        self.setZoomFactor(1)
        self.loadFinished.connect(self.load_finished2)
        self.filename = filename

    def load_finished(self):
        if self.do_preview:
            self.show()
        else:
            pageLayout = QPageLayout(QPageSize(QPageSize.A5), QPageLayout.Portrait,
                                     QMarginsF(0, 0, 0, 0))
            self.page().printToPdf(self.filename, pageLayout)
            self.page().pdfPrintingFinished.connect(on_pdf_finished)

    def load_finished2(self):
        self.show()
        self.page().print(self.printer, on_pdf_finished)


def on_pdf_finished(result):
    if result:
        print(result)
        QApplication.exit()
    else:
        QApplication.exit(1)


if __name__ == '__main__':
    app = QApplication(sys.argv)
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", "-i", help="Input URL (http://example.com, file:///home/user/example.html, ...)",
                        required=True)
    parser.add_argument("--output", "-o", help="Write pdf to this file", required=True)
    parser.add_argument("--preview", "-p", help="Open preview", action="store_true")
    args = parser.parse_args()
    a = PrinterView(args.url, args.output, args.preview)
    sys.exit(app.exec_())

使用firefox的pdf

slimer-html-pdf – convert any HTML document to PDF format using slimerjs (Gecko)

大文件合并(Python)

 def on_pdf_finished(self, result):
        if result:
            print(result + ', total ' + str(self.total))
        else:
            print("导出失败")
        self.printed = self.printed + 1
        print('导出第', self.printed, '本')
        if self.printed < self.total:
            self.print_book()
        else:
            print('开始合并')
            merger = PdfFileMerger()
            for index in range(0, self.total):
                filepath = self.filename + '.' + str(index) + '.pdf'
                merger.append(filepath)
                print('合并第', index, '本')
            merger.write(self.filename)
            merger.close()
            print('合并完成,开始清除临时文件')
            # for index in range(0, self.total):
            #     filepath = self.filename + '.' + str(index) + '.pdf'
            #     os.remove(filepath)
            print('清除临时文件完成')
            QApplication.exit()

HTML转markdown工具比较

Html2MarkDown – aTool在线工具

一款html和markdown标签互转的工具,直接输入Html,网页会自动帮你转换。

HTML2Markdown

Javascript Implementation for converting HTML to Markdown text.

html2markdown

Javascript implementation for converting HTML to Markdown text. Browser and Node.js support.

Turndown

Convert HTML into Markdown with JavaScript.

Python 新轮子 Tomd: HTML 转 Markdown 工具库

用途: 爬虫爬文章保存到本地为 Markdown 格式

LCTT选题工具

将内容复制到左侧输入框内,点击生成MD,在中部编辑器处进行二次修改,并在右侧的预览框中查看效果。 确认无误后点击上方的复制代码按钮即可将代码复制到剪贴板中!

html-to-markdown

An HTML-to-markdown conversion helper for PHP

Markdown Navigator 2.0

Markdown language support for IntelliJ platform

django2 + uwsgi + nginx

安装uwsgi模块

pip install uwsgi

测试uwsgi服务

uwsgi --http 0.0.0.0:8080 --file project/wsgi.py --static-map=/static=static

配置uwsgi.ini

# uwsig使用配置文件启动
[uwsgi]
# 项目目录
chdir=/data/pyproject/zc1024
# 指定项目的application
module=zc1024.wsgi:application
# 指定sock的文件路径
socket=/data/pyproject/zc1024/tmp/uwsgi.sock
# 进程个数
workers=4
pidfile=/data/pyproject/zc1024/tmp/uwsgi.pid
# 指定IP端口
http=127.0.0.1:8080
# 指定静态文件
static-map=/static=/data/pyproject/zc1024/static
# 启动uwsgi的用户名和用户组
uid=ning
gid=ning
# 启用主进程
master=true
# 自动移除unix Socket和pid文件当服务停止的时候
vacuum=true
# 序列化接受的内容,如果可能的话
thunder-lock=true
# 启用线程
enable-threads=true
# 设置自中断时间
harakiri=30
# 设置缓冲
post-buffering=4096
# 设置日志目录
daemonize=/data/pyproject/zc1024/tmp/uwsgi.log

运行配置

uwsgi --ini uwsgi.ini

配置nginx

 # 指定项目路径uwsgi
location / { # 这个location就和咱们Django的url(r'^admin/', admin.site.urls),
include uwsgi_params; # 导入一个Nginx模块他是用来和uWSGI进行通讯的
uwsgi_connect_timeout 30; # 设置连接uWSGI超时时间
uwsgi_pass unix:/data/pyproject/zc1024/tmp/uwsgi.sock; # 指定uwsgi的sock文件所有动态请求就会直接丢给他
}

# 指定静态文件路径
location /static/ {
alias /data/pyproject/zc1024/static/;
index index.html index.htm;
}

重新加载nginx配置

nginx -s reload