月度归档:2017年06月

golang queue packages

github.com/rylio/queue

Execute tasks in parallel with a concurrency limit

github.com/bgentry/que-go

An interoperable Golang port of the Ruby Que queuing library for PostgreSQL

goworker

goworker is a Go-based background worker that runs 10 to 100,000* times faster than Ruby-based workers.

goworker is compatible with Resque, so you can push your jobs with Rails and Resque, and consume them with Go in the background.

machinery

Machinery is an asynchronous task queue/job queue based on distributed message passing.

golang通过web管理worker

Writing worker queues, in Go

Have you ever wanted to write something that is highly concurrent, and performs as many tasks as you will let it, in parallel? Well, look no further, here is a guide on how to do just that, in Go!

go-workers - Sidekiq compatible background workers in golang

Sidekiq compatible background workers in golang.

  • reliable queueing for all queues using brpoplpush
  • handles retries
  • support custom middleware
  • customize concurrency per queue
  • responds to Unix signals to safely wait for jobs to finish before exiting.
  • provides stats on what jobs are currently running
  • well tested

NSQ - A realtime distributed messaging platform

NSQ is a successor to simplequeue (part of simplehttp) and as such is designed to (in no particular order):

  • support topologies that enable high-availability and eliminate SPOFs
  • address the need for stronger message delivery guarantees
  • bound the memory footprint of a single process (by persisting some messages to disk)
  • greatly simplify configuration requirements for producers and consumers
  • provide a straightforward upgrade path
  • improve efficiency

Background Jobs with Que-Go

Web servers should focus on serving users as quickly as possible. Any non-trivial work that could slow down your user’s experience should be done asynchronously outside of the web process.

Worker queues are commonly used to accomplish this goal. For a more in-depth description of the worker queue architectural pattern, read the Worker Dynos, Background Jobs and Queueing article. Here we’ll demonstrate this pattern using a sample Go application using Que-Go and Heroku Postgres.

使用Swoole作为服务解决PHP加载耗内存问题

采用swoole主要满足部分程序需要常驻内存和高性能的要求。

代码

<?php
require_once('function.php');

$http = new swoole_http_server("0.0.0.0", 9501);
$http->on('request', function ($request, $response) {
    $time = microtime(true);
    $error_code = 0;
    $error_msg = '';
    $result = [];
    try {
        switch ($request->server['request_uri']) {
            case '/test/':
                $result = get($request);
                break;
            default:
                $error_code = 1;
                $error_msg = '请求url不正确';
                break;

        }
    } catch (Exception $e) {
        $error_code = 1;
        $error_msg = $e->getMessage();
    }

    $time_new = microtime(true);
    $response->header("Content-Type", "text/html; charset=utf-8");
    $response->end(json_encode(
        [
            'code' => $error_code,
            'msg' => $error_msg,
            'time' => $time_new - $time,
            'data' => $result,
        ]
        , JSON_UNESCAPED_UNICODE));
});

$http->set(array(
    'worker_num' => 4,
    'reactor_num' => 4,
    'daemonize' => true,
    'pid_file' => __DIR__ . '/server.pid',
));

$http->start();

部署介绍

因为需要用到PHP的swoole扩展,所以运行脚本需要单独配置

php-service.ini

; 配置swoole 安装说明:https://wiki.swoole.com/wiki/page/6.html
[swoole]
extension=swoole.so

启动脚本

cd php-service
php -c /usr/local/etc/php-service.ini server.php

平滑重启

kill -USR2  `cat server.pid`

通过pid杀死服务

kill `cat server.pid`

nginx配置

server {
    root /data/wwwroot/;
    server_name webservice.liebiao.com;

    location / {
        proxy_http_version 1.1;
        proxy_set_header Connection "keep-alive";
        proxy_set_header X-Real-IP $remote_addr;
        if (!-e $request_filename) {
             proxy_pass http://127.0.0.1:9501;
        }
    }
}

根据文章标题查找相近文章(PHP+结巴分词)

需求

根据当前文章标题,找到相近的10篇文章

测试案例

家里想买一台婴儿理发器,自己给小孩子理发,如何选购婴儿理发器

当前采用any模式进行匹配,匹配结果

家里想买一台彩色激光多功能一体机,什么牌子、什么型号的好?
我家里想买一台家用净水器,一个重庆的朋友给我推荐德国曼稣勒净水器,广东有德国曼稣勒净水器卖吗?
滚筒洗衣机有哪些品牌,家里想买一台滚筒洗衣机好不好,有什么好处
家里想买一台跑步机,有什么需要注意的地方吗
家里想买一台修鞋机,谁告诉我买什么牌子的好呢?
家里想买一台智能电视,长虹Q2F好吗?听说是首款移动互联电视呢。
我是搞婚庆的,想买一台高清摄像机自己用,请大家帮忙参考一下!价格在1W&mdash;2W,2W多点也不要紧。
家里想买一台跑步机,不知道什么牌子跑步机比较好
壁挂式新风系统有什么特点?最近家里想买一台新风机,由于家里已
家里想买一台婴儿理发器,自己给小孩子理发,如何选购婴儿理发器

改进后,匹配结果

运宝婴儿理发器怎样啊?好用吗?我想一个运宝理发器给小孩子用。
婴儿理发器哪个牌子好呢?一般大家都是怎么给小孩子理发的呢?怎么让小孩子不乱动呢?
家里想买一台婴儿理发器,自己给小孩子理发,如何选购婴儿理发器
婴儿理发器哪个好宝妈们有给宝贝买理发器吗?哪款更方便好用且静
什么牌子的充电式婴儿理发器好用又便宜 什么牌子的充电式婴儿理
婴儿理发器,家里有的来。 我想知道婴儿理发器,哪个牌子的那款
婴儿理发器想买一个好些的婴儿理发器,要静音,充电,陶瓷头,可
婴儿理发器什么牌子的好 宝宝现在5个多月了,想自己给宝贝理发
给婴儿用的电动理发器哪个牌子好?老公要自己给儿子理发,那就买
蓄电池能接“百特静音电动婴儿理发器”吗?

改进办法

采用结巴分词获取词性

家里/s  想买/v  一台/m  婴儿/n  理发器/n  ,/w  自己/r  给/p  小孩子/n  理发/v  ,/w  如何/r  选购/v  婴儿/n  理发器/n 

获取名词

婴儿 理发器 小孩子 婴儿 理发器

然后进行搜索

相关代码

<?php

use Fukuball\Jieba\Jieba;
use Fukuball\Jieba\Finalseg;
use Fukuball\Jieba\Posseg;
use Fukuball\Jieba\JiebaAnalyse;
use NilPortugues\Sphinx\SphinxClient;

Jieba::init();
Finalseg::init();
Posseg::init();
JiebaAnalyse::init();

$sphinxSearch_new = new SphinxClient();
$sphinxSearch_new->setServer($host, 9312);
$sphinxSearch_new->setMatchMode(SPH_MATCH_EXTENDED2);
$sphinxSearch_new->setRankingMode(SPH_RANK_EXPR, 'doc_word_count');
$sphinxSearch_new->setSortMode(SPH_SORT_EXTENDED, '@weight DESC, id desc');
$sphinxSearch_new->setLimits(0, 10);


$seg_list = Posseg::cut($v);
$words = array_map(function ($wd) {
    return $wd['word'] . "/" . $wd["tag"] . " ";
}, $seg_list);
$valid_words = array_filter($seg_list, function ($wd) {
    return in_array($wd["tag"], ['n', 'ng', 'nrt', 'ns', 'nt', 'nz', 'j', 'l', 'vn']);
});
$valid_words = array_map(function ($wd) {
    return $wd['word'];
}, $valid_words);
$valid_words_str = implode(" ", $valid_words);
$sphinxSearch_new->addQuery("\"$valid_words_str\"/2", 'wd_question');
$result = $sphinxSearch_new->runQueries();

使用php扩展性能测试

默认PHP版本会比较慢,所以采用php扩展(扩展启动慢,性好执行速度还行10000次0.5秒)

<?php
$time = microtime(true);
for($i=0;$i<10000;$i++) {
    $result = jieba('小明硕士毕业于中国科学院计算所,后在日本京都大学深造', 2);
}
$timenew = microtime(true);
echo "共耗时:" . ($timenew - $time) . PHP_EOL;

词性分析

https://github.com/fxsjy/jieba/issues/411

POS = {
    "n": {  # 1. 名词  (1个一类,7个二类,5个三类)
        "n": "名词",
        "nr": "人名",
        "nr1": "汉语姓氏",
        "nr2": "汉语名字",
        "nrj": "日语人名",
        "nrf": "音译人名",
        "ns": "地名",
        "nsf": "音译地名",
        "nt": "机构团体名",
        "nz": "其它专名",
        "nl": "名词性惯用语",
        "ng": "名词性语素"
    },
    "t": {  # 2. 时间词(1个一类,1个二类)
        "t": "时间词",
        "tg": "时间词性语素"
    },
    "s": {  # 3. 处所词(1个一类)
        "s": "处所词"
    },
    "f": {  # 4. 方位词(1个一类)
        "f": "方位词"
    },
    "v": {  # 5. 动词(1个一类,9个二类)
        "v": "动词",
        "vd": "副动词",
        "vn": "名动词",
        "vshi": "动词“是”",
        "vyou": "动词“有”",
        "vf": "趋向动词",
        "vx": "形式动词",
        "vi": "不及物动词(内动词)",
        "vl": "动词性惯用语",
        "vg": "动词性语素"
    },
    "a": {  # 6. 形容词(1个一类,4个二类)
        "a": "形容词",
        "ad": "副形词",
        "an": "名形词",
        "ag": "形容词性语素",
        "al": "形容词性惯用语"
    },
    "b": {  # 7. 区别词(1个一类,2个二类)
        "b": "区别词",
        "bl": "区别词性惯用语"
    },
    "z": {  # 8. 状态词(1个一类)
        "z": "状态词"
    },
    "r": {  # 9. 代词(1个一类,4个二类,6个三类)
        "r": "代词",
        "rr": "人称代词",
        "rz": "指示代词",
        "rzt": "时间指示代词",
        "rzs": "处所指示代词",
        "rzv": "谓词性指示代词",
        "ry": "疑问代词",
        "ryt": "时间疑问代词",
        "rys": "处所疑问代词",
        "ryv": "谓词性疑问代词",
        "rg": "代词性语素"
    },
    "m": {  # 10. 数词(1个一类,1个二类)
        "m": "数词",
        "mq": "数量词"
    },
    "q": {  # 11. 量词(1个一类,2个二类)
        "q": "量词",
        "qv": "动量词",
        "qt": "时量词"
    },
    "d": {  # 12. 副词(1个一类)
        "d": "副词"
    },
    "p": {  # 13. 介词(1个一类,2个二类)
        "p": "介词",
        "pba": "介词“把”",
        "pbei": "介词“被”"
    },
    "c": {  # 14. 连词(1个一类,1个二类)
        "c": "连词",
        "cc": "并列连词"
    },
    "u": {  # 15. 助词(1个一类,15个二类)
        "u": "助词",
        "uzhe": "着",
        "ule": "了 喽",
        "uguo": "过",
        "ude1": "的 底",
        "ude2": "地",
        "ude3": "得",
        "usuo": "所",
        "udeng": "等 等等 云云",
        "uyy": "一样 一般 似的 般",
        "udh": "的话",
        "uls": "来讲 来说 而言 说来",
        "uzhi": "之",
        "ulian": "连 "  # (“连小学生都会”)
    },
    "e": {  # 16. 叹词(1个一类)
        "e": "叹词"
    },
    "y": {  # 17. 语气词(1个一类)
        "y": "语气词(delete yg)"
    },
    "o": {  # 18. 拟声词(1个一类)
        "o": "拟声词"
    },
    "h": {  # 19. 前缀(1个一类)
        "h": "前缀"
    },
    "k": {  # 20. 后缀(1个一类)
        "k": "后缀"
    },
    "x": {  # 21. 字符串(1个一类,2个二类)
        "x": "字符串",
        "xx": "非语素字",
        "xu": "网址URL"
    },
    "w": {   # 22. 标点符号(1个一类,16个二类)
        "w": "标点符号",
        "wkz": "左括号",  # ( 〔  [  {  《 【  〖 〈   半角:( [ { <
        "wky": "右括号",  # ) 〕  ] } 》  】 〗 〉 半角: ) ] { >
        "wyz": "全角左引号",  # “ ‘ 『
        "wyy": "全角右引号",  # ” ’ 』
        "wj": "全角句号",  # 。
        "ww": "问号",  # 全角:? 半角:?
        "wt": "叹号",  # 全角:! 半角:!
        "wd": "逗号",  # 全角:, 半角:,
        "wf": "分号",  # 全角:; 半角: ;
        "wn": "顿号",  # 全角:、
        "wm": "冒号",  # 全角:: 半角: :
        "ws": "省略号",  # 全角:……  …
        "wp": "破折号",  # 全角:——   --   ——-   半角:---  ----
        "wb": "百分号千分号",  # 全角:% ‰   半角:%
        "wh": "单位符号"  # 全角:¥ $ £  °  ℃  半角:$
    }
}

取重点词

$words = [];
$seg_list = jieba($t, 2);
foreach ($seg_list as $k => $v) {
    $words[] = ['t' => $v, 'w' => $k];
}
$valid_words = [];
$stop_words = [',', ',', '.', '。', '!', '!', '?', '?', ' ', ' '];
$name_words = ['n', 'ng', 'nrt', 'nr', 'ns', 'nt', 'nz', 'j', 'vn'];
foreach ($words as $k => $v) {
    // 动词
    if ($v['t'] == 'v') {
        // 最后一位
        if (!isset($words[$k + 1])) {
            $valid_words[] = $v['w'];
            continue;
        }
        // 后面接标点符号
        if (isset($words[$k + 1]) && $words[$k + 1]['t'] == 'x'
            && in_array($words[$k + 1]['w'], $stop_words)
        ) {
            $valid_words[] = $v['w'];
            continue;
        }
    }
    // 未知词
    if ($v['t'] == 'x' && !in_array($v['w'], $stop_words)) {
        // 后面接名词
        if (isset($words[$k + 1]) && in_array($words[$k + 1]['t'], ['n', 'nr', 'v', 'uj'])) {
            $valid_words[] = $v['w'];
            continue;
        }
        // 接连词+名词
        if (isset($words[$k + 1]) && ($words[$k + 1]['t'] == 'p' || $words[$k + 1]['t'] == 'c')
            && isset($words[$k + 2]) && in_array($words[$k + 2]['t'], $name_words)
        ) {
            $valid_words[] = $v['w'];
            continue;
        }
    }
    // 名词,缩略语
    if (in_array($v['t'], $name_words)) {
        $valid_words[] = $v['w'];
    }
}
$valid_words_str = implode(" ", $valid_words);

使用php转换URL从相对路径到绝对路径

Transfrom relative path into absolute URL using PHP

function rel2abs($rel, $base)
{
    /* return if already absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '')
        return ($rel);

    /* queries and anchors */
    if ($rel[0] == '#' || $rel[0] == '?')
        return ($base . $rel);

    /* parse base URL and convert to local variables: $scheme, $host, $path, $query, $port, $user, $pass */
    extract(parse_url($base));

    /* remove non-directory element from path */
    $path = preg_replace('#/[^/]*$#', '', $path);

    /* destroy path if relative url points to root */
    if ($rel[0] == '/')
        $path = '';

    /* dirty absolute URL */
    $abs = '';

    /* do we have a user in our URL? */
    if (isset($user)) {
        $abs .= $user;

        /* password too? */
        if (isset($pass))
            $abs .= ':' . $pass;

        $abs .= '@';
    }

    $abs .= $host;

    /* did somebody sneak in a port? */
    if (isset($port))
        $abs .= ':' . $port;

    $abs .= $path . '/' . $rel . (isset($query) ? '?' . $query : '');

    /* replace '//' or '/./' or '/foo/../' with '/' */
    $re = ['#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#'];
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {
    }

    /* absolute URL is ready! */

    return ($scheme . '://' . $abs);
}