Category Archives: Tools

autossh socks5

The idea is simple: use ssh -D to set up a SOCKS proxy, then point your browser at it. The advantage of running ssh -D on the router is that you don't have to set it up on every device (I have one iPhone, two iPads, an Android phone, and n virtual machines). So why not run a SOCKS5 proxy directly on the remote server? Because plain SOCKS5 can't get past the firewall: traffic containing flagged keywords gets blocked. The SOCKS proxy built by ssh -D is encrypted, so for now it doesn't get blocked.
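The core trick can be tried on any single machine first (your_server_ip below is a placeholder for your own server):

```shell
# -D opens a local SOCKS5 listener; -N runs no remote command.
# Everything sent through the listener travels inside the encrypted SSH tunnel.
ssh -N -D 127.0.0.1:1080 root@your_server_ip &

# curl can speak SOCKS5 directly; --socks5-hostname also resolves DNS on the
# remote end, which matters if local DNS answers are being tampered with.
curl --socks5-hostname 127.0.0.1:1080 https://www.google.com/
```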

Step 1: On the server, put your public key in /root/.ssh/authorized_keys

Step 2: Copy your private key to /root/.ssh/id_rsa on the OpenWrt box

Step 3: Replace OpenWrt's ssh client (dropbear) with openssh-client, and install autossh

rm /usr/bin/scp
rm /usr/bin/ssh
opkg update
opkg install openssh-client
opkg install autossh

Step 4: Configure autossh. /etc/config/autossh should look like this:

config autossh
        option ssh        '-i /root/.ssh/id_rsa -N -T -D 192.168.2.1:7080 root@<your_server_ip>'
        option gatetime        '0'
        option monitorport        '20000'
        option poll        '600'

192.168.2.1 is your router's LAN IP. The key part is -D 192.168.2.1:7080, which is what creates the SOCKS proxy.

Step 5: Enable start on boot

/etc/init.d/autossh enable
/etc/init.d/autossh start

To use it, connect to this router; it should hand out a 192.168.2.x address (you know how to configure that yourself). Then set your SOCKS proxy to 192.168.2.1:7080.
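To verify from a LAN client that the tunnel is actually up (ifconfig.me is just one example of an IP-echo service):

```shell
curl --socks5-hostname 192.168.2.1:7080 https://ifconfig.me
# The address printed should be the remote server's public IP, not your ISP's.
```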

A simpler auto-start setup for the lazy

Edit /etc/rc.local and add:

autossh -M 0  -NT -D 0.0.0.0:8888 ssh_host &

WAF comparison (lua-nginx-module, modsecurity, naxsi)

ngx_lua_waf is a web application firewall based on lua-nginx-module (openresty)

https://github.com/loveshell/ngx_lua_waf

Using modsecurity with nginx as a WAF

http://www.52os.net/articles/nginx-use-modsecurity-module-as-waf.html

NAXSI is an open-source, high performance, low rules maintenance WAF for NGINX

https://github.com/nbs-system/naxsi

How small and medium-sized businesses can build a free cloud WAF

https://zhuanlan.zhihu.com/p/22068364

X-WAF is a cloud WAF system for small and medium-sized businesses, letting them run their own free cloud WAF with very little effort.

https://waf.xsec.io/docs

A web application security protection system (WAF) based on openresty
http://git.oschina.net/miracleqi/OpenWAF
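For context, the lua-nginx-module approach that ngx_lua_waf and OpenWAF build on is essentially a hook in nginx's access phase. A minimal, purely illustrative sketch (a single hard-coded pattern rather than a real rule set; `backend` is a placeholder upstream):

```nginx
location / {
    access_by_lua_block {
        -- reject an obvious SQL-injection probe anywhere in the request URI
        local uri = ngx.var.request_uri
        if uri and ngx.re.find(uri, [[union\s+select]], "ijo") then
            return ngx.exit(ngx.HTTP_FORBIDDEN)
        end
    }
    proxy_pass http://backend;
}
```

Real WAFs differ mainly in how the rules are maintained: ngx_lua_waf ships editable rule files, modsecurity uses the OWASP core rule set, and naxsi uses a score-based whitelist model.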

wikimedia export

I've been doing search engine performance testing lately and needed some test data. It turns out Wikimedia provides XML dumps, and Python is the best tool for importing them.

https://dumps.wikimedia.org/zhwiki/
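Even a shell one-liner is enough to peek at the dump's structure: the dumps follow MediaWiki's XML export schema, one <page>/<title> pair per article. The sketch below runs against a tiny inline sample (the real file name, e.g. zhwiki-latest-pages-articles.xml.bz2, is listed on the dumps page):

```shell
cat > sample.xml <<'EOF'
<mediawiki>
  <page>
    <title>数学</title>
    <revision><text>数学是研究数量与结构的学科。</text></revision>
  </page>
  <page>
    <title>物理学</title>
    <revision><text>物理学研究物质与能量。</text></revision>
  </page>
</mediawiki>
EOF

# Extract page titles; on a real dump, stream it instead of unpacking:
#   bzcat zhwiki-latest-pages-articles.xml.bz2 | sed -n 's:.*<title>\(.*\)</title>.*:\1:p'
sed -n 's:.*<title>\(.*\)</title>.*:\1:p' sample.xml
```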

nutch + hbase + elasticsearch

Install HBase

Download

Don't grab too new a version (Nutch 2.3's gora-hbase backend needs the old 0.94 line). Download from: http://mirror.bit.edu.cn/apache/hbase/

cd /data/server
wget http://mirror.bit.edu.cn/apache/hbase/hbase-0.94.27/hbase-0.94.27.tar.gz
tar zxvf hbase-0.94.27.tar.gz
cd hbase-0.94.27

Edit the configuration

Edit conf/hbase-site.xml:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///data/data/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
</configuration>

Start

 bin/start-hbase.sh

Run

 bin/hbase shell
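A quick smoke test inside the shell confirms HBase is readable and writable (the table and column-family names here are arbitrary):

```
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'
disable 'test'
drop 'test'
```

The scan should show row1 with column cf:a set to value1.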

Install Nutch

Download

Download from: http://mirror.bit.edu.cn/apache/nutch/

cd /data/server
wget http://mirror.bit.edu.cn/apache/nutch/2.3/apache-nutch-2.3-src.tar.gz
tar -zxvf apache-nutch-2.3-src.tar.gz
cd apache-nutch-2.3

Configure

ivy/ivy.xml

<dependency org="org.apache.gora" name="gora-hbase" rev="0.5" conf="*->default" />  

conf/gora.properties

gora.datastore.default=org.apache.gora.hbase.store.HBaseStore

build

 ant clean
 ant runtime

config

runtime/local/conf/nutch-site.xml

<configuration>

  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
  </property>

  <property>
    <name>plugin.includes</name>
    <!-- do **NOT** enable the parse-html plugin, if you want proper HTML parsing. Use something like parse-tika! -->
    <value>protocol-httpclient|urlfilter-regex|parse-(text|tika|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|indexer-elastic</value>
  </property>

  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
    <!-- do not leave the seeded domains (optional) -->
  </property>

  <!-- elasticsearch index properties -->
  <property>
    <name>elastic.host</name>
    <value>localhost</value>
    <description>The hostname to send documents to using TransportClient. Either host and port must be defined or cluster.
    </description>
  </property>

  <property>
    <name>elastic.cluster</name>
    <value>elasticsearch</value>
    <description>The cluster name to discover. Either host and port must be defined or cluster.
    </description>
  </property>

  <property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
  </property>

  <property>
    <name>http.agent.name</name>
    <value>Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36</value>
  </property>

  <property>
    <name>http.agent.description</name>
    <value>Programmer's search</value>
  </property>

  <property>
    <name>http.robots.403.allow</name>
    <value>true</value>
  </property>

  <property>
    <name>http.agent.url</name>
    <value>http://hisearch.cn</value>
  </property>

  <property>
    <name>http.verbose</name>
    <value>true</value>
  </property>

  <property>
    <name>http.accept.language</name>
    <value>zh,zh-CN;q=0.8,en;q=0.6</value>
  </property>

  <property>
    <name>http.agent.version</name>
    <value>0.1</value>
  </property>

</configuration>

runtime/local/conf/hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///data/data/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
</configuration>

run

cd runtime/local
mkdir seed
echo "http://www.cnblogs.com" > seed/urls.txt
bin/nutch inject seed/
bin/nutch generate -topN 10
bin/nutch fetch -all
bin/nutch parse -all
bin/nutch updatedb
bin/crawl seed/ testCrawl 3
bin/nutch elasticindex elasticsearch -all

elasticsearch highlighting

PUT /my_index

{
  "mappings": {
    "doc_type": {
      "properties": {
        "content": {
          "type": "string",
          "term_vector": "with_positions_offsets",
          "analyzer": "snowball"
        }
      }
    }
  }
}

POST /_search

{
  "query": {
    "multi_match": {
      "query": "公司",
      "type": "best_fields",
      "fields": [
        "title",
        "content"
      ]
    }
  },
  "post_filter": {
    "term": {
      "site": "baidu.com"
    }
  },
  "highlight": {
    "fields": {
      "content": {
        "fragment_size": 100,
        "number_of_fragments": 2,
        "no_match_size": 100,
        "boundary_chars": " 。,?",
        "boundary_max_scan": 80,
        "force_source": true
      }
    }
  }
}
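For reference, when term_vector: with_positions_offsets is stored, the highlighter returns matched fragments under a highlight key on each hit; the response shape is roughly as below (field values invented for illustration):

```json
{
  "hits": {
    "hits": [
      {
        "_source": { "title": "...", "content": "..." },
        "highlight": {
          "content": [
            "该<em>公司</em>成立于2010年,总部位于北京。",
            "<em>公司</em>主要业务包括搜索与广告。"
          ]
        }
      }
    ]
  }
}
```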

Installing elasticsearch on CentOS

download and install via https://www.elastic.co/downloads/

yum install https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.1.0/elasticsearch-2.1.0.rpm

make data and logs dir

mkdir -p /data/elastic/data
mkdir -p /data/elastic/logs
chown -R elasticsearch:elasticsearch /data/elastic/

edit config /etc/elasticsearch/elasticsearch.yml

path.data: /data/elastic/data
path.logs: /data/elastic/logs
network.host: 127.0.0.1

edit start script /etc/init.d/elasticsearch

LOG_DIR="/data/elastic/logs"
DATA_DIR="/data/elastic/data"

install java jdk

yum install java-1.8.0-openjdk

start

systemctl enable elasticsearch
/etc/init.d/elasticsearch start

test

/etc/init.d/elasticsearch status
curl http://127.0.0.1:9200/

Installing Redis on CentOS

make data dir for redis

mkdir /data/redis
chown -R redis:redis /data/redis

modify config /etc/redis.conf

daemonize yes
dir /data/redis/
appendonly yes
requirepass mypassword

restart

systemctl enable redis
systemctl start redis
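To confirm the password and data directory took effect (mypassword matching the requirepass set above):

```shell
redis-cli -a mypassword ping            # expect PONG
redis-cli -a mypassword config get dir  # expect "dir" and "/data/redis"
```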