# 🐻 CensorBear

CensorBear is a pluggable content moderation management tool.

## Installation
Add this line to your application's Gemfile:

```ruby
gem 'censor_bear'
```

And then execute:
```bash
$ bundle
```

Or install it yourself as:
```bash
$ gem install censor_bear
```

Then execute to copy migrations:

```bash
rails censor_bear:install:migrations
```

Configure the gem in `config/initializers/censor_bear.rb`. The available configuration items are listed in `lib/censor_bear/configuration.rb`.

```rb
CensorBear.configure do |config|
  config.user_class = 'User'
  config.user_name_method = 'nickname'

  config.aliyun_green_access_key_id = 'access key id'
  config.aliyun_green_access_key_secret = 'access key  secret'
  config.aliyun_green_enable_internal = false

  config.main_app_root_path_method = 'root_path'
  config.main_app_user_path_method = 'edit_user_password_path'
end
```

## I18n configuration
Copy the i18n file to `config/locales/censor_bear.yml`:
```yml
"zh-CN":
  censor_log:
    action:
      mod: 审核
      banned: 禁用
      ignore: 忽略
      replace: 替换
      block: 禁用
      pass: 忽略
      review: 审核
    aasm_state:
      pending: 待审
      handled: 已处理
      suspended: 暂缓处理
    stage:
      check_search: 搜索检查
      aliyun_check: 阿里云检查
      tencent_check: 腾讯云检查
      local_check: 本地检查
      qq_regex: QQ正则检查
      wechat_regex: 微信正则检查
    label:
      normal: 正常文本
      spam: 含垃圾信息
      ad: 广告
      politics: 涉政
      terrorism: 暴恐
      abuse: 辱骂
      porn: 色情
      flood: 灌水
      contraband: 违禁
      meaningless: 无意义
      harmful: 不良场景
      customized: 自定义
```

## 🦆 Hook methods to implement
```ruby
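censor_show_path # route to the detail page of the moderated resource
censor_images # images associated with the moderated resource
censor_remove # remove the content
censor_undo_remove # undo removing the content
censor_approve # approve the content
censor_undo_approve # undo approving the content
censor_ban_user # ban the user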
```

## How origin works

### Text

When content is marked, first

## Usage
```ruby

```

## Exceptions
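When a banned keyword is detected, a `NotPassedException` is raised and the user is prevented from creating the content. Rescue this exception in the controller and return an appropriate response.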

```ruby
class NotPassedException < StandardError; end
```
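The rescue pattern can be sketched in plain Ruby so that it runs without Rails. `PostsEndpoint`, `create_post!`, the response hash, and the banned word list are hypothetical stand-ins for a controller action and the gem's check, not part of CensorBear. In a real Rails controller the same idea is usually written with `rescue_from`.

```ruby
# Redefined here only so the sketch is self-contained.
class NotPassedException < StandardError; end

# Hypothetical stand-in for a controller; not part of CensorBear.
class PostsEndpoint
  BANNED_WORDS = %w[badword].freeze # illustrative only

  def create(content)
    create_post!(content)
    { status: 201, body: "created" }
  rescue NotPassedException => e
    # Turn the moderation failure into a client error instead of a server error.
    { status: 422, body: e.message }
  end

  private

  # Simulates the moderation check that runs when content is created.
  def create_post!(content)
    hit = BANNED_WORDS.find { |word| content.include?(word) }
    raise NotPassedException, "content contains a banned keyword" if hit

    content
  end
end
```

Calling `PostsEndpoint.new.create("a badword here")` returns the 422 hash rather than letting the exception propagate.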


## Authentication
### Devise
```ruby
authenticate :user, ->(user) { user.admin? } do
  mount CensorBear::Engine => "/censor_bear", as: "censor_bear"
end
```

### Other (todo)
Specify a `before_action` method to run in `censor_bear.yml`:
```yml
before_action_method: require_admin
```
You can define this method in your `ApplicationController`:
```ruby
def require_admin
  # depending on your auth, something like...
  redirect_to root_path unless current_user && current_user.admin?
end
```
Be sure to render or redirect for unauthorized users.


## Database field requirements
- An `is_approved` column
- Soft-delete support


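## Notes
1. When the action is MOD, manual review is required. The moderated object (UGC content) must support an `is_approved` field (default `true`). While under review, the content is visible to its author but not yet to others; it becomes visible to everyone once it is approved.
2. Moderated objects can be hidden through soft deletion, so that a misjudgment can be undone. Deleted content is invisible to both its author and other users.
3. Set up a recycle-bin mechanism with a protection period, and permanently delete expired content on a schedule.
4. Ignore means the item is left unhandled for now for some reason. It is temporarily hidden from the pending tasks and can be found again through filters, so tasks you are undecided about can be shelved this way. In this state the content is visible to its author but not to others.
5. UGC content goes through the review mechanism (MOD state). User nicknames, profile descriptions, avatars, and similar fields are banned or reset directly once penalized.
6. Content that is BANned cannot be created; an exception is returned immediately (fail fast).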
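The visibility rules in notes 1 and 2 can be sketched as a single predicate. This is a minimal illustration, not the gem's implementation: `ModeratedContent`, `author_id`, and `deleted_at` are assumed names, and only `is_approved` comes from the requirements above.

```ruby
# Hypothetical record; `author_id` and `deleted_at` are assumed column names.
ModeratedContent = Struct.new(:author_id, :is_approved, :deleted_at, keyword_init: true) do
  # Soft-deleted content is hidden from everyone, including the author.
  # Unapproved content is visible to its author only.
  def visible_to?(viewer_id)
    return false unless deleted_at.nil?

    is_approved || viewer_id == author_id
  end
end

pending = ModeratedContent.new(author_id: 1, is_approved: false, deleted_at: nil)
pending.visible_to?(1) # the author can see their own pending content
pending.visible_to?(2) # other users cannot
```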


## Contributing

## Frontend development for the engine

1. The engine uses tailwindcss to keep styling simple. Run `yarn watch_tailwind` during development.



## License
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).

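## Development requirements (2022-07-20)

The `type` values defined by StopWord are `%w[ugc username signature dialog nickname]`.

### UGC
The censor type is `ugc`.

StopWord has first priority. When the StopWord result is `banned`, processing is interrupted.

Next comes the aliyun check.

When the aliyun result is `banned`, processing is interrupted.

**Interrupting processing** means the content is refused publication.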
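The priority order can be sketched as a short-circuiting pipeline. This is an illustration under assumptions: `run_checks` and the two lambdas are hypothetical stand-ins for the StopWord and aliyun checks, and the result symbols are simplified.

```ruby
# Runs checkers in priority order and stops at the first `banned` result.
# Each checker is any callable that returns a symbol such as :pass, :mod, or :banned.
def run_checks(text, checkers)
  results = {}
  checkers.each do |name, checker|
    results[name] = checker.call(text)
    break if results[name] == :banned # interrupt: later checkers are not called
  end
  { published: !results.value?(:banned), results: results }
end

# Hypothetical stand-ins for the StopWord and aliyun checks.
stop_word = ->(text) { text.include?("badword") ? :banned : :pass }
aliyun    = ->(text) { text.include?("spam") ? :banned : :pass }

run_checks("hello badword", { stop_word: stop_word, aliyun: aliyun })
```

Because Ruby hashes preserve insertion order, listing `stop_word` first is enough to give it first priority here.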



### Personal information

That is, `username`, `signature`, and `nickname`.

This is split into two parts:


# Text authen

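Text goes through two checks: the local stop_word check, followed by the aliyun check described in the UGC section.

1. The local stop_word check

   If the check finds a hit, a corresponding `censor_log` record is created. A hit produces one of three results: `banned`, `replace`, or `mod`.

   When the result is `banned`, a `NotPassedException` is raised and the whole check exits immediately.
   When the result is `replace`, the text is modified and the offending content is replaced with safe text.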
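The three outcomes of a local hit can be sketched as follows. This is not the gem's code: `STOP_WORDS`, `local_check`, and the asterisk masking are assumptions for illustration, and the sketch leaves out writing the `censor_log` record.

```ruby
# Redefined here only so the sketch is self-contained.
class NotPassedException < StandardError; end

# Hypothetical in-memory stop word table: word => action.
STOP_WORDS = {
  "forbidden" => :banned,
  "darn"      => :replace,
  "maybe-ad"  => :mod
}.freeze

# Returns [text, needs_review]. Raises on a banned hit, masks replace hits,
# and flags mod hits for manual review.
def local_check(text)
  needs_review = false
  STOP_WORDS.each do |word, action|
    next unless text.include?(word)

    case action
    when :banned  then raise NotPassedException, "banned keyword: #{word}"
    when :replace then text = text.gsub(word, "*" * word.length)
    when :mod     then needs_review = true
    end
  end
  [text, needs_review]
end
```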

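In a model you can declare it directly:

`text_authen :field_name, type: , options = {}`

**Optional `options` parameters**

- `callback_func_when_mod`: the name of a hook method to call when the check result is `mod`
- `ignore_aliyun_check`: whether to skip the aliyun check
- `save_log`: whether to save logs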

## Example

```ruby
class Post
  text_authen :content, type: :ugc, callback_func_when_mod: :callback_func_when_mod

  def callback_func_when_mod(result, options)
    # options are the options used for the check; result is the Result from Censor
  end
end
```

This checks the model's `content` field.


# Image Authen

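Images are moderated as well. Note that because most image URLs have to go through some encryption step first, image checks do not point at an attribute directly the way `text_authen` does. Instead, you implement a method that generates the image URLs.

Whenever the model is saved successfully (`after_save`), the system calls the model's `fetch_need_censor_image_urls` method to get the URLs of the images to moderate. That method in turn looks up the method named by the `fetch_urls_func` option of `aliyun_image_authen` and returns its URLs. You can override `censor_images_trigger_condition` in the model to control when the check runs: if it returns `true`, the check is triggered.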
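The flow above can be simulated in plain Ruby. The method names `fetch_need_censor_image_urls` and `censor_images_trigger_condition` come from this README, but their bodies, the `DemoPost` class, `signed_image_urls`, the example URL, and the hand-written `after_save` are assumptions standing in for what the gem and ActiveRecord would do.

```ruby
# Hypothetical model illustrating the after-save image flow.
class DemoPost
  attr_reader :image_keys, :submitted_urls

  def initialize(image_keys)
    @image_keys = image_keys
    @submitted_urls = []
  end

  # Stand-in for the method named by `fetch_urls_func`: computes authorized URLs.
  def signed_image_urls
    image_keys.map { |key| "https://cdn.example.com/#{key}?token=demo" }
  end

  # Only trigger the check when there is something to check.
  def censor_images_trigger_condition
    !image_keys.empty?
  end

  # In the gem this resolves `fetch_urls_func`; here it calls the method directly.
  def fetch_need_censor_image_urls
    signed_image_urls
  end

  # Stand-in for the after_save callback the gem registers.
  def after_save
    return unless censor_images_trigger_condition

    @submitted_urls = fetch_need_censor_image_urls # the gem would submit these to aliyun
  end
end
```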

## How to call it

It takes the following steps:

1. Use `aliyun_image_authen` on the model

```ruby
aliyun_image_authen type: "the corresponding type", options
```


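**Optional `options` parameters**
- `scenes`: the scenes to check. Defaults to `["porn", "terrorism"]` if omitted.
- `callback_url`: the URL aliyun calls back after the check. Defaults to `"#{CensorBear.config.aliyun_green_callback_domain}/censor_bear/aliyun_check/image_callback"` if omitted. Note that this default endpoint already normalizes the aliyun response into a standard format and passes the result to the model's `image_censor_answer_func(image_censor_answer_dict)`.
- `fetch_urls_func`: the method that returns the URLs to moderate. In most cases an image URL has to be computed into an authorized URL first, so a method is needed to produce it.


2. Override `image_censor_answer_func(image_censor_answer_dict)` on the model as needed.
This method is called on the model after aliyun sends its callback and CensorBear has normalized the payload. Based on `image_censor_answer_dict` you can apply your own business logic, such as deleting the image or replacing its URL. (Why is URL replacement not automated? Because the image URL is computed and time-limited rather than stored in an attribute, so this is left to the application for flexibility.)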


# Calling directly

Direct calls let you run checks from places such as rake tasks.

## text authen

```ruby
CensorBear::Censor.new(content, type, options).check_text

# options must specify the record. For example, to scan current_user's
# self-introduction, set options[:record] = current_user.

# find_each loads users in batches instead of all at once
User.find_each do |user|
  CensorBear::Censor.new(user.profile_content, "ugc", record: user).check_text
  # To skip the aliyun check and run only the local check:
  CensorBear::Censor.new(user.profile_content, "ugc", record: user, ignore_aliyun_check: true).check_text
end
```


## image authen

```ruby
CensorBear::Censor.new(image_urls, type, options).check_image

# find_each loads posts in batches instead of all at once
Post.find_each do |post|
  CensorBear::Censor.new(post.image_urls, "ugc", record: post).check_image
end
```


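# Requirements
1. For UGC content, check with censor bear before publishing:
   1. If the text contains red-line content, refuse publication outright.
   2. If it does not cross a red line but is ambiguous (as judged by the category of the returned check result), flag it for manual review.
   3. Check whether the attached images are acceptable. This can be triggered in an asynchronous job. If there is a problem, block the image at display time, show the default "content violation" placeholder image, and send an in-site message.
   4. Periodically scan published content with a rake task, flag problematic content, and handle it manually.
2. For user personal information, check when a profile is created or edited:
   1. If the nickname or personal information violates the rules, reset it to a random string on creation (so the sign-up flow is not interrupted), and raise an error and refuse to save on edit.
   2. Avatars and profile background images are checked asynchronously, also triggered on creation and edit. If there is a problem, reset the avatar or background image to the default image and send an in-site message.
   3. Periodically scan registered users' information with a rake task, flag problematic content, and handle it manually.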
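Requirement 2.1 treats creation and editing differently, which can be sketched as below. Everything here is hypothetical: `moderated_nickname`, `NicknameRejected`, and the `user_` prefix are illustrative names, and the `violates` flag stands in for the outcome of a CensorBear text check.

```ruby
require "securerandom"

class NicknameRejected < StandardError; end # hypothetical error for the edit path

# `violates` stands in for the result of a text check on the nickname.
def moderated_nickname(nickname, violates:, creating:)
  return nickname unless violates
  # On sign-up, swap in a random nickname so registration is not interrupted.
  return "user_#{SecureRandom.hex(4)}" if creating

  # On edit, refuse to save.
  raise NicknameRejected, "nickname violates content rules"
end
```

On the creation path the caller always gets a usable nickname back; on the edit path the caller is expected to rescue the error and show a validation message.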