File Classification and Conversion

Classifying files by MIME type, and converting between file types.

1 File type utility root.mimes
2 Site document conversion transform
3 Document processing convert

1 File type utility root.mimes

MIME types are too numerous and intricate to memorize or recognize by hand.

The global root.mimes object provides MIME type management.

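1.1 File categories

Files are currently classified into the following categories:

office: office documents, with sub-categories word, excel, ppt, pdf, and project
text: text files, with sub-categories config, text-data, code, text-document, html, and all-text
image: images
draw: drawings
2d: 2D graphics
3d: 3D graphics
mail: email
zip: archives
video: video
audio: audio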

1.2 Finding the categories of a MIME type list_categories

To find the categories that a file file_obj belongs to:

categories = root.mimes.list_categories(file_obj.mime_type)

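Pass in the file's MIME type; a list of categories is returned. For a doc file, for example, this is the top-level category followed by the sub-category: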

['office', 'word']
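
To make the return shape concrete, here is a minimal stand-in sketch. The table is hypothetical and covers only a few MIME types, since the platform's real table is not shown in this document. Returning an empty list for an unknown MIME type is also an assumption.

```python
# Hypothetical, partial table: (category, sub_category, MIME type).
# The real mapping lives inside root.mimes and is much larger.
SAMPLE_ROWS = [
    ("office", "word", "application/msword"),
    ("office", "pdf", "application/pdf"),
    ("image", None, "image/png"),
]


def list_categories(mime_type):
    """Return [category] or [category, sub_category]; [] if unknown (assumed)."""
    for category, sub_category, mime in SAMPLE_ROWS:
        if mime == mime_type:
            if sub_category is None:
                return [category]
            return [category, sub_category]
    return []


print(list_categories("application/msword"))  # ['office', 'word']
```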

1.3 Listing all MIME types in a category list_mimes

This is useful when searching for files of a particular type:

mimes = root.mimes.list_mimes(category='office')

Pass in the category ID; a list of MIME types is returned.

A specific sub-category can also be queried directly:

mimes = root.mimes.list_mimes(category='office', sub_category='excel')
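
When a search spans several categories, the lists can be merged. The sketch below takes the mimes object as an argument, so on the platform root.mimes would be passed in. FakeMimes and its contents are hypothetical stand-ins that only let the snippet run outside the platform; mimes_for_categories is an illustrative helper, not a platform API.

```python
class FakeMimes(object):
    """Stand-in for root.mimes; the rows below are illustrative only."""

    _rows = [
        ("office", "word", "application/msword"),
        ("office", "excel", "application/vnd.ms-excel"),
        ("image", None, "image/png"),
    ]

    def list_mimes(self, category, sub_category=None):
        # With no sub_category, every MIME type of the category matches.
        return [mime for cat, sub, mime in self._rows
                if cat == category and sub_category in (None, sub)]


def mimes_for_categories(mimes, categories):
    """Collect the MIME types of several categories, without duplicates."""
    collected = []
    for category in categories:
        for mime in mimes.list_mimes(category=category):
            if mime not in collected:
                collected.append(mime)
    return collected


search_mimes = mimes_for_categories(FakeMimes(), ["office", "image"])
```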

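1.4 Checking whether a file can be converted has_transform

For example, to check whether a file can be previewed online with the Flash viewer: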

can_preview = root.mimes.has_transform(source_mime=file_obj.mime_type,
        target_mime='application/x-shockwave-flash-x')

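Where:

source_mime: the file's original MIME type
target_mime: the MIME type after conversion; application/x-shockwave-flash-x is the MIME type required by the Flash document viewer

Returns a bool indicating whether the conversion is possible. True is also returned when source_mime and target_mime are identical.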
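
A common use is to try several preview formats in order of preference. The sketch below assumes only the has_transform signature shown above. FakeMimes and its conversion pairs are made up for the demo, and first_available_target is a hypothetical helper.

```python
class FakeMimes(object):
    """Stand-in for root.mimes; the supported pairs here are made up."""

    _pairs = [
        ("application/msword", "application/pdf"),
        ("application/msword", "text/plain"),
    ]

    def has_transform(self, source_mime, target_mime):
        # Mirrors the documented rule: identical types always return True.
        return source_mime == target_mime or (source_mime, target_mime) in self._pairs


def first_available_target(mimes, source_mime, candidates):
    """Return the first candidate the source can be converted to, or None."""
    for target in candidates:
        if mimes.has_transform(source_mime=source_mime, target_mime=target):
            return target
    return None


target = first_available_target(
    FakeMimes(),
    "application/msword",
    ["application/x-shockwave-flash-x", "application/pdf"],
)
```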

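1.5 Getting the correct suffix for an extended MIME type get_extention

In principle, different file suffixes correspond to different MIME types, but Python's standard mimetypes library has the following problems:

The international MIME standard falls far short of covering every file type, so some files have no unique MIME type
guess_extension cannot reliably derive the correct file suffix from a MIME type

The system therefore extends mimetypes to guarantee a one-to-one mapping between suffixes and MIME types:

If a file suffix has no standard MIME type definition, the following format is used automatically:
```
application/x-ext-{EXTENTION_NAME}
```

The following method was added to obtain the exact suffix: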

extention = root.mimes.get_extention(mime_type)
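
The fallback convention can be sketched with the standard library alone. This is not the platform's implementation: the lower-casing, the absence of a dot inside the MIME type, and the leading dot in the returned suffix are all assumptions, and both function names are hypothetical.

```python
import mimetypes
import os

EXT_PREFIX = "application/x-ext-"


def mime_for_filename(filename):
    """Standard MIME type if the stdlib knows one, else the x-ext fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed:
        return guessed
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if not extension:
        return None  # no suffix at all: nothing to build the fallback from
    return EXT_PREFIX + extension


def extension_for_fallback_mime(mime_type):
    """Recover the suffix from a fallback MIME type; None for other types."""
    if mime_type.startswith(EXT_PREFIX):
        return "." + mime_type[len(EXT_PREFIX):]
    return None


fallback = mime_for_filename("drawing.zzzqq")  # suffix unknown to the stdlib
```

Because the suffix is embedded in the MIME type itself, the reverse lookup is unambiguous, which is the one-to-one property described above.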

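1.6 Getting the default pre-conversion targets of a category list_default_targets_for_category

The default pre-conversion targets of a MIME type are determined by its category. The category can be obtained with list_categories:

categories = root.mimes.list_categories('text/html')
print(root.mimes.list_default_targets_for_category(categories[0]))
# output: ['text/plain', 'text/html']

The following categories are currently supported: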

office
audio
video
draw
2d
image
text
zip
mail
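
The two calls can be wrapped into one lookup that also copes with a MIME type that has no category. The sketch takes the mimes object as an argument (root.mimes on the platform). FakeMimes is a hypothetical stand-in whose values merely echo the text/html example above, and default_targets_for_mime is an illustrative helper.

```python
class FakeMimes(object):
    """Stand-in for root.mimes; the values below are illustrative only."""

    def list_categories(self, mime_type):
        return {"text/html": ["text", "html"]}.get(mime_type, [])

    def list_default_targets_for_category(self, category):
        return {"text": ["text/plain", "text/html"]}.get(category, [])


def default_targets_for_mime(mimes, mime_type):
    """Look up the default pre-conversion targets via the top-level category."""
    categories = mimes.list_categories(mime_type)
    if not categories:
        return []  # no known category: nothing to pre-convert
    return mimes.list_default_targets_for_category(categories[0])


targets = default_targets_for_mime(FakeMimes(), "text/html")
```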

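2 Site document conversion transform

Uploading a file automatically triggers the conversions needed for full-text indexing (to pdf/txt). Other conversions, such as those for file preview, are started only on first preview.

To trigger additional conversions (a conversion that has already been done is not run again):

file_obj.transform(targets=['application/pdf'], rm_targets=False)

targets: [optional] the MIME types to convert to; if omitted, a default set of conversions is run
rm_targets: [optional] to force a conversion, delete the previous conversion results and convert again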

3 Document processing convert

Document processing is a special kind of document conversion:

my_pdf_file.convert(mime, subfile, params, driver, callback, error_callback)

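Where:

mime: the target MIME name of the conversion. Each kind of processing has its own dedicated MIME type
subfile: the file name under which the processed result is stored. The result data can later be read with get_cache, as shown under callback
params: conversion-specific parameters; see the description of each conversion
driver: [optional] if several conversion drivers are available, selects a non-default one. For example, to convert with Microsoft Office, pass msoffice. The drivers supported by the system are listed in root.info()['convert_drivers'].
callback: [optional] the script to call back when the conversion completes. For a script xxx.xxx:xxx, the callback can be computed like this:
```
callback = my_pdf_file.api_url(request, 'xxx.xxx:xxx', internal_=True, auth_=True)
```
The conversion result is available in the script: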
```
data = my_pdf_file.get_cache(mime, subfile)
```
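error_callback: [optional] the script to call back when the conversion fails. The script should take 2 parameters:
- error_code: the error code
  
  -1: file does not exist; -2: conversion failed
- error_msg: a detailed description of the cause of the error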
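
Inside an error callback script, the two parameters can be turned into a readable log line. The sketch below uses only the two error codes listed above. The function name and the message format are hypothetical, and how the platform passes the parameters to the script is not covered here.

```python
# Error codes as listed above.
ERROR_MEANINGS = {
    -1: "file does not exist",
    -2: "conversion failed",
}


def describe_convert_error(error_code, error_msg):
    """Build a log line from the two error callback parameters."""
    meaning = ERROR_MEANINGS.get(error_code, "unknown error")
    return "convert error [{0}] {1}: {2}".format(error_code, meaning, error_msg)


print(describe_convert_error(-2, "driver timed out"))
```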

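3.1 Generating a PDF with an overlaid image application/x-image-on-pdf

First save the image on the site (img_obj), then add it to the specified pages of the generated PDF file. This is typically used for seals:

my_pdf.convert('application/x-image-on-pdf',
              subfile="111111.pdf",
              params = {
                          'image_device': img_obj.mdfs_device, # storage device of the image file
                          'image_location': img_obj.mdfs_key, # storage key of the image file
                          'width':10, # [optional] image width in millimetres; if omitted, the image's default width is used
                          'height':7.9, # [optional] image height in millimetres; if omitted, the image's default height is used
                          'pages': [1, 2], # pages to add the image to; if omitted, the image is added to all pages
                          'x_factor': 0.6, # horizontal position: left edge of the image as a fraction of the page width
                          'y_factor': 0.4,  # vertical position: top edge of the image as a fraction of the page height
                       },
              callback = my_pdf.api_url(request, script_name, internal_=True, auth_=True))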
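
Seal positions are often specified in millimetres rather than as ratios. The sketch below converts an absolute position into x_factor and y_factor. It assumes both ratios are measured from the top-left corner of the page, which the parameter comments suggest but do not state outright; placement_factors is a hypothetical helper.

```python
def placement_factors(page_width_mm, page_height_mm, left_mm, top_mm):
    """Convert an absolute top-left position into x_factor / y_factor ratios."""
    if not (0 <= left_mm <= page_width_mm and 0 <= top_mm <= page_height_mm):
        raise ValueError("position lies outside the page")
    return {
        "x_factor": float(left_mm) / page_width_mm,
        "y_factor": float(top_mm) / page_height_mm,
    }


# A4 page (210 mm x 297 mm); image's left edge at 126 mm, top edge at 118.8 mm.
factors = placement_factors(210, 297, 126, 118.8)
```

The resulting values (about 0.6 and 0.4 here) would go into the params dictionary next to image_device and image_location.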

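3.2 Overlaying an image on a docx document application/x-image-on-docx

Adds an image at a specified position on specified pages of a docx document (typically used for signatures and seals):

my_docx.convert('application/x-image-on-docx',
              subfile="111111.docx",
              params = {
                          'image_device': img_obj.mdfs_device, # storage device of the image file
                          'image_location': img_obj.mdfs_key, # storage key of the image file
                          'width':10, # [optional] image width in millimetres; if omitted, the image's default width is used
                          'height':7.9, # [optional] image height in millimetres; if omitted, the image's default height is used
                          'pages': [1, 2], # pages to add the image to
                          'x_factor': 0.6, # horizontal position: left edge of the image as a fraction of the page width
                          'y_factor': 0.4,  # vertical position: top edge of the image as a fraction of the page height
                       },
              callback = my_docx.api_url(request, script_name, internal_=True, auth_=True))

By default, this conversion is performed with LibreOffice.

If the system supports the msoffice conversion engine, the driver parameter can be passed to force MS Office to be used for a higher-fidelity conversion:

my_docx.convert('application/x-image-on-docx',
              subfile="111111.docx",
              driver="msoffice",
              ...
              )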
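
Since msoffice is not available on every system, the driver can be chosen at run time. The sketch below assumes that root.info()['convert_drivers'] supports a membership test on driver names, for example a list of names; the document does not specify its exact shape. driver_kwargs is a hypothetical helper and "libreoffice" is an assumed driver name used only for the demo.

```python
def driver_kwargs(available_drivers, preferred="msoffice"):
    """Return {'driver': preferred} if available, else {} to keep the default."""
    if preferred in available_drivers:
        return {"driver": preferred}
    return {}


# On the platform this might be used as:
#   extra = driver_kwargs(root.info()['convert_drivers'])
#   my_docx.convert(mime, subfile=name, params=params, **extra)
with_msoffice = driver_kwargs(["libreoffice", "msoffice"])
without_msoffice = driver_kwargs(["libreoffice"])
```

Omitting the driver argument entirely, rather than passing a guessed default name, leaves the choice of default driver to the platform.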

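3.3 Overlaying an image on an xlsx document application/x-image-on-xlsx

Adds an image at a specified position on specified pages of an xlsx document (typically used for signatures and seals):

my_xlsx.convert('application/x-image-on-xlsx',
              subfile="111111.xlsx",
              params = {
                          'image_device': img_obj.mdfs_device, # storage device of the image file
                          'image_location': img_obj.mdfs_key, # storage key of the image file
                          'width':10, # [optional] image width in millimetres; if omitted, the image's default width is used
                          'height':7.9, # [optional] image height in millimetres; if omitted, the image's default height is used
                          'pages': [1, 2], # pages to add the image to
                          'x_factor': 0.6, # horizontal position: left edge of the image as a fraction of the page width
                          'y_factor': 0.4,  # vertical position: top edge of the image as a fraction of the page height
                       },
              callback = my_xlsx.api_url(request, script_name, internal_=True, auth_=True))

By default, this conversion is performed with LibreOffice.

If the system supports the msoffice conversion engine, the driver parameter can be passed to force MS Office to be used for a higher-fidelity conversion:

my_xlsx.convert('application/x-image-on-xlsx',
              subfile="111111.xlsx",
              driver="msoffice",
              ...
              )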

3.4 Generating a PDF with a text watermark application/x-text-watermark-pdf

To generate, from any document, a PDF file tiled with watermark text:

my_docx.convert('application/x-text-watermark-pdf',
              subfile="111111.pdf",
              params = {
                      'text':'default', # watermark text
                      'font':'宋体', # font name
                      'size': 20, # font size
                      'color':'#808080' # font color
                      },
              callback = my_docx.api_url(request, script_name, internal_=True, auth_=True))

3.5 OCR text recognition for PDFs and images text/ocr

To recognize the text in a file:

my_pdf.convert('text/ocr',
              subfile="111111.txt",
              params = {
                      'language':'zh', # language to recognize
                      },
              callback = my_pdf.api_url(request, script_name, internal_=True, auth_=True))

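3.6 Generating a document from a docx template application/x-template-docx

With a docx template, specific markers in the document can be replaced to generate documents dynamically:

my_docx.convert('application/x-template-docx',
              subfile="111111.docx",
              params = {
                  'vars':{'title':{'type':'text', 'value':'我们很好'}, # type text: plain text
                          'sign':{'type':'image', 'value':[device_name, device_key]}, # type image: an image
                          'number':{'type':'barcode', 'value':'1231231', 'width':200, 'height':200}, # type barcode: a barcode
                          'number2':{'type':'qrcode', 'value':'123122',  'width':200, 'height':200}, # type qrcode: a QR code
                          'rows':[{  # list: multi-row variables
                                  'name':{'type':'text', 'value':'123122'},  # first multi-row variable, text
                                  'avatar':{'type':'image', 'value':[device_name, device_key]}, # second multi-row variable, image
                                  }],
                          },
                      },
              callback = my_docx.api_url(request, script_name, internal_=True, auth_=True))

The vars parameter has the form {"variable name": {'type': replacement type, 'value': replacement data, ...}}, where the replacement type is one of:

text: text replacement; the variable is replaced directly with the given text
image: image replacement; the variable is replaced with the image at the given storage location
barcode: barcode replacement; the variable is replaced with a barcode containing the given information
qrcode: QR code replacement; the variable is replaced with a QR code containing the given information

The template must be a docx file. Variables in it are written as {{title}}, and Jinja2 syntax is supported. For details, see:

https://github.com/elapouya/python-docx-template/tree/master/tests/templates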
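
Hand-written vars dictionaries are easy to get wrong, so small builder functions can help. The helpers below are hypothetical and only reproduce the documented shape {'type': ..., 'value': ...}. The default width and height of 200 are copied from the example above and are not documented defaults.

```python
def text_var(value):
    """A text replacement."""
    return {"type": "text", "value": value}


def image_var(device_name, device_key):
    """An image replacement, addressed by storage device and key."""
    return {"type": "image", "value": [device_name, device_key]}


def code_var(kind, value, width=200, height=200):
    """A barcode or QR code replacement; kind is 'barcode' or 'qrcode'."""
    if kind not in ("barcode", "qrcode"):
        raise ValueError("kind must be 'barcode' or 'qrcode'")
    return {"type": kind, "value": value, "width": width, "height": height}


template_vars = {
    "title": text_var("Quarterly report"),
    "number": code_var("barcode", "1231231"),
    "rows": [  # one dictionary per row of a multi-row variable
        {"name": text_var("Alice")},
        {"name": text_var("Bob")},
    ],
}
```

The result would be passed as params={'vars': template_vars}.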

3.7 Image thumbnails

my_image.convert('image/x-thumbnail-png',
              callback = my_image.api_url(request, script_name, internal_=True, auth_=True))

By default, several sizes are generated: image_large, image_preview, image_mini, image_thumb, and image_tile (the resolution of each can be configured in the console).

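3.8 Converting a PDF to per-page images

This can be done as follows:

my_pdf.convert('image/png',
              callback = my_pdf.api_url(request, script_name, internal_=True, auth_=True))

The page images are named transformed-0.png, transformed-1.png, and so on.

In the callback script, an image can be fetched with:

my_pdf.get_cache('image/png', 'transformed-1.png')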
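
To fetch every page, the subfile names can be generated from the page count. The sketch assumes the naming continues consecutively from transformed-0.png for all pages, as the two names above suggest; page_image_names is a hypothetical helper.

```python
def page_image_names(num_pages):
    """Subfile names of the per-page images, starting at transformed-0.png."""
    return ["transformed-{0}.png".format(index) for index in range(num_pages)]


# On the platform, inside the callback script, this might be used as:
#   images = [my_pdf.get_cache('image/png', name)
#             for name in page_image_names(num_pages)]
names = page_image_names(3)
```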

3.9 Viewing document information

my_pdf.convert('application/json',
              callback = my_pdf.api_url(request, script_name, internal_=True, auth_=True))

In the callback script, the file information can be fetched with:

my_pdf.get_cache('application/json')

The returned data has this form:

{'numpages': count,  # number of pages
    'version':'1.0.0',
    'dimensions':{'exceptions':[], 'width': 595, 'height': 842}, # page size
    'links':[]} # outline (table-of-contents tree)
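
The image overlay conversions take sizes in millimetres, so it can be handy to convert the page size. The sketch below assumes the dimensions are in PostScript points (1/72 inch). The document does not state the unit, but 595 x 842 matches an A4 page in points. page_size_mm is a hypothetical helper, and the sample dictionary copies the shape shown above with an arbitrary page count.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def page_size_mm(info):
    """Convert info['dimensions'] from points (assumed) to millimetres."""
    dimensions = info["dimensions"]
    width_mm = dimensions["width"] / POINTS_PER_INCH * MM_PER_INCH
    height_mm = dimensions["height"] / POINTS_PER_INCH * MM_PER_INCH
    return (round(width_mm, 1), round(height_mm, 1))


sample_info = {
    "numpages": 2,
    "version": "1.0.0",
    "dimensions": {"exceptions": [], "width": 595, "height": 842},
    "links": [],
}
size = page_size_mm(sample_info)  # roughly 210 mm x 297 mm, i.e. A4
```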