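A Deep Dive into File Uploads over HTTP
Introduction to Content-Type
The Content-Type entity header indicates the MIME (Multipurpose Internet Mail Extensions) type of a resource. A MIME type, also called a media type or content type, is a string sent along with a file to indicate what kind of file it is. For example, an audio file might be labeled audio/ogg and an image file image/png.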
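To make the header's structure concrete, here is a minimal sketch that splits a Content-Type value into its media type and parameters using Python's standard email package. The helper name parse_content_type and the sample header values are illustrative assumptions, not part of the original post.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header value into (media_type, params)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_params() returns [(media_type, ''), (param, value), ...],
    # so the first entry is skipped when collecting parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


if __name__ == "__main__":
    # Sample values chosen for illustration only.
    print(parse_content_type("text/html; charset=utf-8"))
    print(parse_content_type("multipart/form-data; boundary=AaB03x"))
```

The media type comes first and any parameters (such as charset or boundary) follow, separated by semicolons.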
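Content-Type When Uploading Files
multipart/form-data and application/octet-stream are two different HTTP Content-Type values, each suited to a different upload scenario:
multipart/form-data is the standard way to send form data and files in an HTTP request. The request body is split into multiple parts, each holding one form field or one file's data. The parts are separated by a delimiter (the boundary) so that the server can parse the request correctly.
application/octet-stream is a generic MIME type that denotes a stream of binary data. It is typically used to send binary data without any accompanying metadata, such as image, audio, or video files. With application/octet-stream, the HTTP request body contains the raw bytes and nothing else.
application/octet-stream Example
The octet-stream subtype is used to indicate that a body contains arbitrary binary data. It has two optional parameters, TYPE and PADDING.
When a file is uploaded to Huawei OBS object storage with an HTTP PUT request, the file content is the entire body of the PUT request.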
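As a rough sketch of such an upload, the code below builds a PUT request whose body is exactly the file's bytes. It uses the standard library's urllib.request rather than aiohttp so that it runs without third-party packages. The URL, the helper name build_put_request, and the payload are hypothetical, and the authentication or signature headers that a real OBS bucket requires are omitted.

```python
import urllib.request


def build_put_request(url, payload):
    """Build a PUT request whose body is exactly the given bytes."""
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


if __name__ == "__main__":
    # Hypothetical endpoint and payload, for illustration only.
    req = build_put_request("https://bucket.example.com/object-key", b"\x89PNG\r\n")
    print(req.get_method(), req.full_url)
    # urllib stores header names capitalized as "Content-type".
    print(req.get_header("Content-type"), len(req.data))
    # urllib.request.urlopen(req) would actually send the request.
```

No multipart framing is involved: the server stores the body bytes as the object.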
Note:
If you pass a file object as the data parameter, aiohttp will stream it to the server automatically (streaming-uploads).
Definition of multipart/form-data
In many applications, it is possible for a user to be presented with a form. The user will fill out the form, including information that is typed, generated by user input, or included from files that the user has selected. When the form is filled out, the data from the form is sent from the user to the receiving application. The definition of multipart/form-data is derived from one of those applications.
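Common HTML form elements:
- Text box: <input type="text">
- Password box: <input type="password">
- Checkbox: <input type="checkbox">
- Radio button: <input type="radio">
- Drop-down list: <select>
- Text area: <textarea>
When a form is submitted, its data can be sent to the server with one of two methods: GET and POST. GET appends the form data to the end of the URL and suits small amounts of non-sensitive data. POST places the form data in the HTTP request body and suits large or sensitive data.
Form data must be encoded before it is submitted. The two encoding types commonly used by HTML forms are application/x-www-form-urlencoded and multipart/form-data (a third, text/plain, exists but is rarely used). The former is for ordinary form data (key-value pairs); the latter is for forms that include file uploads.
In the application/x-www-form-urlencoded format, form data is encoded as key-value pairs: each key is joined to its value with an equals sign (=), and pairs are separated by an ampersand (&). Certain characters are also URL-encoded (percent-encoded): for example, a space becomes + and the special character @ becomes %40.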
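The encoding rules above can be seen in a short sketch using urllib.parse from the standard library. The field names and values are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields, for illustration only.
fields = {"name": "Joe Smith", "email": "joe@example.com"}

encoded = urlencode(fields)
print(encoded)  # name=Joe+Smith&email=joe%40example.com

# Decoding reverses the process; each key maps to a list of values.
decoded = parse_qs(encoded)
print(decoded)
```

The space in the name becomes + and the @ in the address becomes %40, matching the description above.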
A multipart/form-data body contains a series of parts separated by a boundary.
Boundary Parameter of multipart/form-data
As with other multipart types, the parts are delimited with a boundary delimiter, constructed using CRLF, "--", and the value of the boundary parameter.
Content-Disposition Header Field for Each Part
Each part MUST contain a Content-Disposition header field (RFC 2183) where the disposition type is form-data. The Content-Disposition header field MUST also contain an additional parameter of name; the value of the name parameter is the original field name from the form (possibly encoded; see Section 5.1).
In most multipart types, the MIME header fields in each part are restricted to US-ASCII; for compatibility with those systems, file names normally visible to users MAY be encoded using the percent-encoding method.
Content-Type Header Field for Each Part
Each part MAY have an (optional) Content-Type header field, which defaults to text/plain. If the contents of a file are to be sent, the file data SHOULD be labeled with an appropriate media type, if known, or application/octet-stream.
The Charset Parameter for text/plain Form Data
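In the case where the form data is text, the charset parameter for the text/plain Content-Type MAY be used to indicate the character encoding used in that part:
    --AaB03x
    content-disposition: form-data; name="field1"
    content-type: text/plain;charset=UTF-8
    content-transfer-encoding: quoted-printable

    Joe owes =E2=82=AC100.
    --AaB03x
Content-Transfer-Encoding states how the data has been encoded so that it can survive a given transport. Some transport protocols were not designed to carry binary data or special characters, so an encoding such as Base64 or Quoted-Printable is applied to keep the data intact between sender and receiver.
For example, an HTML email containing non-ASCII characters uses Content-Transfer-Encoding: quoted-printable so that every character is transmitted correctly. Attachments such as images or PDF files use Content-Transfer-Encoding: base64. Both encodings exist to turn non-ASCII or binary data into a form that can be handled in an ASCII-only environment, so the data can travel over protocols such as email that support only ASCII. Email was originally designed to carry text only.
- Base64: represents binary data using 64 printable characters, turning every 3 bytes into 4 characters. It suits arbitrary binary data.
- Quoted-Printable: mainly used to encode the non-ASCII characters in otherwise textual mail. Each non-ASCII byte is written as = followed by two hexadecimal digits.
Note that RFC 7578 deprecates Content-Transfer-Encoding for multipart/form-data sent over HTTP, because HTTP can carry binary data directly.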
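The two encodings can be compared with a short sketch that uses Python's standard quopri and base64 modules. The text reuses the sample string from the part shown above. The four-byte binary sample (the start of a PNG signature) is an illustrative assumption.

```python
import base64
import quopri

# Quoted-Printable: ASCII stays readable, each non-ASCII byte becomes =XX.
text = "Joe owes \u20ac100."  # "\u20ac" is the euro sign
qp = quopri.encodestring(text.encode("utf-8"))
print(qp)  # b'Joe owes =E2=82=AC100.'
print(quopri.decodestring(qp).decode("utf-8") == text)  # True

# Base64: every 3 input bytes become 4 printable characters, padded with "=".
binary = b"\x89PNG"  # illustrative binary sample
b64 = base64.b64encode(binary)
print(b64)  # b'iVBORw=='
print(base64.b64decode(b64) == binary)  # True
```

Quoted-Printable keeps mostly-ASCII text legible, while Base64 gives a predictable size overhead of about one third for any input.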
Other "Content-" Header Fields
The multipart/form-data media type does not support any MIME header fields in parts other than Content-Type, Content-Disposition, and (in limited circumstances) Content-Transfer-Encoding.
multipart/form-data Script Example
Multipart-encoded files can be sent with Python's aiohttp module.
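To make the wire format concrete, the following sketch builds a multipart/form-data body by hand with the standard library instead of going through aiohttp. The function name encode_multipart, its arguments, and the sample field and file are illustrative assumptions. A client library normally does this work for you.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Return (content_type, body) for a multipart/form-data request.

    fields: dict mapping field name -> str value
    files:  dict mapping field name -> (filename, data_bytes, media_type)
    Names and file names are assumed to contain no quotes or line breaks.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",  # blank line separates part headers from part content
            value.encode("utf-8"),
        ]
    for name, (filename, data, media_type) in files.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode("utf-8"),
            f"Content-Type: {media_type}".encode("ascii"),
            b"",
            data,
        ]
    # The closing delimiter carries a trailing "--".
    lines += [delimiter + b"--", b""]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


if __name__ == "__main__":
    content_type, body = encode_multipart(
        {"field1": "hello"},
        {"file": ("a.bin", b"\x00\x01", "application/octet-stream")},
        boundary="AaB03x",
    )
    print(content_type)
    print(body)
```

The returned content_type string goes in the request's Content-Type header so that the server knows which boundary to split on.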
Capturing an upload with Wireshark shows the full exchange on the wire.
The X-Content-Type-Options: nosniff header has the following meaning:
The X-Content-Type-Options response HTTP header is a marker used by the server to indicate that the MIME types advertised in the Content-Type headers should be followed and not be changed. The header allows you to avoid MIME type sniffing by saying that the MIME types are deliberately configured.
Percent-Encoding Option:
Percent-encoding (as defined in RFC 3986) is offered as a possible way of encoding characters in file names that are otherwise disallowed, including non-ASCII characters, spaces, control characters, and so forth. The encoding is created by replacing each non-ASCII or disallowed character with a sequence, where each byte of the UTF-8 encoding of the character is represented by a percent sign (%) followed by the (case-insensitive) hexadecimal value of that byte.
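A small sketch of this option, using urllib.parse.quote from the standard library: each byte of the UTF-8 encoding of a disallowed character becomes a %XX sequence. The file name is a made-up example.

```python
from urllib.parse import quote, unquote

# Hypothetical file name containing non-ASCII characters and a space.
filename = "r\u00e9sum\u00e9 2024.txt"

# safe="" means no reserved characters are exempt; unreserved characters
# (letters, digits, "-", ".", "_", "~") are always left as-is.
encoded = quote(filename, safe="")
print(encoded)  # r%C3%A9sum%C3%A9%202024.txt

# Decoding restores the original name.
print(unquote(encoded) == filename)  # True
```

Each accented letter takes two bytes in UTF-8, so it expands to two %XX sequences, while the space becomes %20.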