Content parsers
Parsers transform raw response bodies into readable text.
ParserManager picks the right parser from the Content-Type header;
each concrete parser handles one media type.
ParserManager
Maps `Content-Type` values to the appropriate `Parser`.
The registry performs a case-insensitive substring match, so
`"text/html; charset=utf-8"` correctly resolves to `HTMLParser`.
Built-in registry:
| Content-Type | Parser |
|---|---|
| `text/html` | `HTMLParser` |
| `application/json` | `JSONParser` |
| `text/plain` | `PlainTextParser` |
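The matching rule can be illustrated with a small standalone function. This is a sketch, not the `ParserManager` source: the function name `match_media_type` and the string-valued registry are hypothetical, and it assumes entries are tried in insertion order with the first substring hit winning.

```python
from typing import Optional

# Hypothetical registry: media-type key -> parser name (stand-ins for real parser objects).
REGISTRY = {
    "text/html": "HTMLParser",
    "application/json": "JSONParser",
    "text/plain": "PlainTextParser",
}


def match_media_type(content_type: str, registry: dict) -> Optional[str]:
    """Return the value whose key occurs in content_type, ignoring case."""
    header = content_type.lower()
    for media_type, parser_name in registry.items():
        # Substring test, so parameters such as "; charset=utf-8" are tolerated.
        if media_type.lower() in header:
            return parser_name
    return None


print(match_media_type("Text/HTML; charset=utf-8", REGISTRY))  # HTMLParser
print(match_media_type("image/png", REGISTRY))  # None
```

Plain substring matching tolerates parameters such as `charset`, but any header that merely contains a registered key also matches it: in this sketch, `application/json-seq` would resolve to the JSON entry.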
Example
```python
>>> from go2web.http.parsers.parser_manager import ParserManager
>>> manager = ParserManager()
>>> parser = manager.get_parser("application/json; charset=utf-8")
>>> parser.parse('{"key": "value"}')
'{\n  "key": "value"\n}'
```
Source code in src/go2web/http/parsers/parser_manager.py
`get_parser(content_type)`
Return the parser for `content_type`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `content_type` | `str` | The value of the `Content-Type` header. | required |
Returns:
| Type | Description |
|---|---|
| `Parser` | The parser registered for `content_type`. |
Raises:
| Type | Description |
|---|---|
| `ParseError` | When no parser is registered for `content_type`. |
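The lookup-or-raise contract of `get_parser` might look like the following minimal sketch. `ParseError` is a local stand-in (its real module is not shown on this page), and `SimpleParserManager`, its `register` method and `UpperParser` are hypothetical names used only for illustration.

```python
class ParseError(Exception):
    """Stand-in for go2web's ParseError; its real definition is not shown here."""


class UpperParser:
    """Toy parser used only to populate the registry."""

    def parse(self, body: str) -> str:
        return body.upper()


class SimpleParserManager:
    """Hypothetical manager: case-insensitive substring lookup, error on a miss."""

    def __init__(self) -> None:
        self._parsers = {}

    def register(self, media_type: str, parser) -> None:
        self._parsers[media_type.lower()] = parser

    def get_parser(self, content_type: str):
        header = content_type.lower()
        for media_type, parser in self._parsers.items():
            if media_type in header:
                return parser
        # No registered key matched: raise instead of returning None.
        raise ParseError(f"no parser registered for {content_type!r}")


manager = SimpleParserManager()
manager.register("text/plain", UpperParser())
print(manager.get_parser("text/plain; charset=utf-8").parse("hi"))  # HI
try:
    manager.get_parser("image/png")
except ParseError as exc:
    print(exc)  # no parser registered for 'image/png'
```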
HTMLParser
Bases: Parser
Strip HTML markup and return readable plain text.
Non-content tags (`<script>`, `<style>`, `<noscript>`, `<head>`)
are removed entirely before text extraction, so the rendered output
contains only visible page content.
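The tag-stripping approach described above can be approximated with the standard library's `html.parser`. This is a sketch under assumptions, not go2web's implementation: `VisibleTextExtractor` and `html_to_text` are hypothetical names, and the sketch assumes one output line per non-empty text node.

```python
from html.parser import HTMLParser as StdlibHTMLParser

# Tags whose contents are never rendered as page text.
SKIP_TAGS = {"script", "style", "noscript", "head"}


class VisibleTextExtractor(StdlibHTMLParser):
    """Collect text nodes that are not nested inside a skipped tag."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.chunks.append(text)


def html_to_text(body: str) -> str:
    extractor = VisibleTextExtractor()
    extractor.feed(body)
    extractor.close()
    return "\n".join(extractor.chunks)


print(html_to_text("<head><title>T</title></head><p>Hello</p><script>x=1</script><p>World</p>"))
```

Because each text node becomes its own line, inline markup such as `<b>` splits a sentence across lines in this sketch; the real parser may join inline text differently.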
Example
```python
>>> parser = HTMLParser()
>>> parser.parse("<p>Hello</p><p>World</p>")
'Hello\nWorld'
```
Source code in src/go2web/http/parsers/html_parser.py
`parse(body)`
Extract visible text from an HTML document.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `body` | `str` | Raw HTML string. | required |
Returns:
| Type | Description |
|---|---|
| `str` | Whitespace-normalised plain text with newline separators. |
JSONParser
Bases: Parser
Parse a JSON response body and return it pretty-printed.
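Pretty-printing of this kind maps directly onto the standard `json` module. A minimal sketch, assuming the parser round-trips through `json.loads` and `json.dumps`; `pretty_json` is a hypothetical name, and `ensure_ascii=False` is an assumption about how non-ASCII text is kept readable.

```python
import json


def pretty_json(body: str) -> str:
    # Parse first so malformed input fails before any formatting happens.
    data = json.loads(body)
    # indent=2 matches the documented two-space style; ensure_ascii=False is an
    # assumption that keeps characters such as "ü" unescaped.
    return json.dumps(data, indent=2, ensure_ascii=False)


print(pretty_json('{"name":"go2web","version":"0.1.0"}'))
```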
Example
```python
>>> parser = JSONParser()
>>> parser.parse('{"name":"go2web","version":"0.1.0"}')
'{\n  "name": "go2web",\n  "version": "0.1.0"\n}'
```
Source code in src/go2web/http/parsers/json_parser.py
`parse(body)`
Deserialise body and return it indented with two spaces.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `body` | `str` | A JSON-encoded string. | required |
Returns:
| Type | Description |
|---|---|
| `str` | Pretty-printed JSON with two-space indentation; non-ASCII characters are preserved. |
Raises:
| Type | Description |
|---|---|
| `ParseError` | When `body` is not valid JSON. |
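Translating the standard library's decode error into the documented `ParseError` could be sketched as follows. `ParseError` is a local stand-in and `parse_json_or_raise` is a hypothetical name; chaining with `from exc` is a design choice that keeps the original line and column details available as `__cause__`.

```python
import json


class ParseError(Exception):
    """Stand-in for go2web's ParseError (real definition not shown here)."""


def parse_json_or_raise(body: str) -> str:
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        # Re-raise as the parser-level error, keeping the original as __cause__.
        raise ParseError(
            f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc
    return json.dumps(data, indent=2)


try:
    parse_json_or_raise("{not json}")
except ParseError as err:
    print(err)
```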
PlainTextParser
Bases: Parser
Pass plain-text bodies through with leading/trailing whitespace stripped.
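Assuming the parser delegates to Python's built-in `str.strip()` (an assumption; the source is not reproduced here), the behaviour reduces to a single call. `strip_body` is a hypothetical name.

```python
def strip_body(body: str) -> str:
    # str.strip() with no argument removes all leading/trailing whitespace,
    # including tabs, "\r" and "\n"; spacing inside the text is left untouched.
    return body.strip()


print(repr(strip_body("\t hello  world \r\n")))  # 'hello  world'
```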
Example
```python
>>> parser = PlainTextParser()
>>> parser.parse(" hello world ")
'hello world'
```
Source code in src/go2web/http/parsers/plain_text_parser.py
`parse(body)`
Return body stripped of surrounding whitespace.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `body` | `str` | A plain-text response body. | required |
Returns:
| Type | Description |
|---|---|
| `str` | The body string with leading and trailing whitespace removed. |
Parser (abstract)
Bases: ABC
Abstract base class for response body parsers.
Subclasses implement `parse` to transform a raw response body string
into a human-readable representation suitable for terminal output.
To add a new content type, subclass `Parser` and register the
instance in `ParserManager`.
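As an illustration of that extension point, here is a hypothetical `CSVParser` for `text/csv`. `Parser` is redefined locally as a stand-in for the documented base class, and the plain dict standing in for registration is an assumption: how `ParserManager` actually stores entries is not shown on this page.

```python
import csv
import io
from abc import ABC, abstractmethod


class Parser(ABC):
    """Local stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def parse(self, body: str) -> str:
        """Return a human-readable representation of body."""


class CSVParser(Parser):
    """Hypothetical parser for text/csv bodies."""

    def parse(self, body: str) -> str:
        # csv.reader handles quoting, e.g. '"a, b",c' yields ["a, b", "c"].
        rows = csv.reader(io.StringIO(body.strip()))
        # One output line per record, cells separated by " | ".
        return "\n".join(" | ".join(cell.strip() for cell in row) for row in rows)


# Hypothetical registration step: ParserManager's real storage is not documented here.
registry = {"text/csv": CSVParser()}
print(registry["text/csv"].parse("name,version\ngo2web,0.1.0\n"))
```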
Source code in src/go2web/http/parsers/abstract_parser.py
`parse(body)` (abstractmethod)
Parse body and return a human-readable string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `body` | `str` | The raw response body decoded from bytes. | required |
Returns:
| Type | Description |
|---|---|
| `str` | A cleaned, human-readable representation of `body`. |
Raises:
| Type | Description |
|---|---|
| `ParseError` | When the body cannot be interpreted as the expected format. |
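Because `Parser` derives from `ABC` and marks `parse` with `abstractmethod`, Python refuses to instantiate it, or any subclass that omits `parse`. The sketch below redefines the base class locally as a stand-in and uses a hypothetical `StrictASCIIParser` to show a subclass honouring the `ParseError` contract; `ParseError` is likewise a local stand-in.

```python
from abc import ABC, abstractmethod


class ParseError(Exception):
    """Stand-in for go2web's ParseError."""


class Parser(ABC):
    """Stand-in for the documented abstract base class."""

    @abstractmethod
    def parse(self, body: str) -> str:
        ...


class StrictASCIIParser(Parser):
    """Hypothetical subclass that raises ParseError for bodies it cannot handle."""

    def parse(self, body: str) -> str:
        if not body.isascii():
            raise ParseError("body contains non-ASCII characters")
        return body.strip()


class Incomplete(Parser):
    pass  # forgets to implement parse(), so it stays abstract


for cls in (Parser, Incomplete):
    try:
        cls()
    except TypeError as exc:
        # The exact message wording varies between Python versions.
        print(f"{cls.__name__}: {exc}")
```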