HTML Content
HTML (HyperText Markup Language) content stored as VARCHAR. Detected by the presence of HTML tags (<p>, <div>, <a href=, <br>, <img>, etc.). Unlike XML, HTML5 allows unclosed tags, unquoted attributes, optional closing tags, and void elements. Common in CMS exports, email templates, web scraping data, and rich text fields.
container.object.html
Domain: container
Category: object
Casts to: VARCHAR
Scope: Universal
Try it
CLI
$ finetype infer -i "<p>Hello world</p>"
→ container.object.html
DuckDB
Detect
SELECT finetype('<p>Hello world</p>');
-- → 'container.object.html'
Cast expression
REGEXP_REPLACE({col}, '<[^>]+>', '', 'g')
Safe cast pipeline
-- Normalise and cast in one step
SELECT TRY_CAST(finetype_cast(my_column) AS VARCHAR) AS clean_value
FROM my_table
WHERE finetype(my_column) = 'container.object.html';
Struct Expansion
Expression
tag_count: CAST(REGEXP_COUNT({col}, '<[a-zA-Z][^>]*>') AS INTEGER)
text_content: REGEXP_REPLACE({col}, '<[^>]+>', '', 'g')
JSON Schema
finetype schema container.object.html
{
"$id": "https://meridian.online/schemas/container.object.html",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"description": "HTML (HyperText Markup Language) content stored as VARCHAR. Detected by the presence of HTML tags (<p>, <div>, <a href=, <br>, <img>, etc.). Unlike XML, HTML5 allows unclosed tags, unquoted attributes, optional closing tags, and void elements. Common in CMS exports, email templates, web scraping data, and rich text fields.",
"examples": [
"<p>Hello world</p>",
"<div class=\"test\"><a href=\"url\">link</a></div>",
"<br><img src=\"photo.jpg\">",
"<h1>Title</h1><p>Content here.</p>",
"<ul><li>Item 1</li><li>Item 2</li></ul>",
"<table><tr><td>Cell</td></tr></table>"
],
"minLength": 3,
"pattern": "^.*<(p|div|span|a|br|img|h[1-6]|ul|ol|li|table|tr|td|th|strong|em|b|i|form|input|button|select|textarea|header|footer|nav|section|article|main|aside|figure|figcaption|blockquote|pre|code|script|style|link|meta|head|body|html)[\\s>/ ].*$",
"title": "HTML Content",
"type": "string",
"x-finetype-broad-type": "VARCHAR",
"x-finetype-transform": "REGEXP_REPLACE({col}, '<[^>]+>', '', 'g')"
}
Examples
<p>Hello world</p>
<div class="test"><a href="url">link</a></div>
<br><img src="photo.jpg">
<h1>Title</h1><p>Content here.</p>
<ul><li>Item 1</li><li>Item 2</li></ul>
<table><tr><td>Cell</td></tr></table>
Aliases
html_content
html_fragment
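Python sketch
To see what the regular expressions on this page do outside the database, the following is a minimal Python sketch. It is not FineType's implementation: the function name describe_html and the dictionary keys are hypothetical, and it assumes Python's re module treats these three patterns the same way as the SQL functions and the schema validator do.

```python
import re

# Detection pattern copied from the JSON Schema on this page (JSON escaping removed).
HTML_PATTERN = re.compile(
    r"^.*<(p|div|span|a|br|img|h[1-6]|ul|ol|li|table|tr|td|th|strong|em|b|i|"
    r"form|input|button|select|textarea|header|footer|nav|section|article|"
    r"main|aside|figure|figcaption|blockquote|pre|code|script|style|link|"
    r"meta|head|body|html)[\s>/ ].*$"
)

# Same regex as the tag_count expression: opening and void tags only,
# because a closing tag starts with "</" rather than "<" plus a letter.
OPENING_TAG = re.compile(r"<[a-zA-Z][^>]*>")

# Same regex as the cast expression and text_content: any "<...>" run.
ANY_TAG = re.compile(r"<[^>]+>")


def describe_html(value: str) -> dict:
    """Hypothetical helper that mirrors the three regexes in plain Python."""
    return {
        "matches_pattern": HTML_PATTERN.search(value) is not None,
        "tag_count": len(OPENING_TAG.findall(value)),
        "text_content": ANY_TAG.sub("", value),
    }


if __name__ == "__main__":
    for sample in ("<p>Hello world</p>", "<br><img src=\"photo.jpg\">", "plain text"):
        print(sample, describe_html(sample))
```

For "<p>Hello world</p>" the helper reports a pattern match, a tag_count of 1, and a text_content of "Hello world". Because the stripping regex removes everything between "<" and the next ">", it leaves character entities such as &amp; untouched and keeps the text inside script and style elements, so it is a rough text extraction rather than an HTML parser.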