Practical guide

Convert HTML to Markdown without scripts, styles, and navigation clutter

A reliable HTML-to-Markdown workflow starts with source HTML that already contains the article or documentation body. Extract the main content and remove navigation, cookie banners, scripts, styles, and repeated footer text. Then verify heading order, code blocks, tables, links, and image paths, and check that content loaded by JavaScript is not missing.

Last reviewed July 17, 2026

Who this guide is for

• Documentation teams migrating static HTML pages
• Editors saving articles or help-center exports as Markdown
• Developers cleaning HTML email, CMS, or generated documentation output
• Teams assessing whether a JavaScript application contains extractable source content

How can you tell whether the HTML contains the real content?

Open the downloaded HTML or use View Source and search for a distinctive sentence from the page. If the sentence appears in the source, conversion can usually reach it. If the file contains only an application shell and JavaScript bundle references, the visible content may require browser rendering, an export, an API, or a different capture workflow.
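The check above can be automated for a batch of saved files. The following is a minimal sketch, assuming the page is already available as a string; `VisibleText` and `contains_sentence` are illustrative names, not part of any converter. It searches only text outside `<script>` and `<style>`, so a sentence that exists solely inside a JavaScript bundle does not count as present.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def contains_sentence(html: str, sentence: str) -> bool:
    """Return True if `sentence` appears in the text outside scripts and styles."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so line wrapping in the source does not matter.
    text = " ".join("".join(parser.parts).split())
    return " ".join(sentence.split()) in text
```

A `False` result for a sentence that is visible in the browser suggests the file is an application shell and needs one of the alternative capture routes listed above.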

What should be removed before or after conversion?

Scripts, styles, navigation menus, cookie banners, account controls, repeated breadcrumbs, related-content widgets, advertisements, and footer boilerplate usually do not belong in the Markdown article. Remove them without deleting headings, captions, code examples, footnotes, or meaningful callouts.
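As a rough illustration of tag-based cleanup, the sketch below drops a small, assumed set of elements and passes everything else through. The `CLUTTER` set and the names `ClutterStripper` and `strip_clutter` are hypothetical. Cookie banners, advertisements, and widgets are usually marked by site-specific classes or ids, so they need extra rules that this sketch does not attempt. It also deliberately leaves `aside` alone, because asides sometimes carry meaningful callouts.

```python
import html
from html.parser import HTMLParser

# Assumed tag list for illustration; extend it per site after inspecting the markup.
CLUTTER = {"script", "style", "nav", "footer"}


class ClutterStripper(HTMLParser):
    """Re-emit HTML while dropping clutter elements and everything inside them."""

    def __init__(self):
        super().__init__()
        self.out = []
        self._depth = 0  # >0 while inside a clutter element

    def handle_starttag(self, tag, attrs):
        if tag in CLUTTER:
            self._depth += 1
        elif not self._depth:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied once, without a separate close tag.
        if tag not in CLUTTER and not self._depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in CLUTTER:
            if self._depth:
                self._depth -= 1
        elif not self._depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._depth:
            # The parser has already decoded entities, so escape text again on output.
            self.out.append(html.escape(data, quote=False))


def strip_clutter(source: str) -> str:
    parser = ClutterStripper()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

HTML comments are dropped as a side effect, and an unclosed clutter element would swallow the content that follows it, so the result still needs the review described above.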

How should links and images be normalized?

Relative URLs such as ../images/diagram.png depend on the original site structure. Resolve internal links to their final publishing paths, download or migrate image assets, and add descriptive alt text. Remove javascript: links, tracking parameters, temporary blob: URLs, and signed URLs that will expire.
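A first pass at link cleanup might look like the sketch below. It resolves a relative URL against the page's original address, rejects a couple of schemes, and drops common tracking parameters. The function name `normalize_url`, the `base_url` argument, and the parameter lists are assumptions for illustration. Mapping the resolved URL to its final publishing path, and detecting signed URLs, are site-specific steps that are not shown.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit

# Assumed lists for illustration; adjust them to the links found in your source.
UNSAFE_SCHEMES = {"javascript", "blob"}
TRACKING_PARAMS = {"fbclid", "gclid"}


def normalize_url(href: str, base_url: str) -> Optional[str]:
    """Return an absolute, cleaned URL, or None if the link should be removed."""
    absolute = urljoin(base_url, href.strip())
    parts = urlsplit(absolute)
    if parts.scheme.lower() in UNSAFE_SCHEMES:
        return None
    # Keep every query parameter except utm_* and the known tracking names.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


base = "https://example.com/docs/guide/page.html"  # hypothetical original page URL
print(normalize_url("../images/diagram.png", base))
# https://example.com/docs/images/diagram.png
```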

What happens with code, tables, and semantic HTML?

Preformatted code should become fenced code blocks with a language label when known. Simple tables can become Markdown tables, while nested or layout tables need redesign. Elements such as article, main, section, nav, aside, figure, figcaption, details, and blockquote provide clues, but the output still needs human review.
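To make the distinction between simple and complex tables concrete, here is a minimal sketch that converts a table only when it is rectangular and has no spans or nesting, and otherwise returns `None` so the table can be redesigned by hand. `TableReader` and `table_to_markdown` are illustrative names. The sketch assumes explicitly closed cells and treats the first row as the header, which may not match every source.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableReader(HTMLParser):
    """Collect cell text row by row and note anything that makes a table complex."""

    def __init__(self):
        super().__init__()
        self.rows: List[List[str]] = []
        self.simple = True  # set to False on a nested table or a row/column span
        self._tables = 0    # current <table> nesting depth
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._tables += 1
            if self._tables > 1:
                self.simple = False
        elif tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            # Conservative: any rowspan or colspan attribute marks the table as complex.
            if {name for name, _ in attrs} & {"rowspan", "colspan"}:
                self.simple = False
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            self._tables -= 1
        elif tag in ("td", "th") and self._cell is not None:
            if self.rows:
                self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_markdown(source: str) -> Optional[str]:
    """Return a Markdown table, or None if the table needs manual redesign."""
    reader = TableReader()
    reader.feed(source)
    reader.close()
    rows = [row for row in reader.rows if row]
    if not reader.simple or not rows or len({len(row) for row in rows}) != 1:
        return None

    def line(cells: List[str]) -> str:
        return "| " + " | ".join(cell.replace("|", "\\|") for cell in cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([line(header), separator, *(line(row) for row in body)])
```

Inline formatting inside cells is flattened to plain text here, so links or emphasis in table cells would still need attention during review.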

Before and after example

HTML source

<nav>Docs | Pricing</nav>
<article>
  <h1>Deploy</h1>
  <p>Run the build.</p>
  <pre><code>npm run build</code></pre>
</article>
<footer>Cookie settings</footer>

Main-content Markdown

# Deploy

Run the build.

```sh
npm run build
```
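The code-block step in this example could be sketched as follows. The sketch relies on the common convention of marking the language with a `language-…` class on the `code` element; when no such class is present, as in the HTML above, the fence is left unlabeled and the label has to be added during review. `PreToFence` and `fenced_blocks` are illustrative names.

```python
from html.parser import HTMLParser
from typing import List

FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence


class PreToFence(HTMLParser):
    """Turn each <pre> element into a fenced code block."""

    def __init__(self):
        super().__init__()
        self.blocks: List[str] = []
        self._in_pre = False
        self._lang = ""
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre, self._lang, self._text = True, "", []
        elif tag == "code" and self._in_pre:
            # Look for a class such as "language-sh" to use as the fence label.
            for cls in (dict(attrs).get("class") or "").split():
                if cls.startswith("language-"):
                    self._lang = cls[len("language-"):]

    def handle_endtag(self, tag):
        if tag == "pre" and self._in_pre:
            code = "".join(self._text).strip("\n")
            self.blocks.append(f"{FENCE}{self._lang}\n{code}\n{FENCE}")
            self._in_pre = False

    def handle_data(self, data):
        if self._in_pre:
            self._text.append(data)


def fenced_blocks(source: str) -> List[str]:
    parser = PreToFence()
    parser.feed(source)
    parser.close()
    return parser.blocks
```

Code that itself contains a run of three backticks would need a longer fence, which this sketch does not handle.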

Review checklist after conversion

• Confirm the main article starts at the correct heading.
• Remove navigation, cookie, account, advertisement, and footer boilerplate.
• Check that heading levels form a logical outline.
• Resolve internal links and image URLs for the new destination.
• Verify code fences, language labels, tables, quotes, and captions.
• Check whether JavaScript-loaded content is missing.
• Search for tracking parameters, unsafe URL schemes, or private application data.
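The heading-outline item in this checklist is easy to automate. The sketch below reports headings that skip a level, for example a `#` followed directly by `###`. It is a minimal illustration with assumed names (`heading_jumps`), it recognizes only `#`-style headings and backtick fences, and it treats a document that starts below level 1 as a problem, which is an assumption you may want to relax.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence


def heading_jumps(markdown: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for each heading that skips a level."""
    problems = []
    previous = 0
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # ignore "#" lines inside code blocks
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if level > previous + 1:
            problems.append((number, line))
        previous = level
    return problems
```

An empty result means the outline never jumps more than one level deeper at a time; it does not confirm that the headings are the right ones for the content.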

Risk boundary

HTML conversion does not execute a full browser application or guarantee that dynamically loaded content is present. Do not assume the saved source includes authenticated, interactive, or client-rendered data. Respect copyright, access controls, robots directives, and the terms of the source site. Use an approved private workflow for private dashboards, customer portals, or sensitive exports.

Frequently asked questions

Can a React or Vue page be converted directly?

Only if the saved HTML contains the rendered content. An application shell without article text needs browser rendering, an export, or an API.

Will scripts and CSS be included in Markdown?

They should not be. Remove them, because Markdown represents content structure, not executable page behavior or visual styling.

Why are relative links broken after conversion?

They depend on the original directory or domain. Resolve them to durable paths for the new publishing destination.

Can HTML tables become Markdown tables?

Simple rectangular data tables can. Layout tables, nested tables, row spans, and column spans require redesign.

What happens to images?

Image references may be retained, but assets, paths, alt text, permissions, and long-term hosting must be managed separately.

Can I convert a private web application export?

Use an approved private workflow and verify that the export does not contain credentials, customer data, session information, or restricted content.
