Good Document for Knowledge Base
The quality of automatically generated answers depends directly on the information provided in the knowledge base. The answer-generation solution works only with text, so the information must be supplied in textual format. Responses are created from selected text segments rather than from whole documents, so a good document must be logically structured for the required information to be found at all.
A good knowledge base document is a simple, structured text document with sequential headings and textual content. Content may include short or medium-length lists, small simple tables, and text-related images with explanatory descriptions.

Good and Bad Examples
Documents and Content
✅ Good
- Structured, thoughtful, focused, clean information.
- Up-to-date information.
- Text documents (Word, PDF, Markdown) with structured content.
❌ Bad
- A "more is better" approach: everything available is included.
- Both old and new versions of the same content, which results in duplicate data.
- Presentations, Excel files, PDFs with many ad images, heavy graphics, Word files with complex long tables, or mixed layouts.
Tables
1. Descriptions
✅ Good: Table with a clear description
Provide context before the table so the reader knows what it contains.
❌ Bad: Table without a description
A table without a description is ambiguous. The reader is forced to guess what the table is about, which can lead to misinterpretation.
2. Simplicity
✅ Good: Simple table
Simple, clean tables with a clear structure are easily understood and processed.
❌ Bad: Complex table
Complex tables with merged cells, nested structures, or excessive columns confuse the reader and will not be converted correctly to Markdown format.
The extracted information will be incomplete, and the desired answer will not be generated.
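As an illustration, a table that has already been converted to Markdown can be checked for two of the warning signs named above: rows whose cell count differs from the header (a typical symptom of merged cells) and too many columns. The function names, the `max_columns` limit of 6, and the sample data in the usage note are assumptions made for this sketch; they are not part of the actual conversion pipeline.

```python
def split_row(line):
    """Split one Markdown pipe-table row into stripped cell values.

    Escaped pipes inside cells are not handled in this sketch.
    """
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


def table_problems(markdown_table, max_columns=6):
    """Return a list of human-readable problems found in a Markdown table."""
    rows = [split_row(line) for line in markdown_table.splitlines() if line.strip()]
    if not rows:
        return ["table is empty"]

    problems = []
    width = len(rows[0])  # the header row defines the expected width
    if width > max_columns:
        problems.append(f"{width} columns exceeds the limit of {max_columns}")

    # Row numbers are 1-based; row 1 is the header.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"row {number} has {len(row)} cells, expected {width}")
    return problems
```

For a hypothetical two-column table with columns Flavor and Price, an empty result means neither warning sign was found; it does not prove that the table is easy to read.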

3. Length
✅ Good: Short, focused table
Keep tables concise and focused on a single topic. If you have more data, it's better to split it into multiple smaller tables, each with its own description.
❌ Bad: Long table
Extremely long tables are difficult to read and process.
A long table will be split into multiple segments, so a complete answer will not be generated.
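The recommendation to split a long table into smaller tables, each with its own description, can be sketched as follows. This is a minimal example under stated assumptions: the function name `split_table`, the default of 10 rows per table, and the "part X of Y" wording are hypothetical choices, not requirements of the generation solution.

```python
def split_table(description, header, rows, max_rows=10):
    """Split one long table into several short Markdown tables.

    Every part repeats the description and the header row, so each
    part still makes sense when it is read on its own.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")

    header_line = "| " + " | ".join(header) + " |"
    separator = "| " + " | ".join("---" for _ in header) + " |"
    total_parts = (len(rows) + max_rows - 1) // max_rows  # ceiling division

    parts = []
    for index in range(total_parts):
        chunk = rows[index * max_rows:(index + 1) * max_rows]
        lines = [
            f"{description} (part {index + 1} of {total_parts})",
            "",
            header_line,
            separator,
        ]
        lines += ["| " + " | ".join(row) + " |" for row in chunk]
        parts.append("\n".join(lines))
    return parts
```

Repeating the description and header in every part means that a segment containing only one part still tells the reader what the table is about.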
Image Examples
✅ Good: Image with a clear description
Provide a clear, descriptive caption. If the image is available online, you don't need to embed it; you can simply reference it with a link.
Image: Vanilla ice cream scoops in a waffle cone

❌ Bad: Image without a description
An image without a caption, alt-text, or any surrounding context is ambiguous. It's inaccessible and cannot be understood by search or AI systems. The relevant segment won’t be found.
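For documents written in Markdown, images that lack a description can be located before the document is added to the knowledge base. The sketch below assumes the standard `![alt text](url)` image syntax; the function name `images_without_description` is hypothetical, and the check covers only the alt text, not captions or surrounding context.

```python
import re

# Matches Markdown images of the form ![alt text](url) or ![alt text](url "title").
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def images_without_description(markdown_text):
    """Return the URLs of Markdown images whose alt text is empty or blank."""
    return [
        url
        for alt, url in IMAGE_PATTERN.findall(markdown_text)
        if not alt.strip()
    ]
```

An image reported by this check may still be acceptable if the surrounding text describes it, so the result is best treated as a list of candidates for manual review.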

You can reference an online image in several ways:
- With descriptive text: See the ice cream dessert image.
- As a formatted Markdown link: [Ice cream dessert](https://upload.wikimedia.org/wikipedia/commons/3/31/Ice_Cream_dessert_02.jpg)
- As a raw URL: https://upload.wikimedia.org/wikipedia/commons/3/31/Ice_Cream_dessert_02.jpg
List Examples
✅ Good: Short or medium-length list
Use simple ordered or unordered lists for items that are easy to scan. They are perfect for steps, ingredients, or key features.
The most commonly used strawberry varieties for making strawberry ice cream:
- Senga Sengana
- Polka
- Elsanta
- Korona
- Honeoye
❌ Bad: Long or complex lists
Very long lists or multi-level lists with extended, paragraph-length explanations are hard to read and process.
The data will be split into multiple segments, so answers will be incomplete.
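Why a long list leads to incomplete answers can be illustrated with a simple segmenter. The sketch below is not the actual segmentation used by the generation solution, which is not described here; it assumes a hypothetical greedy splitter that packs whole lines into segments of at most `max_chars` characters.

```python
def split_into_segments(lines, max_chars=200):
    """Greedily pack whole lines into segments of at most max_chars characters.

    A single line longer than max_chars is kept intact and becomes its
    own oversized segment.
    """
    segments, current, size = [], [], 0
    for line in lines:
        added = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + added > max_chars:
            segments.append("\n".join(current))
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        segments.append("\n".join(current))
    return segments
```

With a short list, the introductory sentence and all items fit into one segment. With a long list, only the first segment contains the introductory sentence; later segments hold bare items with no indication of what they belong to, so a response built from one of them cannot be complete.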