Converting Multiple-Choice Questions in a Word Document to Moodle XML (with Support for Images, Line Breaks, and Formatted Numbering)

Moodle Word-processing program and username/password file

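Background

In everyday teaching and online exams, we often need to import questions from a Word document into Moodle. Moodle supports several question import formats, of which Moodle XML is the most flexible and powerful. Writing the XML by hand, however, is tedious, especially when questions contain images, line breaks, or complex formatting.

To simplify this, I wrote a Python program that automatically converts the multiple-choice questions in a Word document (including images in questions and options, and line breaks) to Moodle XML, and formats the question numbers (01, 02, and so on).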

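Features

Reading Word documents:
- Supports Word documents in .docx format.
- Extracts the multiple-choice questions (question text and options).
Handling images:
- Extracts the images in the document and converts them to Base64.
- Reads each image's scaled display size (width and height) and embeds it in the XML file.
Handling line breaks:
- Supports line breaks in question text and explanations, converting them to HTML <br> tags.
Formatting question numbers:
- Numbers start at 01 and are zero-padded to two digits so that question names line up.
Generating the Moodle XML file:
- Automatically generates an XML file in Moodle XML format.
- Supports images in questions and options and line breaks in question text and explanations, and preserves each image's display size.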

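Technical Details

1. Dependencies

python-docx: reads the Word document.
xml.etree.ElementTree: builds the XML file (standard library).
base64: converts images to Base64 (standard library).

Install the third-party dependency: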

pip install python-docx

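2. Core logic

Extracting images:
- Collect all images in the document through part.rels of the docx library.
- Encode each image with base64.
- Read each image's size (width and height) from its wp:extent element, which stores it in EMU, and convert it to pixels.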
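
The size conversion in the last step can be sketched on its own. This is a minimal illustration, assuming a 96 DPI target as the program does; emu_to_px is a hypothetical helper name and is not part of python-docx.

```python
EMU_PER_INCH = 914400  # Office Open XML uses 914,400 EMU per inch


def emu_to_px(emu, dpi=96):
    """Convert a length in EMU to whole pixels at the given DPI (hypothetical helper)."""
    return int(emu / EMU_PER_INCH * dpi)


# A wp:extent with cx="1828800" cy="914400" describes an image of 2 in x 1 in
print(emu_to_px(1828800), emu_to_px(914400))
```

Truncating with int() mirrors what the program does; rounding instead would change the result by at most one pixel.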
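Handling questions and options:
- Iterate over the document's paragraphs and identify questions and options.
- Use the process_runs function to process the text and images in each paragraph so that they stay in the right order.
Handling line breaks:
- Use flag variables (is_question and is_feedback) to track what is currently being processed (question text or explanation).
- Merge consecutive paragraphs into one complete question or explanation, with <br> tags marking the line breaks.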
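
The flag-based merging can be tried out without a Word file. The sketch below is a simplified illustration that works on plain strings instead of docx paragraphs; merge_lines and the sample lines are hypothetical, and only the markers [题目], [答案], and [解析] are taken from the program.

```python
def merge_lines(lines):
    """Group plain-text lines into question text and explanation, joined by <br>."""
    question, feedback = "", ""
    is_question = is_feedback = False
    for line in lines:
        text = line.strip()
        if text.startswith("[题目]"):  # a question starts
            is_question, is_feedback = True, False
            question = text.replace("[题目]", "").strip()
        elif text.startswith("[解析]"):  # an explanation starts
            is_question, is_feedback = False, True
            feedback = text.replace("[解析]", "").strip()
        elif text.startswith(("A.", "B.", "C.", "D.", "[答案]")):
            is_question = is_feedback = False  # options and the answer line end both blocks
        elif text:  # continuation line: append to whichever block is open
            if is_question:
                question += "<br>" + text
            elif is_feedback:
                feedback += "<br>" + text
    return question, feedback


sample = [
    "[题目]1. First line of the question",
    "second line of the question",
    "A. yes",
    "B. no",
    "[答案] A",
    "[解析] First line of the explanation.",
    "Second line of the explanation.",
]
print(merge_lines(sample))
```

A continuation line that follows an option belongs to neither block and is dropped, which matches the behaviour of the flags in the program.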
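Formatting question numbers:
- Use f"{idx:02d}" to format the question number with two digits (01, 02, ...).
Generating the XML file:
- Build the XML structure with xml.etree.ElementTree.
- Embed the question text, options, images, and explanations in the XML file.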
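
As a rough illustration of the last two steps, the following sketch builds a tiny quiz with xml.etree.ElementTree and zero-padded question names. It is a simplified stand-in for the program's create_moodle_xml; build_quiz and its input shape (a list of question text and option pairs) are assumptions made for this example.

```python
from xml.etree.ElementTree import Element, SubElement, tostring


def build_quiz(questions):
    """Build a <quiz> element from (question_text, [(option_text, is_correct), ...]) pairs."""
    quiz = Element("quiz")
    for idx, (question_text, options) in enumerate(questions, start=1):
        question = SubElement(quiz, "question", type="multichoice")
        name = SubElement(question, "name")
        SubElement(name, "text").text = f"Question {idx:02d}"  # 01, 02, ...
        questiontext = SubElement(question, "questiontext", format="html")
        SubElement(questiontext, "text").text = question_text
        for option_text, is_correct in options:
            answer = SubElement(question, "answer", fraction="100" if is_correct else "0")
            SubElement(answer, "text").text = option_text
    return quiz


quiz = build_quiz([("2 + 2 = ?", [("3", False), ("4", True)])])
print(tostring(quiz, encoding="unicode"))
```

Note that :02d only pads: from question 100 onward the number simply takes three digits, so a wider format such as :03d may be preferable for large question banks.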

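3. Code structure

extract_images_from_docx: extracts the images in the document together with their sizes.
process_runs: processes the text and images in a paragraph.
extract_questions_from_docx: extracts the questions and options.
create_moodle_xml: generates the Moodle XML content.
main: entry point; reads the documents and writes the XML files.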

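Usage

Prepare the Word document:
- Make sure the multiple-choice questions follow this format (the markers are matched literally, so keep them exactly as written):
  - Question: starts with [题目] ("question"); may span several paragraphs.
  - Options: start with A., B., C., or D.
  - Correct answer: after the options, a line that starts with [答案] ("answer") followed by the letter of the correct option.
  - Explanation: starts with [解析] ("explanation"); may span several paragraphs.
Run the program:
- Put the Word document in the same directory as the program and run the program from that directory.
- Run the program: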
```
python convert_to_moodle_xml.py
```
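- For each .docx file in that directory, the program writes an .xml file with the same base name.
Import into Moodle:
- Log in to Moodle and open the question bank page.
- Choose "Import", select the Moodle XML format, and upload the generated XML file.
- Check that the questions and images are displayed correctly.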
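
Some problems can be caught before the upload. The sketch below is an illustration only and not part of the program: check_quiz is a hypothetical helper that parses the generated XML text and reports questions that have no options or do not have exactly one option with fraction="100" (for example because the [答案] line was missing).

```python
import xml.etree.ElementTree as ET


def check_quiz(xml_text):
    """Return a list of human-readable problems found in a Moodle XML string."""
    problems = []
    root = ET.fromstring(xml_text)
    for question in root.iter("question"):
        name = question.findtext("name/text", default="(unnamed)")
        answers = question.findall("answer")
        correct = [a for a in answers if a.get("fraction") == "100"]
        if not answers:
            problems.append(f"{name}: no options")
        elif len(correct) != 1:
            problems.append(f"{name}: {len(correct)} options marked correct")
    return problems


sample = """<quiz>
  <question type="multichoice">
    <name><text>Question 01</text></name>
    <answer fraction="0"><text>A</text></answer>
    <answer fraction="0"><text>B</text></answer>
  </question>
</quiz>"""
print(check_quiz(sample))
```

To check a real output file, read it as text with UTF-8 encoding and pass the contents to check_quiz; an empty list means no problem of these two kinds was found.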

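Example

Word document content

[题目]1. Which of the following is the capital of China?
A. Beijing
B. Shanghai
C. Guangzhou
D. Shenzhen
[答案] A
[解析] The capital of modern China is Beijing.
Beijing has been the capital for several hundred years.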

[题目]2. Which of the following is the national flag of China?
A. [image]
B. [image]
C. [image]
D. [image]
[答案] A
[解析] The national flag of China is the Five-star Red Flag.

Generated XML file

<quiz>
  <question type="multichoice">
    <name>
      <text>Question 01</text>
    </name>
    <questiontext format="html">
      <text>1. Which of the following is the capital of China?</text>
    </questiontext>
    <answernumbering>ABCD</answernumbering>
    <answer fraction="100">
      <text>Beijing</text>
    </answer>
    <answer fraction="0">
      <text>Shanghai</text>
    </answer>
    <answer fraction="0">
      <text>Guangzhou</text>
    </answer>
    <answer fraction="0">
      <text>Shenzhen</text>
    </answer>
    <generalfeedback format="html">
      <text>The capital of modern China is Beijing.<br>Beijing has been the capital for several hundred years.</text>
    </generalfeedback>
  </question>
  <question type="multichoice">
    <name>
      <text>Question 02</text>
    </name>
    <questiontext format="html">
      <text>2. Which of the following is the national flag of China?</text>
    </questiontext>
    <answernumbering>ABCD</answernumbering>
    <answer fraction="100">
      <text><img src="data:image/png;base64,..." alt="image" width="50" height="30" /></text>
    </answer>
    <answer fraction="0">
      <text><img src="data:image/png;base64,..." alt="image" width="50" height="30" /></text>
    </answer>
    <answer fraction="0">
      <text><img src="data:image/png;base64,..." alt="image" width="50" height="30" /></text>
    </answer>
    <answer fraction="0">
      <text><img src="data:image/png;base64,..." alt="image" width="50" height="30" /></text>
    </answer>
    <generalfeedback format="html">
      <text>The national flag of China is the Five-star Red Flag.</text>
    </generalfeedback>
  </question>
</quiz>
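
The listing above is simplified for readability. When text containing markup such as <br> or <img> is assigned to an element's .text, xml.etree.ElementTree escapes it on serialization, so the file itself contains &lt;br&gt; rather than a literal tag, and an XML parser turns it back into the original HTML when the file is read. A small demonstration using only the standard library (the sample text is made up):

```python
from xml.etree.ElementTree import Element, SubElement, fromstring, tostring

feedback = Element("generalfeedback", format="html")
SubElement(feedback, "text").text = "First line.<br>Second line."

# Serializing escapes the angle brackets inside <text>
serialized = tostring(feedback, encoding="unicode")
print(serialized)

# Parsing the serialized XML gives the original HTML string back
restored = fromstring(serialized).findtext("text")
print(restored)
```

This is why the generated file stays well-formed XML even though an HTML <br> has no closing tag.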

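Notes

Image size:
- If an image in the Word document has no size information, the program skips the size attributes.
- You can adjust the width and height attributes of the <img> tag by hand.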
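
One way to make the size attributes optional is sketched below. It is an illustration under assumptions rather than the program's own code: img_tag and guess_mime are hypothetical helpers, and guessing the MIME type from the first bytes is an addition (the program labels every image as image/png).

```python
import base64


def guess_mime(data):
    """Guess an image MIME type from its leading bytes; fall back to PNG (assumption)."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"GIF8"):
        return "image/gif"
    return "image/png"


def img_tag(data, width_px=None, height_px=None):
    """Build an <img> tag with a Base64 data URI; omit width/height when unknown."""
    encoded = base64.b64encode(data).decode("ascii")
    src = f"data:{guess_mime(data)};base64,{encoded}"
    size = ""
    if width_px is not None and height_px is not None:
        size = f' width="{width_px}" height="{height_px}"'
    return f'<img src="{src}" alt="image"{size} />'


# The byte strings are only file signatures, not complete images
print(img_tag(b"\x89PNG\r\n\x1a\n", 50, 30))
print(img_tag(b"\xff\xd8\xff\xe0"))
```

Without explicit attributes the browser falls back to the image's intrinsic size, which may differ from the scaled size used in Word.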
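Correct-answer marker:
- Make sure every question has an [答案] line after its options, followed by the letter of the correct option.
Compatibility:
- The program currently supports only Word documents in .docx format.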
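
A legacy .doc file that has merely been renamed to .docx will not open. Since a .docx file is a ZIP package that normally contains word/document.xml, a quick pre-check is possible with the standard library alone. The helper below is a hypothetical sketch and not part of the program; the in-memory archive only stands in for a real document.

```python
import io
import zipfile


def looks_like_docx(file):
    """Return True if `file` (a path or binary file object) is a ZIP containing word/document.xml."""
    if not zipfile.is_zipfile(file):
        return False
    with zipfile.ZipFile(file) as archive:
        return "word/document.xml" in archive.namelist()


# Minimal stand-in for a .docx package, built in memory
package = io.BytesIO()
with zipfile.ZipFile(package, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")

# Bytes that start like a legacy .doc (OLE) file, which is not a ZIP archive
legacy = io.BytesIO(b"\xd0\xcf\x11\xe0" + b"\x00" * 64)

print(looks_like_docx(package), looks_like_docx(legacy))
```

Files that fail this check would need to be re-saved as .docx in Word before conversion.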

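Summary

With this Python program, the multiple-choice questions in a Word document can be converted to Moodle XML with little effort, including images, line breaks, and formatted numbering. This saves time and helps keep the way questions are displayed in Moodle close to the Word document.

If you have a similar need, feel free to try the program or to modify and extend it to suit your own requirements.

Code

The complete Python code follows: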

import os
import base64
from docx import Document
from xml.etree.ElementTree import Element, SubElement, tostring
from xml.dom import minidom

def extract_images_from_docx(doc):
    """Extract every image in the document as {relationship ID: (Base64 data, width_px, height_px)}."""
    images = {}
    for rel in doc.part.rels.values():
        if "image" in rel.target_ref:
            rel_id = rel.rId  # relationship ID that the document body uses to reference the image
            image_data = rel.target_part.blob
            base64_image = base64.b64encode(image_data).decode("utf-8")
            # Look up the display size; both values stay None if no wp:extent is found
            width_px, height_px = None, None
            for paragraph in doc.paragraphs:
                for run in paragraph.runs:
                    if run.element.xpath(".//a:blip/@r:embed") == [rel_id]:
                        extent = run.element.xpath(".//wp:extent")
                        if extent:
                            width = int(extent[0].get("cx"))  # width in EMU
                            height = int(extent[0].get("cy"))  # height in EMU
                            # Convert EMU to pixels (1 EMU = 1/914400 inch, assuming 96 DPI)
                            width_px = int(width / 914400 * 96)
                            height_px = int(height / 914400 * 96)
                            break
                if width_px is not None and height_px is not None:
                    break
            images[rel_id] = (base64_image, width_px, height_px)
    return images

def process_runs(runs, images):
    """Convert a paragraph's runs to HTML, keeping text and images in order and making images responsive."""
    content = ""
    for run in runs:
        # Append the run's text
        if run.text:
            content += run.text
        # Append the images embedded in this run; drawings without an embedded image are skipped
        for image_rId in run.element.xpath(".//w:drawing//a:blip/@r:embed"):
            if image_rId in images:
                base64_image, width_px, height_px = images[image_rId]
                # Emit width/height only when the size is known
                size = ""
                if width_px is not None and height_px is not None:
                    size = f' width="{width_px}" height="{height_px}"'
                # Keep the original size but let the image shrink on narrow screens
                content += f'<img src="data:image/png;base64,{base64_image}" alt="image"{size} style="max-width: 100%; height: auto;" />'
    return content

def extract_questions_from_docx(doc, images):
    """Extract the multiple-choice questions (text and options) with their correct answers and explanations."""
    questions = []
    current_question = {"question_text": "", "options": [], "correct_answer": "", "feedback": ""}
    is_question = False  # currently inside a question block
    is_feedback = False  # currently inside an explanation block
    for paragraph in doc.paragraphs:
        text = paragraph.text.strip()
        if text.startswith("[题目]"):  # start of a question
            if current_question["question_text"]:  # store the previous question
                questions.append(current_question)
                current_question = {"question_text": "", "options": [], "correct_answer": "", "feedback": ""}
            is_question = True
            is_feedback = False
            current_question["question_text"] = process_runs(paragraph.runs, images).replace("[题目]", "").strip()
        elif text.startswith(("A.", "B.", "C.", "D.")):  # option
            is_question = False
            is_feedback = False
            # Strip surrounding whitespace so the option letter is the first character
            option_content = process_runs(paragraph.runs, images).strip()
            current_question["options"].append(option_content)
        elif text.startswith("[答案]"):  # answer line
            is_question = False
            is_feedback = False
            # Remove the marker and keep the option letter
            correct_answer = text.replace("[答案]", "").strip()
            # First character only (A/B/C/D); stays empty if no letter was given
            current_question["correct_answer"] = correct_answer[:1]
        elif text.startswith("[解析]"):  # start of an explanation
            is_question = False
            is_feedback = True
            current_question["feedback"] = process_runs(paragraph.runs, images).replace("[解析]", "").strip()
        elif text:  # continuation line of a question or an explanation
            if is_question:
                current_question["question_text"] += "<br>" + process_runs(paragraph.runs, images)
            elif is_feedback:
                current_question["feedback"] += "<br>" + process_runs(paragraph.runs, images)
    if current_question["question_text"]:
        questions.append(current_question)
    return questions

def create_moodle_xml(questions):
    """Build the Moodle XML document and return it as a pretty-printed string."""
    quiz = Element("quiz")
    for idx, question in enumerate(questions, start=1):  # numbering starts at 1
        question_element = SubElement(quiz, "question", type="multichoice")
        name = SubElement(question_element, "name")
        # Zero-pad the number to two digits (01, 02, ..., 10, 11)
        question_number = f"{idx:02d}"
        SubElement(name, "text").text = f"Question {question_number}"
        questiontext = SubElement(question_element, "questiontext", format="html")
        SubElement(questiontext, "text").text = question["question_text"]
        # Label the options A, B, C, D
        SubElement(question_element, "answernumbering").text = "ABCD"
        # Add the options
        for option_content in question["options"]:
            is_correct = option_content.startswith(question["correct_answer"] + ".")
            fraction = "100" if is_correct else "0"
            answer = SubElement(question_element, "answer", fraction=fraction)
            SubElement(answer, "text").text = option_content[2:].strip()  # drop the option letter and the dot
        # Add the explanation
        if question["feedback"]:
            generalfeedback = SubElement(question_element, "generalfeedback", format="html")
            SubElement(generalfeedback, "text").text = question["feedback"]
    # Pretty-print the XML
    xml_str = tostring(quiz, encoding="utf-8")
    dom = minidom.parseString(xml_str)
    return dom.toprettyxml(indent="  ")

def main():
    # Convert every .docx file in the current directory
    for filename in os.listdir("."):
        # Skip the "~$..." lock files that Word creates while a document is open
        if filename.endswith(".docx") and not filename.startswith("~$"):
            doc = Document(filename)
            images = extract_images_from_docx(doc)
            questions = extract_questions_from_docx(doc, images)
            xml_content = create_moodle_xml(questions)
            # Save the XML under the same base name as the Word document
            output_filename = os.path.splitext(filename)[0] + ".xml"
            with open(output_filename, "w", encoding="utf-8") as f:
                f.write(xml_content)
            print(f"Generated file: {output_filename}")

if __name__ == "__main__":
    main()

I hope this article is helpful. If you have any questions or suggestions, feel free to leave a comment.