
How DevPod Reshapes AI Model Engineering: The Complete DeepSeek-OCR Workflow from Development to Production

  • 2025-11-26

Author: 西流


From development and debugging to production launch, the entire journey takes just one workspace. DevPod redefines the standard for AI engineering: only when development and deployment are no longer disconnected can a model's value truly be unlocked.

Overview

Say goodbye to a fragmented development experience: DevPod delivers a one-stop closed loop from code to service. This article walks step by step through the complete workflow for the DeepSeek-OCR model on Function Compute FunModel, from cloud-based development and local debugging to production deployment, taking the model out of the lab and into a live service in minutes, and reshaping the path from AI model development to delivery.

Recap: Why Does DevPod Make Launching DeepSeek-OCR So Simple?

In the first article of this series, "Why Can Others Launch DeepSeek-OCR in Seconds with DevPod While You're Still Installing Environments?", we saw how DevPod compresses the tedious work of environment configuration, dependency installation, and hardware adaptation into under 60 seconds. No more wrestling with CUDA version conflicts, Python environment isolation, or slow model-weight downloads: with its preprovisioned cloud environments, DevPod lets developers interact with the DeepSeek-OCR model the moment they enter the workspace, a genuinely "out-of-the-box" AI development experience.


DevPod is a cloud-native AI development tool that provides a unified workspace with preconfigured environments, keeping development, testing, and production consistent and eliminating "environment drift".

It offers one-click access to GPU resources and supports the full workflow of coding, debugging, model tuning, image packaging, and production deployment, with no need to hop between platforms and tools.

It also integrates deeply with Alibaba Cloud Function Compute FunModel, providing performance monitoring, log analysis, online debugging, and rapid iteration, so AI models move from the lab to a live service far more efficiently.

From Development to Production: DevPod's End-to-End Closed-Loop Workflow

Launching the model, however, is only the beginning. Real business scenarios also demand model tuning, code debugging, performance testing, service packaging, and production deployment. Done the traditional way, these steps mean switching between multiple platforms and tools, with data and code shuttling across environments, a recipe for the classic "it works on my machine" embarrassment.


With a unified workspace and seamless deployment capabilities, DevPod closes the last mile from code to service. Below, we walk through this workflow end to end using the DeepSeek-OCR model as a hands-on case study.

1. Development and Debugging: A Cloud Lab with VSCode + GPU Acceleration

After launching a DeepSeek-OCR environment instance in DevPod, we immediately get a GPU-equipped cloud VSCode development environment. This is more than a container that runs the model; it is a complete end-to-end research and development platform. Building on the inference example shipped with DeepSeek-OCR-vLLM, we wrote server.py as the core entry point of the inference service, exposing an efficient and scalable inference API. (See the appendix for the full code.)


/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/server.py


import os
import io
import torch
import uvicorn
import requests
from PIL import Image
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional, Dict, Any, List
import tempfile
import fitz
from concurrent.futures import ThreadPoolExecutor
import asyncio

# Set environment variables
if torch.version.cuda == '11.8':
    os.environ["TRITON_PTXAS_PATH"] = "/usr/local/cuda-11.8/bin/ptxas"
os.environ['VLLM_USE_V1'] = '0'
os.environ["CUDA_VISIBLE_DEVICES"] = '0'

from config import MODEL_PATH, CROP_MODE, MAX_CONCURRENCY, NUM_WORKERS
from vllm import LLM, SamplingParams
from vllm.model_executor.models.registry import ModelRegistry
from deepseek_ocr import DeepseekOCRForCausalLM
from process.ngram_norepeat import NoRepeatNGramLogitsProcessor
from process.image_process import DeepseekOCRProcessor

# Register model
ModelRegistry.register_model("DeepseekOCRForCausalLM", DeepseekOCRForCausalLM)

# Initialize model
print("Loading model...")
...

# Initialize FastAPI app
app = FastAPI(title="DeepSeek-OCR API", version="1.0.0")
...

@app.post("/ocr_batch", response_model=ResponseData)
async def ocr_batch_inference(request: RequestData):
    """
    Main OCR batch processing endpoint
    Accepts a list of image URLs and/or PDF URLs for OCR processing
    Returns a list of OCR results corresponding to each input document
    Supports both individual image processing and PDF-to-image conversion
    """
    print(f"Received request data: {request}")
    try:
        input_data = request.input
        prompt = request.prompt  # Get the prompt from the request
        if not input_data.images and not input_data.pdfs:
            raise HTTPException(status_code=400, detail="Either 'images' or 'pdfs' (or both) must be provided as lists.")
        all_batch_inputs = []
        final_output_parts = []
        # Process images if provided
        if input_data.images:
            batch_inputs_images, counts_images = await process_items_async(input_data.images, is_pdf=False, prompt=prompt)
            all_batch_inputs.extend(batch_inputs_images)
            final_output_parts.append(counts_images)
        # Process PDFs if provided
        if input_data.pdfs:
            batch_inputs_pdfs, counts_pdfs = await process_items_async(input_data.pdfs, is_pdf=True, prompt=prompt)
            all_batch_inputs.extend(batch_inputs_pdfs)
            final_output_parts.append(counts_pdfs)
        if not all_batch_inputs:
            raise HTTPException(status_code=400, detail="No valid images or PDF pages were processed from the input URLs.")
        # Run inference on the combined batch
        outputs_list = await run_inference(all_batch_inputs)
        # Reconstruct final output list based on counts
        final_outputs = []
        output_idx = 0
        # Flatten the counts list
        all_counts = [count for sublist in final_output_parts for count in sublist]
        for count in all_counts:
            # Get 'count' number of outputs for this input
            input_outputs = outputs_list[output_idx : output_idx + count]
            output_texts = []
            for output in input_outputs:
                content = output.outputs[0].text
                if '<|end▁of▁sentence|>' in content:
                    content = content.replace('<|end▁of▁sentence|>', '')
                output_texts.append(content)
            # Combine pages if it was a multi-page PDF input (or image treated as PDF)
            if count > 1:
                combined_text = "\n<--- Page Split --->\n".join(output_texts)
                final_outputs.append(combined_text)
            else:
                # Single image or single-page PDF
                final_outputs.append(output_texts[0] if output_texts else "")
            output_idx += count  # Move to the next set of outputs
        return ResponseData(output=final_outputs)
    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")
...

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, workers=1)
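Note that server.py reads its runtime knobs from a sibling config module (the `from config import ...` line above). The real file ships with the DeepSeek-OCR-vllm code; the sketch below only illustrates the four imported names, and every value in it is an assumption rather than the project's actual defaults.

# config.py -- illustrative sketch only; the actual file in DeepSeek-OCR-vllm
# may define these differently. Only these four names are imported by server.py.

MODEL_PATH = "/workspace/models/deepseek-ocr"  # assumption: local path to the model weights
CROP_MODE = True           # assumption: enable cropping in DeepseekOCRProcessor
MAX_CONCURRENCY = 100      # assumption: concurrency cap fed into vLLM's max_num_seqs
NUM_WORKERS = 8            # assumption: thread-pool size for CPU-bound image preprocessing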

Local testing

# Start the inference service in one terminal
$ python /workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/server.py

# Then, in a second terminal
$ curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "pdfs": [
        "https://images.devsapp.cn/test/ocr-test.pdf"
      ]
    },
    "prompt": "<image>\nFree OCR."
  }' \
  http://127.0.0.1:8000/ocr_batch


You can also grab the proxy path from the quick-access tab, for example https://devpod-dbbeddba-ngywxigepn.cn-hangzhou.ide.fc.aliyun.com/proxy/8000/, and call the service for debugging directly from an external client such as Postman.
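For anything beyond a quick curl, the same endpoint is easy to drive from Python. The sketch below is a minimal client, assuming the service is reachable on the local port (swap in the proxy URL above to call it from outside DevPod); the route, request schema, and default prompt all come from server.py.

import requests

# Assumption: server.py is running locally; replace with the DevPod proxy URL
# (e.g. https://devpod-.../proxy/8000) to call the service from outside.
BASE_URL = "http://127.0.0.1:8000"

payload = {
    "input": {
        # "images" and/or "pdfs" lists of URLs, matching the InputData model
        "pdfs": ["https://images.devsapp.cn/test/ocr-test.pdf"],
    },
    "prompt": "<image>\nFree OCR.",  # same default as RequestData in server.py
}

resp = requests.post(f"{BASE_URL}/ocr_batch", json=payload, timeout=300)
resp.raise_for_status()

# ResponseData.output holds one string per input document; multi-page PDFs
# come back joined with "<--- Page Split --->" markers.
for i, text in enumerate(resp.json()["output"]):
    print(f"--- document {i} ---")
    print(text[:500])  # preview the first 500 characters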


Testing an image

$ curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "images": [
        "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png"
      ]
    },
    "prompt": "<image>\n<|grounding|>Convert the document to markdown."
  }' \
  "https://devpod-dbbeddba-ngywxigepn.cn-hangzhou.ide.fc.aliyun.com/proxy/8000/ocr_batch"

Testing a PDF

$ curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "pdfs": [
        "https://images.devsapp.cn/test/ocr-test.pdf"
      ]
    },
    "prompt": "<image>\nFree OCR."
  }' \
  "https://devpod-dbbeddba-ngywxigepn.cn-hangzhou.ide.fc.aliyun.com/proxy/8000/ocr_batch"


Example output:


Mixed input

A single request can carry both images and PDFs. Note that server.py always processes the images list before the pdfs list, so the returned output array orders image results first and PDF results second, regardless of how the fields are ordered in the JSON body.

$ curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "pdfs": [
        "https://images.devsapp.cn/test/ocr-test.pdf"
      ],
      "images": [
        "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png"
      ]
    },
    "prompt": "<image>\nFree OCR."
  }' \
  "https://devpod-dbbeddba-ngywxigepn.cn-hangzhou.ide.fc.aliyun.com/proxy/8000/ocr_batch"


DevPod's advantage here: every dependency is preinstalled and GPU resources are ready on demand, so developers can focus on algorithm optimization and business logic rather than environment problems.

2. Service Packaging: One-Click Conversion into an Image Artifact

Once the model has been validated in the development environment, the next step is to package it as an image artifact. In FunModel's [1] DevPod, this takes just a few clicks in the console:



For details, see DevPod image building and ACR integration [2].

3. One-Click Deployment: From Workspace to Production

Once the image has been built and pushed, it is stored in ACR and can be deployed as a FunModel model service with a single click.



4. Monitoring and Iteration: A Closed-Loop DevOps Experience

Deployment is not the finish line. DevPod integrates deeply with FunModel to provide a complete monitoring dashboard:


  • Performance monitoring: real-time GPU utilization, request latency, and throughput

  • Log analysis: centralized collection of logs from all instances, with keyword search

  • Deployment change history: every configuration change (GPU card type, scaling policy, timeout, and so on) is recorded and traceable

  • Quick online debugging: rapidly test the deployed model service (a minimal external smoke-test sketch follows this list)
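Beyond the console panels, nothing stops you from probing the deployed service directly. The sketch below is a minimal external smoke test, not a FunModel monitoring API: it only touches the /health and /ocr_batch routes defined in server.py, and SERVICE_URL is a hypothetical placeholder for your deployed endpoint.

import time
import requests

SERVICE_URL = "https://<your-funmodel-endpoint>"  # hypothetical placeholder

# 1. Liveness: /health is defined in server.py and returns {"status": "healthy"}
print("health:", requests.get(f"{SERVICE_URL}/health", timeout=10).json())

# 2. Coarse latency check: time one single-image OCR request
payload = {
    "input": {"images": ["https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png"]},
    "prompt": "<image>\nFree OCR.",
}
start = time.perf_counter()
resp = requests.post(f"{SERVICE_URL}/ocr_batch", json=payload, timeout=300)
elapsed = time.perf_counter() - start
print(f"status={resp.status_code} latency={elapsed:.2f}s chars={len(resp.json()['output'][0])}")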



When it is time to optimize the model or fix an issue, a developer can:


  1. Spot the problem in the monitoring dashboard

  2. Open DevPod directly and continue developing and debugging

  3. Verify the fix

  4. Build a new image and deploy it with one click


The whole loop happens in one unified environment, avoiding problems caused by environment mismatch and enabling genuinely seamless collaboration between development and operations.

Summary

The hands-on walkthrough above shows that DevPod does more than solve the "hard to launch" problem; it builds a complete closed loop from code to service:


  • Environment consistency: development, testing, and production environments are identical, eliminating "environment drift"

  • Elastic resources: GPUs are allocated on demand, lower-spec for development and higher-spec for production

  • Integrated workflow: no switching between platforms; every operation happens in a single workspace

  • Zero learning curve for deployment: no need to master K8s, Dockerfiles, or other complex concepts; focus stays on business value

DevFlow1: A Seamless Closed Loop of Cloud Development and Deployment

DevFlow1 describes the streamlined workflow a developer gets with DevPod:


  1. The developer starts a preconfigured cloud development environment, with all required dependencies and GPU resources built in, and can begin writing and debugging code immediately.

  2. Once the code changes are done, there is no Dockerfile to write and no build pipeline to manage: a single click packages the current development environment and code into a standardized image.

  3. That image can be deployed directly as a production-grade service exposing an API.

  4. When iteration is needed, the developer returns seamlessly to the development environment, makes changes, then rebuilds and updates the online service with another click.


The entire flow automates the full chain from development and debugging to deployment and iteration, completely shielding infrastructure complexity so developers can concentrate on business logic and model optimization.


DevFlow2: An Engineering-Oriented Developer Workflow

DevFlow2 suits developers already fluent in containerization and engineering practice:


  1. The developer starts from a designated stable commit in the code repository, launches a dedicated development environment, and iterates on code, installs dependencies, and runs integration tests.

  2. Once tests pass and the results meet expectations, the developer prepares for deployment: writing or adjusting the Dockerfile by hand, precisely configuring the image build logic, and setting the function entry point or service parameters as needed.

  3. The system then rebuilds the image from that Dockerfile and runs end-to-end tests to ensure consistent behavior in production.

  4. Finally, the code and Dockerfile changes are committed to Git together, completing a standard, traceable, and reproducible release.


This flow gives developers fine-grained control over deployment details, fitting teams that prioritize engineering rigor and long-term maintainability.



On Alibaba Cloud's FunModel platform, we are witnessing a shift in the AI development paradigm: from "build the infrastructure first, then develop the model" to "validate the idea first, then scale it up". As the vehicle for this shift, DevPod lets AI developers return to creation itself rather than being held back by tools and environments.

About FunModel, the Model Service on Function Compute

FunModel is a full-lifecycle management platform for AI model development, deployment, and operations. Simply provide the model files (for example, a model repository from ModelScope or Hugging Face), and FunModel's automation tooling quickly packages and deploys the model service, handing you an inference API that is ready to call. The platform is designed to improve resource efficiency and simplify development and deployment.


Built on Serverless + GPU, FunModel naturally offers a simple, lightweight, zero-barrier way to integrate models, giving individual developers a pleasant playground while letting enterprise developers deploy, operate, and iterate on models quickly and efficiently.


On Alibaba Cloud FunModel, developers can:


  • Ship models fast: model onboarding drops from a cycle measured in weeks to 5 minutes, with zero development effort and no scheduling

  • Scale with one click, so operations stop being a burden: multiple scaling policies closely match business traffic for "painless ops"

Technical advantages


Visit the model marketplace (https://fcnext.console.aliyun.com/fun-model/cn-hangzhou/fun-model/model-market) to deploy DeepSeek-OCR in minutes.



Further reading:


  1. FunModel product documentation: https://help.aliyun.com/zh/functioncompute/fc/model-service-funmodel/

  2. FunModel quick start: https://help.aliyun.com/zh/functioncompute/fc/quick-start

  3. FunModel custom deployment: https://help.aliyun.com/zh/functioncompute/fc/custom-model-deployment

  4. FunModel model marketplace: https://fcnext.console.aliyun.com/fun-model/cn-hangzhou/fun-model/model-market


Related links:


[1] FunModel: https://fcnext.console.aliyun.com/fun-model

[2] DevPod image building and ACR integration: https://help.aliyun.com/zh/functioncompute/fc/devpod-development-environment#491239f1cdujq


Appendix

Full code:


/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/server.py


import os
import io
import torch
import uvicorn
import requests
from PIL import Image
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional, Dict, Any, List
import tempfile
import fitz
from concurrent.futures import ThreadPoolExecutor
import asyncio

# Set environment variables
if torch.version.cuda == '11.8':
    os.environ["TRITON_PTXAS_PATH"] = "/usr/local/cuda-11.8/bin/ptxas"
os.environ['VLLM_USE_V1'] = '0'
os.environ["CUDA_VISIBLE_DEVICES"] = '0'

from config import MODEL_PATH, CROP_MODE, MAX_CONCURRENCY, NUM_WORKERS
from vllm import LLM, SamplingParams
from vllm.model_executor.models.registry import ModelRegistry
from deepseek_ocr import DeepseekOCRForCausalLM
from process.ngram_norepeat import NoRepeatNGramLogitsProcessor
from process.image_process import DeepseekOCRProcessor

# Register model
ModelRegistry.register_model("DeepseekOCRForCausalLM", DeepseekOCRForCausalLM)

# Initialize model
print("Loading model...")
llm = LLM(
    model=MODEL_PATH,
    hf_overrides={"architectures": ["DeepseekOCRForCausalLM"]},
    block_size=256,           # Memory block size for KV cache
    enforce_eager=False,      # Use eager mode for better performance with multimodal models
    trust_remote_code=True,   # Allow execution of code from remote repositories
    max_model_len=8192,       # Maximum sequence length the model can handle
    swap_space=0,             # No swapping to CPU, keeping everything on GPU
    max_num_seqs=max(MAX_CONCURRENCY, 100),  # Maximum number of sequences to process concurrently
    tensor_parallel_size=1,   # Number of GPUs for tensor parallelism (1 = single GPU)
    gpu_memory_utilization=0.9,  # Use 90% of GPU memory for model execution
    disable_mm_preprocessor_cache=True  # Disable cache for multimodal preprocessor to avoid issues
)

# Configure sampling parameters
# NoRepeatNGramLogitsProcessor prevents repetition in generated text by tracking n-gram patterns
logits_processors = [NoRepeatNGramLogitsProcessor(ngram_size=20, window_size=50, whitelist_token_ids={128821, 128822})]
sampling_params = SamplingParams(
    temperature=0.0,                    # Deterministic output (greedy decoding)
    max_tokens=8192,                    # Maximum number of tokens to generate
    logits_processors=logits_processors, # Apply the processor to avoid repetitive text
    skip_special_tokens=False,          # Include special tokens in the output
    include_stop_str_in_output=True,    # Include stop strings in the output
)

# Initialize FastAPI app
app = FastAPI(title="DeepSeek-OCR API", version="1.0.0")


class InputData(BaseModel):
    """
    Input data model to define what types of documents to process
    images: Optional list of image URLs to process
    pdfs: Optional list of PDF URLs to process
    Note: At least one of these fields must be provided in a request
    """
    images: Optional[List[str]] = None
    pdfs: Optional[List[str]] = None


class RequestData(BaseModel):
    """
    Main request model that defines the input data and optional prompt
    """
    input: InputData
    # Add prompt as an optional field with a default value
    prompt: str = '<image>\nFree OCR.'  # Default prompt


class ResponseData(BaseModel):
    """
    Response model that returns OCR results for each input document
    """
    output: List[str]


def download_file(url: str) -> bytes:
    """Download file from URL"""
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return response.content
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Failed to download file from URL: {str(e)}")


def is_pdf_file(content: bytes) -> bool:
    """Check if the content is a PDF file"""
    return content.startswith(b'%PDF')


def load_image_from_bytes(image_bytes: bytes) -> Image.Image:
    """Load image from bytes"""
    try:
        image = Image.open(io.BytesIO(image_bytes))
        return image.convert('RGB')
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Failed to load image: {str(e)}")


def pdf_to_images(pdf_bytes: bytes, dpi: int = 144) -> list:
    """Convert PDF to images"""
    try:
        images = []
        pdf_document = fitz.open(stream=pdf_bytes, filetype="pdf")
        zoom = dpi / 72.0
        matrix = fitz.Matrix(zoom, zoom)
        for page_num in range(pdf_document.page_count):
            page = pdf_document[page_num]
            pixmap = page.get_pixmap(matrix=matrix, alpha=False)
            img_data = pixmap.tobytes("png")
            img = Image.open(io.BytesIO(img_data))
            images.append(img.convert('RGB'))
        pdf_document.close()
        return images
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Failed to convert PDF to images: {str(e)}")


def process_single_image_sync(image: Image.Image, prompt: str) -> Dict:  # Renamed and made sync
    """Process a single image (synchronous function for CPU-bound work)"""
    try:
        cache_item = {
            "prompt": prompt,
            "multi_modal_data": {
                "image": DeepseekOCRProcessor().tokenize_with_images(
                    images=[image],
                    bos=True,
                    eos=True,
                    cropping=CROP_MODE
                )
            },
        }
        return cache_item
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Failed to process image: {str(e)}")


async def process_items_async(items_urls: List[str], is_pdf: bool, prompt: str) -> tuple[List[Dict], List[int]]:
    """
    Process a list of image or PDF URLs asynchronously.
    Downloads files concurrently, then processes images/PDF pages in a thread pool.
    Returns a tuple: (batch_inputs, num_results_per_input)
    """
    loop = asyncio.get_event_loop()
    # 1. Download all files concurrently
    download_tasks = [loop.run_in_executor(None, download_file, url) for url in items_urls]
    contents = await asyncio.gather(*download_tasks)
    # 2. Prepare arguments for processing (determine if PDF/image, count pages)
    processing_args = []
    num_results_per_input = []
    for idx, (url, content) in enumerate(zip(items_urls, contents)):
        if is_pdf:
            if not is_pdf_file(content):
                raise HTTPException(status_code=400, detail=f"Provided file is not a PDF: {url}")
            images = pdf_to_images(content)
            num_pages = len(images)
            num_results_per_input.append(num_pages)
            # Each page will be processed separately
            processing_args.extend([(img, prompt) for img in images])
        else:  # is image
            if is_pdf_file(content):
                # Handle case where an image URL accidentally points to a PDF
                images = pdf_to_images(content)
                num_pages = len(images)
                num_results_per_input.append(num_pages)
                processing_args.extend([(img, prompt) for img in images])
            else:
                image = load_image_from_bytes(content)
                num_results_per_input.append(1)
                processing_args.append((image, prompt))
    # 3. Process images/PDF pages in parallel using ThreadPoolExecutor
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
        # Submit all processing tasks
        process_tasks = [
            loop.run_in_executor(executor, process_single_image_sync, img, prompt)
            for img, prompt in processing_args
        ]
        # Wait for all to complete
        processed_results = await asyncio.gather(*process_tasks)
    return processed_results, num_results_per_input


async def run_inference(batch_inputs: List[Dict]) -> List:
    """Run inference on batch inputs"""
    if not batch_inputs:
        return []
    try:
        # Run inference on the entire batch
        outputs_list = llm.generate(
            batch_inputs,
            sampling_params=sampling_params
        )
        return outputs_list
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Failed to run inference: {str(e)}")


@app.post("/ocr_batch", response_model=ResponseData)
async def ocr_batch_inference(request: RequestData):
    """
    Main OCR batch processing endpoint
    Accepts a list of image URLs and/or PDF URLs for OCR processing
    Returns a list of OCR results corresponding to each input document
    Supports both individual image processing and PDF-to-image conversion
    """
    print(f"Received request data: {request}")
    try:
        input_data = request.input
        prompt = request.prompt  # Get the prompt from the request
        if not input_data.images and not input_data.pdfs:
            raise HTTPException(status_code=400, detail="Either 'images' or 'pdfs' (or both) must be provided as lists.")
        all_batch_inputs = []
        final_output_parts = []
        # Process images if provided
        if input_data.images:
            batch_inputs_images, counts_images = await process_items_async(input_data.images, is_pdf=False, prompt=prompt)
            all_batch_inputs.extend(batch_inputs_images)
            final_output_parts.append(counts_images)
        # Process PDFs if provided
        if input_data.pdfs:
            batch_inputs_pdfs, counts_pdfs = await process_items_async(input_data.pdfs, is_pdf=True, prompt=prompt)
            all_batch_inputs.extend(batch_inputs_pdfs)
            final_output_parts.append(counts_pdfs)
        if not all_batch_inputs:
            raise HTTPException(status_code=400, detail="No valid images or PDF pages were processed from the input URLs.")
        # Run inference on the combined batch
        outputs_list = await run_inference(all_batch_inputs)
        # Reconstruct final output list based on counts
        final_outputs = []
        output_idx = 0
        # Flatten the counts list
        all_counts = [count for sublist in final_output_parts for count in sublist]
        for count in all_counts:
            # Get 'count' number of outputs for this input
            input_outputs = outputs_list[output_idx : output_idx + count]
            output_texts = []
            for output in input_outputs:
                content = output.outputs[0].text
                if '<|end▁of▁sentence|>' in content:
                    content = content.replace('<|end▁of▁sentence|>', '')
                output_texts.append(content)
            # Combine pages if it was a multi-page PDF input (or image treated as PDF)
            if count > 1:
                combined_text = "\n<--- Page Split --->\n".join(output_texts)
                final_outputs.append(combined_text)
            else:
                # Single image or single-page PDF
                final_outputs.append(output_texts[0] if output_texts else "")
            output_idx += count  # Move to the next set of outputs
        return ResponseData(output=final_outputs)
    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")


@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {"status": "healthy"}


@app.get("/")
async def root():
    """Root endpoint"""
    return {"message": "DeepSeek-OCR API is running (Batch endpoint available at /ocr_batch)"}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, workers=1)


Click here to open the FunModel model marketplace.
