import logging
import os

import onnx
import torch
from transformers.modeling_utils import Conv1D

logger = logging.getLogger(__name__)


def _conv1d_to_linear(module):
    # Conv1D stores its weight as (in_size, out_size); nn.Linear expects the transpose.
    in_size, out_size = module.weight.shape
    linear = torch.nn.Linear(in_size, out_size)
    linear.weight.data = module.weight.data.T.contiguous()
    linear.bias.data = module.bias.data
    return linear


def conv1d_to_linear(model):
    """in-place
    This is for Dynamic Quantization, as Conv1D is not recognized by PyTorch, convert it to nn.Linear
    """
    logger.debug("replace Conv1D with Linear")
    for name in list(model._modules):
        module = model._modules[name]
        if isinstance(module, Conv1D):
            linear = _conv1d_to_linear(module)
            model._modules[name] = linear
        else:
            # Recurse into child modules that are not Conv1D themselves.
            conv1d_to_linear(module)


def _get_size_of_pytorch_model(model):
    # Serialize the state dict to a temporary file and report its size in MB.
    torch.save(model.state_dict(), "temp.p")
    size = os.path.getsize("temp.p") / (1024 * 1024)
    os.remove("temp.p")
    return size


class QuantizeHelper:
    @staticmethod
    def quantize_torch_model(model, dtype=torch.qint8):
        """
        Usage: model = quantize_model(model)

        TODO: mix of in-place and return, but results are different
        """
        conv1d_to_linear(model)
        quantized_model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=dtype)
        logger.info(f"Size of full precision Torch model(MB):{_get_size_of_pytorch_model(model)}")
        logger.info(f"Size of quantized Torch model(MB):{_get_size_of_pytorch_model(quantized_model)}")
        return quantized_model

    @staticmethod
    def quantize_onnx_model(onnx_model_path, quantized_model_path, use_external_data_format=False):
        from pathlib import Path

        from onnxruntime.quantization import quantize_dynamic

        # Make sure the output directory exists before writing the quantized model.
        Path(quantized_model_path).parent.mkdir(parents=True, exist_ok=True)
        logger.info(f"Size of full precision ONNX model(MB):{os.path.getsize(onnx_model_path) / (1024 * 1024)}")
        quantize_dynamic(
            onnx_model_path,
            quantized_model_path,
            use_external_data_format=use_external_data_format,
            extra_options={"DefaultTensorType": onnx.TensorProto.FLOAT},
        )
        logger.info(f"quantized model saved to:{quantized_model_path}")
        logger.info(f"Size of quantized ONNX model(MB):{os.path.getsize(quantized_model_path) / (1024 * 1024)}")