{"id":1360,"date":"2024-09-14T09:45:41","date_gmt":"2024-09-14T01:45:41","guid":{"rendered":"https:\/\/www.fanyamin.com\/wordpress\/?p=1360"},"modified":"2024-09-15T08:40:27","modified_gmt":"2024-09-15T00:40:27","slug":"%e5%88%a9%e7%94%a8-langchain-%e5%92%8c-llm-%e6%9d%a5%e7%bb%99-pdf-%e5%81%9a%e6%80%bb%e7%bb%93","status":"publish","type":"post","link":"https:\/\/www.fanyamin.com\/wordpress\/?p=1360","title":{"rendered":"Summarizing a PDF with LangChain and an LLM"},"content":{"rendered":"<p>I came across a PDF online about building dynamic pipelines in GStreamer. I only skimmed it and had no time to read it closely, so I used LangChain and an LLM to produce a quick summary of it.<\/p>\n<p>The code is as follows:<\/p>\n<pre><code class=\"language-python\">\nfrom langchain.document_loaders import UnstructuredPDFLoader\nfrom langchain.llms import OpenAI\nfrom langchain.chains import LLMChain\nfrom langchain.prompts import PromptTemplate\n\n# Load the PDF file\npdf_loader = UnstructuredPDFLoader(&quot;path_to_your_pdf_file.pdf&quot;)\ndocuments = pdf_loader.load()\n\n# Extract the plain text of the PDF\npdf_text = &#039; &#039;.join([doc.page_content for doc in documents])\n\n# Create the LLM object (using OpenAI GPT)\nllm = OpenAI(temperature=0.7, openai_api_key=&quot;your_openai_api_key&quot;)\n\n# Define the summarization prompt\nprompt_template = &quot;&quot;&quot;\nPlease summarize the following content:\n{pdf_text}\nSummary:\n&quot;&quot;&quot;\n\nprompt = PromptTemplate(\n    input_variables=[&quot;pdf_text&quot;],\n    template=prompt_template,\n)\n\n# Create the LLMChain\nchain = LLMChain(llm=llm, prompt=prompt)\n\n# Generate the summary with the LLM\nsummary = chain.run(pdf_text)\nprint(&quot;PDF summary:\\n&quot;, 
summary)\n<\/code><\/pre>\n<p>The output is as follows:<\/p>\n<pre><code>PDF summary:\n Summary:\n\n1. **Speaker information**:\n   - Speaker: Jos\u00e9 Antonio Santos Cadenas\n   - Position: Software engineer\n   - Education: Master&#039;s degree in Telematic Systems\n   - Work experience: Kurento Media Server (KMS) administrator\n   - Contact: santoscadenas@gmail.com\n\n2. **GStreamer static pipelines**:\n   - Use the `gst-launch` command to create complex media pipelines, for example to transcode a video file.\n   - Example: `gst-launch-1.0 filesrc location=sample.mp4 ! qtdemux ! avdec_h264 ! queue ! vp8enc ! webmmux ! filesink location=sample.webm`\n\n3. **GStreamer dynamic elements**:\n   - GStreamer provides some dynamic elements (such as `autovideosrc`, `autovideosink`, `decodebin`, `playbin`) that simplify the creation of dynamic pipelines.\n   - Example: `gst-launch-1.0 filesrc location=sample.mp4 ! decodebin ! vp8enc ! webmmux ! filesink location=sample.webm`\n\n4. **Dynamic pipelines that do not depend on the media**:\n   - Dynamic pipelines can be created according to the media stream rather than external conditions.\n   - For example: adding or replacing elements in response to user actions.\n\n5. **Requirements for creating dynamic pipelines**:\n   - A deep understanding of how GStreamer works is required, including how media flows between pads, how negotiation takes place, how streaming threads work, and how probes work.\n\n6. 
**Adding and removing elements while playing**:\n   - Wrong way: calling `gst_pad_unlink` directly, which can lead to deadlocks or race conditions.\n   - Right way: use a probe to wait until the pad is idle, so that unlinking does not disturb the media flow.\n\n7. **Adding elements after a tee**:\n   - tee handles unlinked pads, but it is recommended to link elements inside an idle or block callback to avoid problems during state changes or negotiation.\n\n8. **Mind the negotiation**:\n   - When linking elements after a tee, be aware that negotiation may affect the other branches; using a `capsfilter` or converters is recommended to simplify negotiation.\n\n9. **Removing elements**:\n   - Sometimes an element must be allowed to process all of its queued buffers (for example when recording); after unlinking, EOS must be sent and you must wait for it to be handled at the end of the pipeline.\n\n10. **Recommendations for real-time pipelines**:\n    - In real-time pipelines, elements should be configured to work as fast as possible, and queues should be added to separate processing into different threads.\n\n11. 
**Summary and recommendations**:\n    - Dynamic pipelines are not easy; understand the operation you are attempting and how it affects the rest of the pipeline.\n    - Block pads before unlinking, sync the state of new elements before linking them, and always link the sink elements first to avoid media leaks.\n\n12. **Contact details and resources**:\n    - Speaker email: santoscadenas@gmail.com\n    - Kurento links: [Kurento](http:\/\/www.kurento.org), [GitHub](http:\/\/www.github.com\/kurento), [Twitter](https:\/\/twitter.com\/kurentoms)\n    - Other related organizations: [NUBOMEDIA](http:\/\/www.nubomedia.eu), [FI-WARE](http:\/\/www.fi-ware.org), [EU](http:\/\/ec.europa.eu)<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I came across a PDF online about building dynamic pipelines in GStreamer. I only skimmed it and had no time to read it closely, so I used LangChain and an LLM to produce a quick summary of it. The code is as follows: from langchain.document_loaders import UnstructuredPDFLoader from langchain.llms import OpenAI from langchain.chains import LLMChain from langchain.prompts import PromptTemplate # Load the PDF file pdf_loader = UnstructuredPDFLoader(&quot;path_to_your_pdf_file.pdf&quot;) documents = pdf_loader.load() # Extract the plain text of the PDF pdf_text = &#039; &#039;.join([doc.page_content for doc in documents]) # Create the LLM object (using [&hellip;] <a class=\"read-more\" href=\"https:\/\/www.fanyamin.com\/wordpress\/?p=1360\" title=\"Permanent Link to: Summarizing a PDF with LangChain and an LLM\">&rarr;Read&nbsp;more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"class_list":["post-1360","post","type-post","status-publish","format-standard","hentry","category-5"],"_links":{"self":[{"href":"https:\/\/www.fanyamin.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/1360"}],"collection":[{"href":"https:\/\/www.fanyamin.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fanyamin.com\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fanyamin.com\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fanyamin.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1360"}],"version-history":[{"count":3,"href":"https:\/\/www.fanyamin.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/1360\/revisions"}],"predecessor-version":[{"id":1363,"href":"https:\/\/www.fanyamin.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/1360\/revisions\/1363"}],"wp:attachment":[{"href":"https:\/\/www.fanyamin.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1360"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fanyamin.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1360"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fanyamin.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1360"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}