An Improved Dynamic Exploration Noise Algorithm Based on Constrained TD3

CLC number: TP181; TP301.6; TP242 Document code: A Article ID: 2096-4706(2025)07-0103-06
Abstract: Aiming at the problem that unconstrained exploration may cause damage to the mobile car, this study proposes a Reinforcement Learning method that combines adaptive noise exploration and Lagrange multiplier constraints to optimize the trajectory planning of the car reaching the target point. This method improves exploration efficiency by dynamically adjusting the noise, uses the TD3 algorithm to deal with the continuous action space, and uses the Lagrange multiplier method to deal with the constraints, which differs from adding a penalty for unexpected behavior directly in the Markov Decision Process (MDP). Simulation experiments show that this method can effectively guide the car to avoid obstacles, reduce the violation of constraints, and ensure the safety and reliability of the task, while showing good training convergence characteristics.
Keywords: Safety Reinforcement Learning; Constrained Markov Decision Process; trajectory planning; TD3 algorithm
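The abstract contrasts a Lagrange multiplier treatment of constraints with a fixed penalty added to the MDP reward. The excerpt does not give the update rules, so the following is only a minimal sketch of the general idea, under stated assumptions: a dual-ascent multiplier update, a Lagrangian-shaped learning signal, and a linear annealing schedule standing in for the "dynamically adjusted" exploration noise. All function names, the learning rate, the cost limit, and the noise bounds are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only; names and constants are assumptions, not the paper's.

def update_lagrange_multiplier(lam, avg_cost, cost_limit, lr=0.05):
    """One dual-ascent step on the multiplier.

    The multiplier grows while the average constraint cost exceeds the
    limit and shrinks otherwise; it is clipped so it never goes negative.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))


def lagrangian_reward(reward, cost, lam):
    """Learning signal with the constraint cost weighted by the multiplier.

    Unlike a fixed penalty coefficient, lam is adapted during training.
    """
    return reward - lam * cost


def exploration_sigma(step, total_steps, sigma_start=0.3, sigma_end=0.05):
    """Standard deviation of Gaussian action noise at a given training step.

    Assumed schedule: linear interpolation from sigma_start to sigma_end,
    held at sigma_end once step reaches total_steps.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return sigma_start + frac * (sigma_end - sigma_start)
```

In a TD3-style training loop, one plausible arrangement would be to call `exploration_sigma` when perturbing the actor's action, store `lagrangian_reward` in place of the raw reward, and call `update_lagrange_multiplier` once per episode with the observed average cost. Whether the paper adapts the noise by training step, by constraint violation, or by another signal cannot be determined from this excerpt.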
0 Introduction
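With the rapid development of automation technology, robotics has been widely applied in industrial manufacturing, the service sector, and many other fields [1], becoming a key factor in improving operational efficiency and precision.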