Theory / Optimization

Browse the articles under this label, with PT-BR translations.

Articles

📐 Theory / Optimization: 223 article(s) found

Theory/Optimization • Score 75

A construction of an optimal base for conditional attribute and attributional condition implications in triadic contexts

arXiv:2601.01467v1 Announce Type: new Abstract: This article studies implications in triadic contexts. Specifically, we focus on those introduced by Ganter and Obiedkov, namely conditional attribute and attributional condition implications. Our aim is to construct an optimal base for these implications.

Source: arXiv cs.AI

NLP/LLMs • Score 85

Semantic Alignment of Multilingual Knowledge Graphs via Contextualized Vector Projections

The article presents our work on a cross-lingual ontology alignment system that uses embedding-based cosine-similarity matching. Ontology entities are contextually enriched with descriptions created using novel techniques. We evaluate our work on the OAEI-2022 multifarm track, achieving an F1 score of 71%, indicating the effectiveness of our alignment pipeline.
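
As a rough illustration of the matching step described above, here is a minimal sketch of cosine-similarity alignment between two sets of entity embeddings. The encoder, the greedy one-to-one matching, and the threshold value are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of embedding-based ontology matching via cosine
# similarity. The matching policy and threshold are assumptions.
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between two sets of row vectors."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

def align(src_emb, tgt_emb, threshold=0.8):
    """Greedy alignment: keep the best target per source entity
    whenever the similarity clears the threshold."""
    sims = cosine_matrix(src_emb, tgt_emb)
    pairs = []
    for i, row in enumerate(sims):
        j = int(np.argmax(row))
        if row[j] >= threshold:
            pairs.append((i, j, float(row[j])))
    return pairs

# Toy usage with random stand-ins for "contextual description" embeddings.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 32))
tgt = rng.normal(size=(7, 32))
print(align(src, tgt, threshold=0.1))
```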

Source: arXiv cs.AI

NLP/LLMs • Score 85

Aletheia: Quantifying Cognitive Conviction in Reasoning Models via Regularized Inverse Confusion Matrix

arXiv:2601.01532v1 Announce Type: new Abstract: In the progressive journey toward Artificial General Intelligence (AGI), current evaluation paradigms face an epistemological crisis. Static benchmarks measure knowledge breadth but fail to quantify the depth of belief. While Simhi et al. (2025) defined the CHOKE phenomenon in standard QA, we extend this framework to quantify "Cognitive Conviction" in System 2 reasoning models. We propose Project Aletheia, a cognitive physics framework that employs Tikhonov Regularization to invert the judge's confusion matrix. To validate this methodology without relying on opaque private data, we implement a Synthetic Proxy Protocol. Our preliminary pilot study on 2025 baselines (e.g., DeepSeek-R1, OpenAI o1) suggests that while reasoning models act as a "cognitive buffer," they may exhibit "Defensive OverThinking" under adversarial pressure. Furthermore, we introduce the Aligned Conviction Score (S_aligned) to verify that conviction does not compromise safety. This work serves as a blueprint for measuring AI scientific integrity.
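
The abstract's core computational idea, inverting a judge's confusion matrix with Tikhonov regularization, can be sketched as a small ridge-regularized solve. The confusion matrix, verdict vector, and regularization strength below are invented for illustration; the paper's actual estimator may differ.

```python
# Sketch: de-noising judge verdicts with a Tikhonov-regularized
# inverse of the judge's confusion matrix. All values are illustrative.
import numpy as np

def deconfuse(C: np.ndarray, y: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Solve min_p ||C p - y||^2 + lam ||p||^2, then renormalize.

    C[i, j] = P(judge says i | true label j); y = observed verdict
    frequencies. Plain inversion of C is ill-conditioned, so a ridge
    term stabilizes the estimate of the true label distribution p.
    """
    k = C.shape[1]
    p = np.linalg.solve(C.T @ C + lam * np.eye(k), C.T @ y)
    p = np.clip(p, 0.0, None)
    return p / p.sum()

# A noisy judge that confuses "correct" and "hedged" answers.
C = np.array([[0.80, 0.30, 0.10],
              [0.15, 0.60, 0.20],
              [0.05, 0.10, 0.70]])
y = np.array([0.5, 0.3, 0.2])   # what the judge reported
print(deconfuse(C, y))          # estimated true distribution
```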

Source: arXiv cs.AI

NLP/LLMs • Score 85

AI Agent Systems: Architectures, Applications, and Evaluation

arXiv:2601.01743v1 Announce Type: new Abstract: AI agents -- systems that combine foundation models with reasoning, planning, memory, and tool use -- are rapidly becoming a practical interface between natural-language intent and real-world computation. This survey synthesizes the emerging landscape of AI agent architectures across: (i) deliberation and reasoning (e.g., chain-of-thought-style decomposition, self-reflection and verification, and constraint-aware decision making), (ii) planning and control (from reactive policies to hierarchical and multi-step planners), and (iii) tool calling and environment interaction (retrieval, code execution, APIs, and multimodal perception). We organize prior work into a unified taxonomy spanning agent components (policy/LLM core, memory, world models, planners, tool routers, and critics), orchestration patterns (single-agent vs. multi-agent; centralized vs. decentralized coordination), and deployment settings (offline analysis vs. online interactive assistance; safety-critical vs. open-ended tasks). We discuss key design trade-offs -- latency vs. accuracy, autonomy vs. controllability, and capability vs. reliability -- and highlight how evaluation is complicated by non-determinism, long-horizon credit assignment, tool and environment variability, and hidden costs such as retries and context growth. Finally, we summarize measurement and benchmarking practices (task suites, human preference and utility metrics, success under constraints, robustness and security) and identify open challenges including verification and guardrails for tool actions, scalable memory and context management, interpretability of agent decisions, and reproducible evaluation under realistic workloads.

Source: arXiv cs.AI

NLP/LLMs • Score 90

Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement

arXiv:2601.01562v1 Announce Type: new Abstract: We present Logics-STEM, a state-of-the-art reasoning model fine-tuned on Logics-STEM-SFT-Dataset, a high-quality and diverse dataset at 10M scale that represents one of the largest-scale open-source long chain-of-thought corpora. Logics-STEM targets reasoning tasks in the domains of Science, Technology, Engineering, and Mathematics (STEM), and exhibits exceptional performance on STEM-related benchmarks with an average improvement of 4.68% over the next-best model at 8B scale. We attribute the gains to our data-algorithm co-design engine, where data and algorithm are jointly optimized to fit a gold-standard distribution behind reasoning. Data-wise, the Logics-STEM-SFT-Dataset is constructed with a meticulously designed five-stage data curation engine that ensures quality, diversity, and scalability, comprising annotation, deduplication, decontamination, distillation, and stratified sampling. Algorithm-wise, our failure-driven post-training framework leverages targeted knowledge retrieval and data synthesis around model failure regions in the Supervised Fine-tuning (SFT) stage to effectively guide the second-stage SFT or reinforcement learning (RL) toward a better fit of the target distribution. The superior empirical performance of Logics-STEM reveals the vast potential of combining large-scale open-source data with carefully designed synthetic data, underscoring the critical role of data-algorithm co-design in enhancing reasoning capabilities through post-training. We make both the Logics-STEM models (8B and 32B) and the Logics-STEM-SFT-Dataset (10M and downsampled 2.2M versions) publicly available to support future research in the open-source community.

Source: arXiv cs.AI

NLP/LLMs • Score 85

CaveAgent: Transforming LLMs into Stateful Runtime Operators

arXiv:2601.01569v1 Announce Type: new Abstract: LLM-based agents are increasingly capable of complex task execution, yet current agentic systems remain constrained by text-centric paradigms. Traditional approaches rely on procedural JSON-based function calling, which often struggles with long-horizon tasks due to fragile multi-turn dependencies and context drift. In this paper, we present CaveAgent, a framework that transforms the paradigm from "LLM-as-Text-Generator" to "LLM-as-Runtime-Operator." We introduce a Dual-stream Context Architecture that decouples state management into a lightweight semantic stream for reasoning and a persistent, deterministic Python Runtime stream for execution. In addition to leveraging code generation to efficiently resolve interdependent sub-tasks (e.g., loops, conditionals) in a single step, we introduce Stateful Runtime Management in CaveAgent. Distinct from existing code-based approaches that remain text-bound and lack support for external object injection and retrieval, CaveAgent injects, manipulates, and retrieves complex Python objects (e.g., DataFrames, database connections) that persist across turns. This persistence mechanism acts as a high-fidelity external memory that eliminates context drift and avoids catastrophic forgetting, while ensuring that processed data flows losslessly to downstream applications. Comprehensive evaluations on Tau$^2$-bench, BFCL, and various case studies across representative SOTA LLMs demonstrate CaveAgent's superiority. Specifically, our framework achieves a 10.5% success-rate improvement on retail tasks and reduces total token consumption by 28.4% in multi-turn scenarios. On data-intensive tasks, direct variable storage and retrieval reduces token consumption by 59%, allowing CaveAgent to handle large-scale data that causes context-overflow failures in both JSON-based and code-based agents.
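
A minimal sketch of the stateful-runtime idea follows: a persistent namespace into which external objects are injected once and then reused across turns of generated code. The `PersistentRuntime` class and its methods are hypothetical names for illustration, not CaveAgent's actual API.

```python
# Minimal sketch of a stateful runtime in the spirit of CaveAgent:
# a persistent namespace where complex Python objects survive across
# LLM "turns". The class and method names are hypothetical.
class PersistentRuntime:
    def __init__(self):
        self.ns = {}                 # state that survives across turns

    def inject(self, name, obj):
        """Hand an external object to the runtime by reference."""
        self.ns[name] = obj

    def run(self, code: str):
        """Execute model-generated code against the persistent state."""
        exec(code, self.ns)

    def retrieve(self, name):
        """Pull a result object back out for downstream use."""
        return self.ns[name]

rt = PersistentRuntime()
rt.inject("orders", [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 5}])

# Turn 1: the model filters the data; the result persists in the runtime.
rt.run("big = [o for o in orders if o['qty'] > 3]")
# Turn 2: a later turn reuses 'big' without re-serializing it as text.
rt.run("total = sum(o['qty'] for o in big)")
print(rt.retrieve("total"))  # -> 5
```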

Source: arXiv cs.AI

NLP/LLMs • Score 85

Bayesian Orchestration of Multi-LLM Agents for Cost-Aware Sequential Decision-Making

arXiv:2601.01522v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as autonomous decision agents in settings with asymmetric error costs: hiring (missed talent vs wasted interviews), medical triage (missed emergencies vs unnecessary escalation), and fraud detection (approved fraud vs declined legitimate payments). The dominant design queries a single LLM for a posterior over states, thresholds "confidence," and acts; we prove this is inadequate for sequential decisions with costs. We propose a Bayesian, cost-aware multi-LLM orchestration framework that treats LLMs as approximate likelihood models rather than classifiers. For each candidate state, we elicit likelihoods via contrastive prompting, aggregate across diverse models with robust statistics, and update beliefs with Bayes rule under explicit priors as new evidence arrives. This enables coherent belief updating, expected-cost action selection, principled information gathering via value of information, and fairness gains via ensemble bias mitigation. In resume screening with costs of 40000 USD per missed hire, 2500 USD per interview, and 150 USD per phone screen, experiments on 1000 resumes using five LLMs (GPT-4o, Claude 4.5 Sonnet, Gemini Pro, Grok, DeepSeek) reduce total cost by 294000 USD (34 percent) versus the best single-LLM baseline and improve demographic parity by 45 percent (max group gap 22 to 5 percentage points). Ablations attribute 51 percent of savings to multi-LLM aggregation, 43 percent to sequential updating, and 20 percent to disagreement-triggered information gathering, consistent with the theoretical benefits of correct probabilistic foundations.
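
The decision loop described here, eliciting per-state likelihoods from several models, aggregating them robustly, updating beliefs with Bayes' rule, and acting by minimum expected cost, can be sketched in a few lines. All numbers below (likelihoods, priors, costs) are invented for illustration, not the paper's values.

```python
# Sketch of cost-aware Bayesian aggregation over multiple LLM
# likelihood estimates. Numbers are illustrative only.
import numpy as np

states = ["good_hire", "bad_hire"]
prior = np.array([0.3, 0.7])

# Each model reports P(evidence | state), elicited in the paper's
# setup via contrastive prompting.
model_likelihoods = np.array([
    [0.70, 0.20],   # model 1
    [0.60, 0.30],   # model 2
    [0.80, 0.25],   # model 3
])

# Robust aggregation across models (median), then a Bayes update.
lik = np.median(model_likelihoods, axis=0)
posterior = lik * prior
posterior /= posterior.sum()

# Asymmetric costs: cost[action, state]; actions = interview / reject.
cost = np.array([[2_500, 2_500],    # interviewing costs $2.5k either way
                 [40_000, 0]])      # rejecting a good hire costs $40k
expected = cost @ posterior
action = ["interview", "reject"][int(np.argmin(expected))]
print(posterior, expected, action)
```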

Source: arXiv cs.AI

MLOps/Systems • Score 85

Energy-Aware Routing for Large Reasoning Models

Large reasoning models (LRMs) exhibit heterogeneous inference energy costs, depending on the model used and the amount of reasoning performed. To reduce energy consumption, it is crucial to choose the right LRM and operate it efficiently. The performance of systems that distribute tasks across different individual LRMs depends on the balance between the average energy supply and stochastic fluctuations.

Source: arXiv cs.AI

NLP/LLMs • Score 85

Structured Decomposition for LLM Reasoning: Cross-Domain Validation and Semantic Web Integration

arXiv:2601.01609v1 Announce Type: new Abstract: Rule-based reasoning over natural language input arises in domains where decisions must be auditable and justifiable: clinical protocols specify eligibility criteria in prose, evidence rules define admissibility through textual conditions, and scientific standards dictate methodological requirements. Applying rules to such inputs demands both interpretive flexibility and formal guarantees. Large language models (LLMs) provide flexibility but cannot ensure consistent rule application; symbolic systems provide guarantees but require structured input. This paper presents an integration pattern that combines these strengths: LLMs serve as ontology population engines, translating unstructured text into ABox assertions according to expert-authored TBox specifications, while SWRL-based reasoners apply rules with deterministic guarantees. The framework decomposes reasoning into entity identification, assertion extraction, and symbolic verification, with task definitions grounded in OWL 2 ontologies. Experiments across three domains (legal hearsay determination, scientific method-task application, clinical trial eligibility) and eleven language models validate the approach. Structured decomposition achieves statistically significant improvements over few-shot prompting in aggregate, with gains observed across all three domains. An ablation study confirms that symbolic verification provides substantial benefit beyond structured prompting alone. The populated ABox integrates with standard semantic web tooling for inspection and querying, positioning the framework for richer inference patterns that simpler formalisms cannot express.

Source: arXiv cs.AI

NLP/LLMs • Score 85

MathLedger: A Verifiable Learning Substrate with Ledger-Attested Feedback

Contemporary AI systems achieve extraordinary performance yet remain opaque and unverifiable, creating a trust crisis for safety-critical deployments. We present MathLedger, a substrate for verifiable machine cognition that integrates formal verification, cryptographic attestation, and learning dynamics into a single epistemic loop.

Source: arXiv cs.AI

Multimodal • Score 92

A Unified Multimodal Understanding and Generation Model for Interdisciplinary Scientific Research

Scientific discovery increasingly depends on integrating heterogeneous, high-dimensional data across disciplines. We present FuXi-Uni, a unified native model that supports scientific understanding and high-fidelity multimodal data generation by aligning interdisciplinary scientific tokens with natural-language tokens and employing a scientific decoder. We validate FuXi-Uni in Earth sciences and biomedicine.

Source: arXiv cs.AI

NLP/LLMs • Score 85

Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale

arXiv:2601.01330v1 Announce Type: new Abstract: Large Language Models (LLMs) have rapidly advanced, with Gemini-3-Pro setting a new performance milestone. In this work, we explore collective intelligence as an alternative to monolithic scaling, and demonstrate that collaboration among open-source LLMs can surpass Gemini-3-Pro. We first revisit LLM routing and aggregation at scale and identify three key bottlenecks: (1) current train-free routers are limited by a query-based paradigm focusing solely on textual similarity; (2) recent aggregation methods remain largely static, failing to select appropriate aggregators for different tasks; (3) the complementarity of routing and aggregation remains underutilized. To address these problems, we introduce JiSi, a novel framework designed to release the full potential of LLM collaboration through three innovations: (1) Query-Response Mixed Routing, capturing both semantic information and problem difficulty; (2) Support-Set-based Aggregator Selection, jointly evaluating the aggregation and domain capacity of aggregators; (3) Adaptive Routing-Aggregation Switch, dynamically leveraging the advantages of routing and aggregation. Comprehensive experiments on nine benchmarks demonstrate that JiSi surpasses Gemini-3-Pro at only 47% of the cost by orchestrating ten open-source LLMs, while outperforming mainstream baselines. This suggests that collective intelligence represents a novel path toward Artificial General Intelligence (AGI).

Source: arXiv cs.AI

NLP/LLMs • Score 85

Comment on: Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task

The recently released work entitled Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task, by Kosmyna et al. (2025), has generated intense debate about artificial intelligence (AI) and human performance. We congratulate Kosmyna et al. on this important research and on the collection of a valuable dataset. We offer constructive comments to improve the manuscript's readiness for peer-reviewed publication.

Source: arXiv cs.AI

NLP/LLMs • Score 85

Context Collapse: In-Context Learning and Model Collapse

This thesis investigates two key phenomena in large language models (LLMs): in-context learning (ICL) and model collapse. We study ICL in a tied-weight linear transformer trained on linear regression tasks, showing that minimizing the in-context loss leads to a phase transition in the learned parameters.

Source: arXiv cs.AI

NLP/LLMs • Score 85

Universal Conditional Logic: A Formal Language for Prompt Engineering

arXiv:2601.00880v1 Announce Type: new Abstract: We present Universal Conditional Logic (UCL), a mathematical framework for prompt optimization that transforms prompt engineering from heuristic practice into systematic optimization. Through systematic evaluation (N=305, 11 models, 4 iterations), we demonstrate significant token reduction (29.8%, t(10)=6.36, p < 0.001, Cohen's d = 2.01) with corresponding cost savings. UCL's structural overhead function O_s(A) explains version-specific performance differences through the Over-Specification Paradox: beyond threshold S* = 0.509, additional specification degrades performance quadratically. Core mechanisms -- indicator functions (I_i in {0,1}), structural overhead (O_s = gamma * sum(ln C_k)), early binding -- are validated. Notably, optimal UCL configuration varies by model architecture -- certain models (e.g., Llama 4 Scout) require version-specific adaptations (V4.1). This work establishes UCL as a calibratable framework for efficient LLM interaction, with model-family-specific optimization as a key research direction.
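
The abstract states the structural overhead function explicitly, O_s = gamma * sum(ln C_k), so it can be evaluated directly. The gamma value and clause complexities in the sketch below are invented for illustration.

```python
# Sketch of UCL's structural overhead O_s = gamma * sum(ln C_k), with
# gamma and the clause complexities C_k chosen arbitrarily.
import math

def structural_overhead(clause_complexities, gamma=0.1):
    """O_s(A) from the abstract: gamma times the summed log clause
    complexity. Past the threshold S* (reported as ~0.509), additional
    specification is said to degrade performance quadratically."""
    return gamma * sum(math.log(c) for c in clause_complexities)

for depth in (2, 4, 8, 16):
    clauses = [3.0] * depth          # 'depth' clauses of complexity 3
    print(depth, round(structural_overhead(clauses), 3))
```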

Source: arXiv cs.AI

NLP/LLMs • Score 85

Decomposing LLM Self-Correction: The Accuracy-Correction Paradox and Error Depth Hypothesis

arXiv:2601.00828v1 Announce Type: new Abstract: Large Language Models (LLMs) are widely believed to possess self-correction capabilities, yet recent studies suggest that intrinsic self-correction--where models correct their own outputs without external feedback--remains largely ineffective. In this work, we systematically decompose self-correction into three distinct sub-capabilities: error detection, error localization, and error correction. Through cross-model experiments on GSM8K-Complex (n=500 per model, 346 total errors) with three major LLMs, we uncover a striking Accuracy-Correction Paradox: weaker models (GPT-3.5, 66% accuracy) achieve 1.6x higher intrinsic correction rates than stronger models (DeepSeek, 94% accuracy)--26.8% vs 16.7%. We propose the Error Depth Hypothesis: stronger models make fewer but deeper errors that resist self-correction. Error detection rates vary dramatically across architectures (10% to 82%), yet detection capability does not predict correction success--Claude detects only 10% of errors but corrects 29% intrinsically. Surprisingly, providing error location hints hurts all models. Our findings challenge linear assumptions about model capability and self-improvement, with important implications for the design of self-refinement pipelines.

Source: arXiv cs.AI

NLP/LLMs • Score 85

Theory Trace Card: Theory-Driven Socio-Cognitive Evaluation of LLMs

arXiv:2601.01878v1 Announce Type: new Abstract: Socio-cognitive benchmarks for large language models (LLMs) often fail to predict real-world behavior, even when models achieve high benchmark scores. Prior work has attributed this evaluation-deployment gap to problems of measurement and validity. While these critiques are insightful, we argue that they overlook a more fundamental issue: many socio-cognitive evaluations proceed without an explicit theoretical specification of the target capability, leaving the assumptions linking task performance to competence implicit. Without this theoretical grounding, benchmarks that exercise only narrow subsets of a capability are routinely misinterpreted as evidence of broad competence: a gap that creates a systemic validity illusion by masking the failure to evaluate the capability's other essential dimensions. To address this gap, we make two contributions. First, we diagnose and formalize this theory gap as a foundational failure that undermines measurement and enables systematic overgeneralization of benchmark results. Second, we introduce the Theory Trace Card (TTC), a lightweight documentation artifact designed to accompany socio-cognitive evaluations, which explicitly outlines the theoretical basis of an evaluation, the components of the target capability it exercises, its operationalization, and its limitations. We argue that TTCs enhance the interpretability and reuse of socio-cognitive evaluations by making explicit the full validity chain, which links theory, task operationalization, scoring, and limitations, without modifying benchmarks or requiring agreement on a single theory.

Source: arXiv cs.AI

NLP/LLMs • Score 85

Counterfactual Self-Questioning for Stable Policy Optimization in Language Models

arXiv:2601.00885v1 Announce Type: new Abstract: Recent work on language model self-improvement shows that models can refine their own reasoning through reflection, verification, debate, or self-generated rewards. However, most existing approaches rely on external critics, learned reward models, or ensemble sampling, which increases complexity and training instability. We propose Counterfactual Self-Questioning, a framework in which a single language model generates and evaluates counterfactual critiques of its own reasoning. The method produces an initial reasoning trace, formulates targeted questions that challenge potential failure points, and generates alternative reasoning trajectories that expose incorrect assumptions or invalid steps. These counterfactual trajectories provide structured relative feedback that can be directly used for policy optimization without auxiliary models. Experiments on multiple mathematical reasoning benchmarks show that counterfactual self-questioning improves accuracy and training stability, particularly for smaller models, enabling scalable self-improvement using internally generated supervision alone.

Source: arXiv cs.AI

NLP/LLMs • Score 85

Reading Between the Lines: Deconfounding Causal Estimates using Text Embeddings and Deep Learning

arXiv:2601.01511v1 Announce Type: new Abstract: Estimating causal treatment effects in observational settings is frequently compromised by selection bias arising from unobserved confounders. While traditional econometric methods struggle when these confounders are orthogonal to structured covariates, high-dimensional unstructured text often contains rich proxies for these latent variables. This study proposes a Neural Network-Enhanced Double Machine Learning (DML) framework designed to leverage text embeddings for causal identification. Using a rigorous synthetic benchmark, we demonstrate that unstructured text embeddings capture critical confounding information that is absent from structured tabular data. However, we show that standard tree-based DML estimators retain substantial bias (+24%) due to their inability to model the continuous topology of embedding manifolds. In contrast, our deep learning approach reduces bias to -0.86% with optimized architectures, effectively recovering the ground-truth causal parameter. These findings suggest that deep learning architectures are essential for satisfying the unconfoundedness assumption when conditioning on high-dimensional natural language data.
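
A generic sketch of the partialling-out step on synthetic data follows: two nuisance networks regress the outcome and the treatment on the embedding features, and a residual-on-residual regression recovers the causal parameter. The architecture and data-generating process are illustrative assumptions, not the paper's setup.

```python
# Generic sketch of Double Machine Learning with neural nuisance
# models over "embedding" features. Data and architecture are invented.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n, d = 2000, 16
Z = rng.normal(size=(n, d))                  # text-embedding proxies
conf = Z[:, 0] + 0.5 * Z[:, 1]               # confounder signal in text
T = conf + rng.normal(scale=0.5, size=n)     # treatment depends on it
theta = 1.5                                  # ground-truth effect
Y = theta * T + 2.0 * conf + rng.normal(scale=0.5, size=n)

# Stage 1: partial out Z from both Y and T (single split for brevity;
# proper DML uses cross-fitting).
half = n // 2
m_y = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500,
                   random_state=0).fit(Z[:half], Y[:half])
m_t = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500,
                   random_state=0).fit(Z[:half], T[:half])
res_y = Y[half:] - m_y.predict(Z[half:])
res_t = T[half:] - m_t.predict(Z[half:])

# Stage 2: residual-on-residual regression recovers theta.
theta_hat = (res_t @ res_y) / (res_t @ res_t)
print(theta_hat)   # should land near 1.5 if Z captures the confounder
```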

Source: arXiv cs.AI

NLP/LLMs • Score 85

Enhancing Temporal Awareness in LLMs for Temporal Point Processes

arXiv:2601.00845v1 Announce Type: new Abstract: Temporal point processes (TPPs) are crucial for analyzing events over time and are widely used in fields such as finance, healthcare, and social systems. These processes are particularly valuable for understanding how events unfold over time, accounting for their irregularity and dependencies. Despite the success of large language models (LLMs) in sequence modeling, applying them to temporal point processes remains challenging. A key issue is that current methods struggle to effectively capture the complex interaction between temporal information and semantic context, which is vital for accurate event modeling. In this context, we introduce TPP-TAL (Temporal Point Processes with Enhanced Temporal Awareness in LLMs), a novel plug-and-play framework designed to enhance temporal reasoning within LLMs. Rather than using the conventional method of simply concatenating event time and type embeddings, TPP-TAL explicitly aligns temporal dynamics with contextual semantics before feeding this information into the LLM. This alignment allows the model to better perceive temporal dependencies and long-range interactions between events and their surrounding contexts. Through comprehensive experiments on several benchmark datasets, it is shown that TPP-TAL delivers substantial improvements in temporal likelihood estimation and event prediction accuracy, highlighting the importance of enhancing temporal awareness in LLMs for continuous-time event modeling. The code is made available at https://github.com/chenlilil/TPP-TAL

Source: arXiv cs.AI

NLP/LLMs • Score 85

XAI-MeD: Explainable Knowledge Guided Neuro-Symbolic Framework for Domain Generalization and Rare Class Detection in Medical Imaging

arXiv:2601.02008v1 Announce Type: new Abstract: Explainability, domain generalization, and rare-class reliability are critical challenges in medical AI, where deep models often fail under real-world distribution shifts and exhibit bias against infrequent clinical conditions. This paper introduces XAI-MeD, an explainable medical AI framework that integrates clinically accurate expert knowledge into deep learning through a unified neuro-symbolic architecture. XAI-MeD is designed to improve robustness under distribution shift, enhance rare-class sensitivity, and deliver transparent, clinically aligned interpretations. The framework encodes clinical expertise as logical connectives over atomic medical propositions, transforming them into machine-checkable, class-specific rules. Their diagnostic utility is quantified through weighted feature-satisfaction scores, enabling a symbolic reasoning branch that complements neural predictions. A confidence-weighted fusion integrates symbolic and deep outputs, while a Hunt-inspired adaptive routing mechanism guided by Entropy Imbalance Gain (EIG) and Rare-Class Gini mitigates class imbalance, high intra-class variability, and uncertainty. We evaluate XAI-MeD across diverse modalities on four challenging tasks, including (i) Seizure Onset Zone (SOZ) localization from rs-fMRI and (ii) Diabetic Retinopathy grading. Experiments across six multicenter datasets demonstrate substantial performance improvements, including 6% gains in cross-domain generalization and a 10% improvement in rare-class F1 score, far outperforming state-of-the-art deep learning baselines. Ablation studies confirm that the clinically grounded symbolic components act as effective regularizers, ensuring robustness to distribution shifts. XAI-MeD thus provides a principled, clinically faithful, and interpretable approach to multimodal medical AI.

Source: arXiv cs.AI

RL • Score 85

Accelerating Monte-Carlo Tree Search with Optimized Posterior Policies

We introduce a recursive, AlphaZero-style Monte-Carlo tree search algorithm called 'RMCTS'. RMCTS's advantage over AlphaZero's MCTS-UCB is speed: RMCTS explores the search tree breadth-wise, allowing network inferences to run in large batches and significantly reducing GPU latency cost.
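
The claimed speedup comes from evaluating a whole frontier of nodes in one batched network call rather than one leaf at a time. The sketch below illustrates only that batching pattern with a stand-in network; it is not the RMCTS algorithm itself.

```python
# Sketch of the breadth-wise batching idea: evaluate an entire frontier
# of tree nodes with one network call per level. The network and the
# expansion rule are stand-ins, not RMCTS.
import numpy as np

def dummy_net(states: np.ndarray) -> np.ndarray:
    """Stand-in for a policy/value network; one batched forward pass."""
    return states.sum(axis=1, keepdims=True)  # fake values

def batched_search(root_states, depth=3, branch=4):
    frontier = np.asarray(root_states, dtype=float)
    for _ in range(depth):
        values = dummy_net(frontier)          # ONE inference call per level
        # Expand every node; a real search would select by value / UCB.
        children = [s + np.random.randn(branch, s.size)
                    for s, _ in zip(frontier, values)]
        frontier = np.concatenate(children, axis=0)
    return frontier.shape[0]                  # leaves evaluated

print(batched_search(np.zeros((2, 8))))       # 2 * 4**3 = 128 leaves
```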

Source: arXiv cs.AI

NLP/LLMs • Score 85

Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models

arXiv:2601.01321v1 Announce Type: new Abstract: Digital twins, as precise digital representations of physical systems, have evolved from passive simulation tools into intelligent and autonomous entities through the integration of artificial intelligence technologies. This paper presents a unified four-stage framework that systematically characterizes AI integration across the digital twin lifecycle, spanning modeling, mirroring, intervention, and autonomous management. By synthesizing existing technologies and practices, we distill a unified four-stage framework that systematically characterizes how AI methodologies are embedded across the digital twin lifecycle: (1) modeling the physical twin through physics-based and physics-informed AI approaches, (2) mirroring the physical system into a digital twin with real-time synchronization, (3) intervening in the physical twin through predictive modeling, anomaly detection, and optimization strategies, and (4) achieving autonomous management through large language models, foundation models, and intelligent agents. We analyze the synergy between physics-based modeling and data-driven learning, highlighting the shift from traditional numerical solvers to physics-informed and foundation models for physical systems. Furthermore, we examine how generative AI technologies, including large language models and generative world models, transform digital twins into proactive and self-improving cognitive systems capable of reasoning, communication, and creative scenario generation. Through a cross-domain review spanning eleven application domains, including healthcare, aerospace, smart manufacturing, robotics, and smart cities, we identify common challenges related to scalability, explainability, and trustworthiness, and outline directions for responsible AI-driven digital twin systems.

Source: arXiv cs.AI

Theory/Optimization • Score 85

CNC-TP: Classifier Nominal Concept Based on Top-Pertinent Attributes

arXiv:2601.01976v1 Announce Type: new Abstract: Knowledge Discovery in Databases (KDD) aims to exploit the vast amounts of data generated daily across various domains of computer applications. Its objective is to extract hidden and meaningful knowledge from datasets through a structured process comprising several key steps: data selection, preprocessing, transformation, data mining, and visualization. Among the core data mining techniques are classification and clustering. Classification involves predicting the class of new instances using a classifier trained on labeled data. Several approaches have been proposed in the literature, including Decision Tree Induction, Bayesian classifiers, Nearest Neighbor search, Neural Networks, Support Vector Machines, and Formal Concept Analysis (FCA). The last of these is recognized as an effective approach for interpretable and explainable learning. It is grounded in the mathematical structure of the concept lattice, which enables the generation of formal concepts and the discovery of hidden relationships among them. In this paper, we present a state-of-the-art review of FCA-based classifiers. We explore various methods for computing closure operators from nominal data and introduce a novel approach for constructing a partial concept lattice that focuses on the most relevant concepts. Experimental results are provided to demonstrate the efficiency of the proposed method.
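
The closure operators at the heart of FCA are concrete enough to sketch directly: for a binary context, the closure of an attribute set B is intent(extent(B)). A minimal sketch on an invented toy context:

```python
# Sketch of the FCA derivation operators on a binary context. The toy
# context is invented for illustration.
import numpy as np

context = np.array([          # rows = objects, cols = attributes
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
], dtype=bool)

def extent(attrs):
    """Objects possessing every attribute in `attrs`."""
    if not attrs:
        return list(range(context.shape[0]))
    mask = context[:, sorted(attrs)].all(axis=1)
    return [int(i) for i in np.flatnonzero(mask)]

def intent(objs):
    """Attributes shared by every object in `objs`."""
    if not objs:
        return set(range(context.shape[1]))
    mask = context[sorted(objs), :].all(axis=0)
    return {int(i) for i in np.flatnonzero(mask)}

def closure(attrs):
    """Closure operator: B -> intent(extent(B))."""
    return intent(extent(attrs))

print(closure({1}))   # -> {0, 1}: attribute 1 implies attribute 0
```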

Source: arXiv cs.AI

MLOps/Systems • Score 85

A New Benchmark for the Appropriate Evaluation of RTL Code Optimization

arXiv:2601.01765v1 Announce Type: new Abstract: The rapid progress of artificial intelligence increasingly relies on efficient integrated circuit (IC) design. Recent studies have explored the use of large language models (LLMs) for generating Register Transfer Level (RTL) code, but existing benchmarks mainly evaluate syntactic correctness rather than optimization quality in terms of power, performance, and area (PPA). This work introduces RTL-OPT, a benchmark for assessing the capability of LLMs in RTL optimization. RTL-OPT contains 36 handcrafted digital designs that cover diverse implementation categories including combinational logic, pipelined datapaths, finite state machines, and memory interfaces. Each task provides a pair of RTL codes, a suboptimal version and a human-optimized reference that reflects industry-proven optimization patterns not captured by conventional synthesis tools. Furthermore, RTL-OPT integrates an automated evaluation framework to verify functional correctness and quantify PPA improvements, enabling standardized and meaningful assessment of generative models for hardware design optimization.

Source: arXiv cs.AI

NLP/LLMs • Score 85

Admissibility Alignment

This article introduces Admissibility Alignment: a reformulation of AI alignment as a property of selecting admissible actions and decisions over outcome distributions under uncertainty, evaluated through the behavior of candidate policies.

Source: arXiv cs.AI

NLP/LLMs • Score 85

ChaosBench-Logic: A Benchmark for Logical and Symbolic Reasoning on Chaotic Dynamical Systems

arXiv:2601.01982v1 Announce Type: new Abstract: Large language models (LLMs) excel at natural language tasks but remain brittle in domains requiring precise logical and symbolic reasoning. Chaotic dynamical systems provide an especially demanding test because chaos is deterministic yet often misinterpreted as randomness or complexity. We introduce ChaosBench-Logic, a benchmark that evaluates LLM reasoning across 30 diverse dynamical systems using a unified first-order logic (FOL) ontology. Each system is annotated with truth assignments for 11 semantic predicates, and 621 questions are generated across seven reasoning categories, including multi-hop implications, cross-system analogies, counterfactual reasoning, bias probes, and multi-turn dialogues. We define metrics for logical accuracy, implication consistency, dialogue coherence, and contradiction, and we release an open-source evaluation pipeline. Initial experiments show that frontier LLMs such as GPT-4, Claude 3.5 Sonnet, Gemini 2.5 Flash, and the open-source LLaMA-3 70B achieve 91-94% per-item accuracy, yet still score 0% on compositional items and exhibit fragile global coherence. Dialogue-level accuracy ranges from 53.1% (GPT-4 CoT) to 75.5% (LLaMA-3 zero-shot). ChaosBench-Logic provides a rigorous testbed for diagnosing such failures and a foundation for developing neuro-symbolic approaches that improve scientific reasoning in LLMs.

Source: arXiv cs.AI

Multimodal • Score 85

MMP-A*: Multimodal Perception Enhanced Incremental Heuristic Search on Path Planning

arXiv:2601.01910v1 Announce Type: new Abstract: Autonomous path planning requires a synergy between global reasoning and geometric precision, especially in complex or cluttered environments. While classical A* is valued for its optimality, it incurs prohibitive computational and memory costs in large-scale scenarios. Recent attempts to mitigate these limitations by using Large Language Models for waypoint guidance remain insufficient, as they rely only on text-based reasoning without spatial grounding. As a result, such models often produce incorrect waypoints in topologically complex environments with dead ends, and lack the perceptual capacity to interpret ambiguous physical boundaries. These inconsistencies lead to costly corrective expansions and undermine the intended computational efficiency. We introduce MMP-A*, a multimodal framework that integrates the spatial grounding capabilities of vision-language models with a novel adaptive decay mechanism. By anchoring high-level reasoning in physical geometry, the framework produces coherent waypoint guidance that addresses the limitations of text-only planners. The adaptive decay mechanism dynamically regulates the influence of uncertain waypoints within the heuristic, ensuring geometric validity while substantially reducing memory overhead. To evaluate robustness, we test the framework in challenging environments characterized by severe clutter and topological complexity. Experimental results show that MMP-A* achieves near-optimal trajectories with significantly reduced operational costs, demonstrating its potential as a perception-grounded and computationally efficient paradigm for autonomous navigation.

Source: arXiv cs.AI

RL • Score 85

Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering

arXiv:2601.01195v1 Announce Type: new Abstract: Temporal knowledge graph question answering (TKGQA) involves multi-hop reasoning over temporally constrained entity relationships in the knowledge graph to answer a given question. However, at each hop, large language models (LLMs) retrieve subgraphs with numerous temporally similar and semantically complex relations, increasing the risk of suboptimal decisions and error propagation. To address these challenges, we propose the multi-hop reasoning enhanced (MRE) framework, which enhances both forward and backward reasoning to improve the identification of globally optimal reasoning trajectories. Specifically, MRE begins with prompt engineering to guide the LLM in generating diverse reasoning trajectories for a given question. Valid reasoning trajectories are then selected for supervised fine-tuning, serving as a cold-start strategy. Finally, we introduce Tree-Group Relative Policy Optimization (T-GRPO), a recursive, tree-structured learning-by-exploration approach. At each hop, exploration establishes strong causal dependencies on the previous hop, while evaluation is informed by multi-path exploration feedback from subsequent hops. Experimental results on two TKGQA benchmarks indicate that the proposed MRE-based model consistently surpasses state-of-the-art (SOTA) approaches in handling complex multi-hop queries. Further analysis highlights improved interpretability and robustness to noisy temporal annotations.

Source: arXiv cs.AI

NLP/LLMs • Score 85

Simulated Reasoning is Reasoning

arXiv:2601.02043v1 Announce Type: new Abstract: Reasoning has long been understood as a pathway between stages of understanding. Proper reasoning leads to understanding of a given subject. This reasoning was conceptualized as a process of understanding in a particular way, i.e., "symbolic reasoning". Foundational Models (FM) demonstrate that this is not a necessary condition for many reasoning tasks: they can "reason" by way of imitating the process of "thinking out loud", testing the produced pathways, and iterating on these pathways on their own. This leads to some form of reasoning that can solve problems on its own or with few-shot learning, but appears fundamentally different from human reasoning due to its lack of grounding and common sense, leading to brittleness of the reasoning process. These insights promise to substantially alter our assessment of reasoning and its necessary conditions, but also inform the approaches to safety and robust defences against this brittleness of FMs. This paper offers and discusses several philosophical interpretations of this phenomenon, argues that the previously apt metaphor of the "stochastic parrot" has lost its relevance and thus should be abandoned, and reflects on different normative elements in the safety- and appropriateness-considerations emerging from these reasoning models and their growing capacity.

Source: arXiv cs.AI

NLP/LLMs • Score 95

Language as Mathematical Structure: Examining Semantic Field Theory Against Language Games

arXiv:2601.00448v1 Announce Type: new Abstract: Large language models (LLMs) offer a new empirical setting in which long-standing theories of linguistic meaning can be examined. This paper contrasts two broad approaches: social constructivist accounts associated with language games, and a mathematically oriented framework we call Semantic Field Theory. Building on earlier work by the author, we formalize the notions of lexical fields (Lexfelder) and linguistic fields (Lingofelder) as interacting structures in a continuous semantic space. We then analyze how core properties of transformer architectures (such as distributed representations, attention mechanisms, and geometric regularities in embedding spaces) relate to these concepts. We argue that the success of LLMs in capturing semantic regularities supports the view that language exhibits an underlying mathematical structure, while their persistent limitations in pragmatic reasoning and context sensitivity are consistent with the importance of social grounding emphasized in philosophical accounts of language use. On this basis, we suggest that mathematical structure and language games can be understood as complementary rather than competing perspectives. The resulting framework clarifies the scope and limits of purely statistical models of language and motivates new directions for theoretically informed AI architectures.

Source: arXiv cs.CL

NLP/LLMs • Score 96

Online Fine-Tuning of Decision Transformers with Pure RL Gradients

Decision Transformers (DTs) have emerged as a powerful framework for sequential decision-making, framing offline reinforcement learning (RL) as a sequence-modeling problem. However, extending DTs to online settings with pure RL gradients remains largely unexplored. We identify hindsight return relabeling as a critical obstacle to RL-based fine-tuning.

Source: arXiv cs.AI

NLP/LLMs • Score 96

Building a Neuro-Symbolic Mathematician from First Principles

Large Language Models (LLMs) exhibit persistent logical failures in complex reasoning because they lack an internal axiomatic framework. We propose Mathesis, a neuro-symbolic architecture that encodes mathematical states as higher-order hypergraphs and uses a Symbolic Reasoning Kernel (SRK), a differentiable logic engine that maps constraints onto a continuous energy landscape.

Source: arXiv cs.AI

RL • Score 95

Mitigating Optimistic Bias in Entropic Risk Estimation and Optimization

The entropic risk measure is widely used for high-stakes decisions in economics, management science, finance, and safety-critical control systems, as it captures the tail risks associated with uncertain losses. This work presents a parametric bootstrap procedure that corrects the bias of the empirical entropic risk estimator, improving decision-making accuracy.
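
A minimal sketch of a parametric-bootstrap bias correction under a Gaussian working model: fit the sample, resimulate to measure the plug-in estimator's bias, and subtract it. The Gaussian assumption and all constants below are illustrative, not the paper's procedure in detail.

```python
# Sketch of bias-correcting the empirical entropic risk
# rho(X) = (1/theta) * log E[exp(theta * X)] via parametric bootstrap.
import numpy as np

def entropic_risk(x, theta):
    """Stabilized log-mean-exp estimator of the entropic risk."""
    m = theta * x
    return (np.max(m) + np.log(np.mean(np.exp(m - np.max(m))))) / theta

rng = np.random.default_rng(0)
theta, n = 2.0, 200
losses = rng.normal(1.0, 1.0, size=n)

naive = entropic_risk(losses, theta)

# Parametric bootstrap: refit, resimulate, measure estimator bias.
mu, sd = losses.mean(), losses.std(ddof=1)
model_risk = mu + theta * sd**2 / 2          # closed form for a Gaussian
reps = [entropic_risk(rng.normal(mu, sd, size=n), theta)
        for _ in range(500)]
bias = np.mean(reps) - model_risk
print(naive, naive - bias)                   # plug-in vs. corrected
```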

Source: arXiv stat.ML

NLP/LLMs • Score 95

PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective

The ever-growing scale of deep learning models and training data underscores the critical importance of efficient optimization methods. In this paper, we introduce a unifying framework for analyzing 'matrix-aware' preconditioning methods, leading to a new class of optimization methods that exhibit faster convergence.

Source: arXiv stat.ML

RL • Score 95

Designing an Optimal Sensor Network by Minimizing Information Loss

Optimal experimental design is a classical topic in statistics, with many well-studied problems and solutions. This work investigates sensor placement for monitoring spatio-temporal processes, accounting for the temporal dimension in our modeling and optimization. We present a new model-based sensor-placement criterion together with a highly efficient optimization algorithm.

Source: arXiv stat.ML

Vision • Score 93

Custom Spiking Neural Networks with Ferroelectric Synapses for EEG Signal Processing

Electroencephalography (EEG)-based brain-computer interfaces (BCIs) are strongly affected by non-stationary neural signals, which limits the generalization of subject-independent models. This work demonstrates that spiking neural networks (SNNs) can be implemented on ferroelectric memristive synaptic devices for adaptive decoding of EEG-based motor imagery, even under device constraints.

Source: arXiv cs.AI

NLP/LLMs • Score 95

Infinite-Width Limit of a Single Attention Layer: Analysis via Tensor Programs

In this paper, we rigorously identify the infinite-width limit distribution of variables within a single attention layer using the Tensor Programs framework. We derive the exact form of this limiting law, showing that it deviates fundamentally from Gaussianity. Our numerical experiments validate the theoretical predictions, confirming the theory's effectiveness at finite width and its precise description of finite-head attention.

Source: arXiv stat.ML

Vision • Score 95

Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery

Scientific modeling faces a trade-off between the interpretability of mechanistic theory and the predictive power of machine learning. We present Simulation-Grounded Neural Networks (SGNNs), a framework that embeds domain knowledge into the training data, allowing the model to learn broad patterns of physical possibility and become more robust to model misspecification.

Source: arXiv stat.ML

Vision • Score 96

Multi-Agent Reinforcement Learning for Liquidity Games

This work explores the use of swarm methods for modeling liquidity in financial markets, uniting Liquidity Games and Rational Swarms. The research proposes a theoretical model in which independent agents maximize market liquidity without the need for coordination, contributing to market efficiency and individual profitability.

Source: arXiv cs.AI

Vision • Score 96

Adaptive Causal Coordination Detection for Social Media: A Memory-Guided Framework with Semi-Supervised Learning

Detecting coordinated inauthentic behavior on social media is a critical challenge. We propose the Adaptive Causal Coordination Detection (ACCD) framework, which uses a progressive three-stage architecture to learn and retain optimized detection configurations. ACCD improves the identification of causal relationships and reduces the need for manual labeling, achieving an F1-score of 87.3% in detecting coordinated attacks.

Source: arXiv cs.AI

NLP/LLMs • Score 95

Learning Speech Representations with Variational Predictive Coding

arXiv:2601.00100v1 Announce Type: cross Abstract: Despite being the best known objective for learning speech representations, the HuBERT objective has not been further developed and improved. We argue that it is the lack of an underlying principle that stalls the development, and, in this paper, we show that predictive coding under a variational view is the principle behind the HuBERT objective. Due to its generality, our formulation provides opportunities to improve parameterization and optimization, and we show two simple modifications that bring immediate improvements to the HuBERT objective. In addition, the predictive coding formulation has tight connections to various other objectives, such as APC, CPC, wav2vec, and BEST-RQ. Empirically, the improvement in pre-training brings significant improvements to four downstream tasks: phone classification, f0 tracking, speaker recognition, and automatic speech recognition, highlighting the importance of the predictive coding interpretation.

Source: arXiv cs.CL

NLP/LLMs • Score 95

Comparative Efficiency Analysis of Lightweight Transformer Models: A Multi-Domain Empirical Benchmark for Enterprise NLP Deployment

arXiv:2601.00444v1 Announce Type: new Abstract: In the rapidly evolving landscape of enterprise natural language processing (NLP), the demand for efficient, lightweight models capable of handling multi-domain text automation tasks has intensified. This study conducts a comparative analysis of three prominent lightweight Transformer models - DistilBERT, MiniLM, and ALBERT - across three distinct domains: customer sentiment classification, news topic classification, and toxicity and hate speech detection. Utilizing datasets from IMDB, AG News, and the Measuring Hate Speech corpus, we evaluated performance using accuracy-based metrics including accuracy, precision, recall, and F1-score, as well as efficiency metrics such as model size, inference time, throughput, and memory usage. Key findings reveal that no single model dominates all performance dimensions. ALBERT achieves the highest task-specific accuracy in multiple domains, MiniLM excels in inference speed and throughput, and DistilBERT demonstrates the most consistent accuracy across tasks while maintaining competitive efficiency. All results reflect controlled fine-tuning under fixed enterprise-oriented constraints rather than exhaustive hyperparameter optimization. These results highlight trade-offs between accuracy and efficiency, recommending MiniLM for latency-sensitive enterprise applications, DistilBERT for balanced performance, and ALBERT for resource-constrained environments.

Source: arXiv cs.CL

NLP/LLMs • Score 95

Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach

arXiv:2601.00388v1 Announce Type: new Abstract: Recent advances in vision-language models have opened up new possibilities for reasoning-driven image geolocalization. However, existing approaches often rely on synthetic reasoning annotations or external image retrieval, which can limit interpretability and generalizability. In this paper, we present Geo-R, a retrieval-free framework that uncovers structured reasoning paths from existing ground-truth coordinates and optimizes geolocation accuracy via reinforcement learning. We propose the Chain of Region, a rule-based hierarchical reasoning paradigm that generates precise, interpretable supervision by mapping GPS coordinates to geographic entities (e.g., country, province, city) without relying on model-generated or synthetic labels. Building on this, we introduce a lightweight reinforcement learning strategy with coordinate-aligned rewards based on Haversine distance, enabling the model to refine predictions through spatially meaningful feedback. Our approach bridges structured geographic reasoning with direct spatial supervision, yielding improved localization accuracy, stronger generalization, and more transparent inference. Experimental results across multiple benchmarks confirm the effectiveness of Geo-R, establishing a new retrieval-free paradigm for scalable and interpretable image geolocalization. To facilitate further research and ensure reproducibility, both the model and code will be made publicly available.
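
The Haversine distance underlying the coordinate-aligned reward is standard; the exponential reward shaping and scale constant in the sketch below are assumptions, since the abstract only says rewards are based on Haversine distance.

```python
# Sketch of a Haversine-based geolocation reward. The exponential
# shaping and scale constant are illustrative assumptions.
import math

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + \
        math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def reward(pred, truth, scale_km=750.0):
    """Smooth reward in (0, 1] that decays with distance."""
    return math.exp(-haversine_km(*pred, *truth) / scale_km)

print(reward((48.86, 2.35), (51.51, -0.13)))  # Paris vs. London
```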

Source: arXiv cs.CL

NLP/LLMs • Score 95

Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak

arXiv:2601.00213v1 Announce Type: cross Abstract: The widespread deployment of large language models (LLMs) has raised growing concerns about their misuse risks and associated safety issues. While prior studies have examined the safety of LLMs in general usage, code generation, and agent-based applications, their vulnerabilities in automated algorithm design remain underexplored. To fill this gap, this study investigates this overlooked safety vulnerability, with a particular focus on intelligent optimization algorithm design, given its prevalent use in complex decision-making scenarios. We introduce MalOptBench, a benchmark consisting of 60 malicious optimization algorithm requests, and propose MOBjailbreak, a jailbreak method tailored for this scenario. Through extensive evaluation of 13 mainstream LLMs including the latest GPT-5 and DeepSeek-V3.1, we reveal that most models remain highly susceptible to such attacks, with an average attack success rate of 83.59% and an average harmfulness score of 4.28 out of 5 on original harmful prompts, and near-complete failure under MOBjailbreak. Furthermore, we assess state-of-the-art plug-and-play defenses that can be applied to closed-source models, and find that they are only marginally effective against MOBjailbreak and prone to exaggerated safety behaviors. These findings highlight the urgent need for stronger alignment techniques to safeguard LLMs against misuse in algorithm design.

Source: arXiv cs.CL

NLP/LLMs • Score 96

The Trojan Horse in the Vocabulary: Subtle Sabotage of LLM Composition

The open-weights LLM ecosystem is increasingly defined by model-composition techniques that remix capabilities from diverse sources. A critical prerequisite for applying these methods is tokenizer transplantation, which aligns incompatible vocabularies into a shared embedding space. We demonstrate that this interoperability step introduces a supply-chain vulnerability.

Source: arXiv cs.LG

Theory/Optimization • Score 92

Nonparametric Instrumental Variable Inference with Many Weak Instruments

We study inference on linear functionals in the nonparametric instrumental variable (NPIV) problem with a discretely valued instrument under a many-weak-instruments asymptotic regime, where the number of instrument values grows with the sample size. A key motivating example is the estimation of long-term causal effects in a new experiment with only short-term outcomes.

Source: arXiv stat.ML

RL • Score 96

ClinicalReTrial: A Self-Evolving AI Agent for Clinical Trial Protocol Optimization

Clinical trial failure remains a central bottleneck in drug development, where small flaws in protocol design can irreversibly compromise outcomes. This paper proposes ClinicalReTrial, a self-evolving AI agent framework that addresses this gap by treating clinical trial reasoning as an iterative protocol-redesign problem.

Source: arXiv cs.AI

NLP/LLMs • Score 95

From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning

arXiv:2601.00215v1 Announce Type: cross Abstract: Reinforcement learning (RL) has emerged as a promising approach for eliciting reasoning chains before generating final answers. However, multimodal large language models (MLLMs) generate reasoning that lacks integration of visual information. This limits their ability to solve problems that demand accurate visual perception, such as visual puzzles. We show that visual perception is the key bottleneck in such tasks: converting images into textual descriptions significantly improves performance, yielding gains of 26.7% for Claude 3.5 and 23.6% for Claude 3.7. To address this, we investigate reward-driven RL as a mechanism to unlock long visual reasoning in open-source MLLMs without requiring costly supervision. We design and evaluate six reward functions targeting different reasoning aspects, including image understanding, thinking steps, and answer accuracy. Using group relative policy optimization (GRPO), our approach explicitly incentivizes longer, structured reasoning and mitigates bypassing of visual information. Experiments on Qwen-2.5-VL-7B achieve 5.56% improvements over the base model, with consistent gains across both in-domain and out-of-domain settings.

Source: arXiv cs.CL

NLP/LLMs • Score 95

Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning

arXiv:2601.00791v1 Announce Type: cross Abstract: We present a training-free method for detecting valid mathematical reasoning in large language models through spectral analysis of attention patterns. By treating attention matrices as adjacency matrices of dynamic graphs over tokens, we extract four interpretable spectral diagnostics, the Fiedler value (algebraic connectivity), high-frequency energy ratio (HFER), graph signal smoothness, and spectral entropy, that exhibit statistically significant differences between valid and invalid mathematical proofs. Experiments across seven transformer models from four independent architectural families (Meta Llama, Alibaba Qwen, Microsoft Phi, and Mistral AI) demonstrate that this spectral signature produces effect sizes up to Cohen's $d = 3.30$ ($p < 10^{-116}$), enabling 85.0--95.6% classification accuracy under rigorous evaluation, with calibrated thresholds reaching 93--95% on the full dataset. The method requires no training data, fine-tuning, or learned classifiers: a single threshold on a spectral metric suffices for high accuracy. Through systematic label correction, we discover that the spectral method detects logical coherence rather than compiler acceptance, identifying mathematically valid proofs that formal verifiers reject due to technical failures. We further identify an architectural dependency: Mistral-7B's Sliding Window Attention shifts the discriminative signal from HFER to late-layer Smoothness ($d = 2.09$, $p_{\text{MW}} = 1.16 \times 10^{-48}$), revealing that attention mechanism design affects which spectral features capture reasoning validity. These findings establish spectral graph analysis as a principled framework for reasoning verification with immediate applications to hallucination detection and AI safety monitoring.
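
Two of the four diagnostics are straightforward to compute from a single attention matrix; a sketch follows. Symmetrizing the attention matrix into an undirected adjacency and using the combinatorial Laplacian are assumptions about the exact construction, not details confirmed by the abstract.

```python
# Sketch of two of the paper's diagnostics on one attention matrix:
# the Fiedler value and the spectral entropy of the graph Laplacian.
import numpy as np

def spectral_diagnostics(attn: np.ndarray):
    a = (attn + attn.T) / 2                 # undirected adjacency (assumed)
    np.fill_diagonal(a, 0.0)
    lap = np.diag(a.sum(axis=1)) - a        # combinatorial Laplacian
    evals = np.linalg.eigvalsh(lap)         # ascending eigenvalues
    fiedler = evals[1]                      # algebraic connectivity
    p = evals / evals.sum()
    p = p[p > 0]
    entropy = float(-(p * np.log(p)).sum())
    return fiedler, entropy

# Toy attention matrix: softmax over random logits.
rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 16))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(spectral_diagnostics(attn))
```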

Source: arXiv cs.CL

NLP/LLMs • Score 96

DA-DPO: Difficulty-Aware, Cost-Efficient Preference Optimization for Reducing Hallucinations in MLLMs

Direct Preference Optimization (DPO) has shown great potential for mitigating hallucinations in Multimodal Large Language Models (MLLMs). However, existing approaches often suffer from overfitting due to difficulty imbalance in the preference data. We propose Difficulty-Aware Direct Preference Optimization (DA-DPO), a cost-effective framework that balances the learning process.

Fonte: arXiv cs.AI

Vision • Score 95

Context-Aware Pesticide Recommendation via Few-Shot Pest Recognition for Precision Agriculture

arXiv:2601.00243v1 Announce Type: new Abstract: Effective pest management is crucial for enhancing agricultural productivity, especially for crops such as sugarcane and wheat that are highly vulnerable to pest infestations. Traditional pest management methods depend heavily on manual field inspections and the use of chemical pesticides. These approaches are often costly, time-consuming, labor-intensive, and can have a negative impact on the environment. To overcome these challenges, this study presents a lightweight framework for pest detection and pesticide recommendation, designed for low-resource devices such as smartphones and drones, making it suitable for use by small and marginal farmers. The proposed framework includes two main components. The first is a Pest Detection Module that uses a compact, lightweight convolutional neural network (CNN) combined with prototypical meta-learning to accurately identify pests even when only a few training samples are available. The second is a Pesticide Recommendation Module that incorporates environmental factors like crop type and growth stage to suggest safe and eco-friendly pesticide recommendations. To train and evaluate our framework, a comprehensive pest image dataset was developed by combining multiple publicly available datasets. The final dataset contains samples with different viewing angles, pest sizes, and background conditions to ensure strong generalization. Experimental results show that the proposed lightweight CNN achieves high accuracy, comparable to state-of-the-art models, while significantly reducing computational complexity. The Decision Support System additionally improves pest management by reducing dependence on traditional chemical pesticides and encouraging sustainable practices, demonstrating its potential for real-time applications in precision agriculture.

Fonte: arXiv cs.CV

RL • Score 96

Quantitative Rule-Based Strategy Modeling in Classic Indian Rummy: A Metric-Optimization Approach

The 13-card variant of Classic Indian Rummy is a sequential game of incomplete information that demands probabilistic reasoning and combinatorial decision-making. This paper proposes a rule-based framework for strategic play, driven by a novel hand-evaluation metric called MinDist.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

MethConvTransformer: A Deep Learning Framework for Multi-Tissue Alzheimer's Disease Detection

Alzheimer's disease (AD) is a multifactorial neurodegenerative disorder characterized by progressive cognitive decline. MethConvTransformer is a transformer-based deep learning framework that integrates DNA methylation profiles from brain and peripheral tissues, enabling biomarker discovery. The model outperforms conventional machine learning approaches, offering robust epigenetic biomarkers and multi-resolution interpretability.

Fonte: arXiv cs.AI

Theory/Optimization • Score 90

Stronger Approximation Guarantees for Non-Monotone Maximization of $\eta$-Weakly DR-Submodular Functions

Maximizing submodular objectives under constraints is a fundamental problem in machine learning and optimization. We study the maximization of a non-negative, non-monotone $\eta$-weakly DR-submodular function over a down-closed convex body. Our main result is an approximation algorithm whose guarantee depends smoothly on $\eta$.

Fonte: arXiv cs.LG

Vision • Score 96

Evaluating Anomaly Detectors for Simulated, Highly Imbalanced Industrial Classification Problems

Machine learning offers potential solutions to current problems in industrial systems, such as quality control and predictive maintenance, but faces unique barriers in industrial applications. This paper presents a comprehensive evaluation of anomaly detection algorithms on a simulated dataset that reflects real-world engineering constraints.

Fonte: arXiv cs.AI

NLP/LLMs • Score 93

Neural Minimum-Weight Perfect Matching for Quantum Error-Correcting Codes

Realizing the full potential of quantum computing requires Quantum Error Correction (QEC). QEC reduces error rates by encoding logical information into redundant physical qubits. In this work, we propose a data-driven decoder called Neural Minimum-Weight Perfect Matching (NMWPM), which uses a hybrid architecture to predict dynamic edge weights, demonstrating a significant reduction in Logical Error Rate (LER) compared with standard baselines.
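
The decoding step this describes can be sketched with a generic matching routine: keep MWPM's pairing machinery but replace fixed geometric edge weights with a learned predictor. The `predicted_weight` function below is a placeholder for the neural component.

```python
import itertools
import networkx as nx

def decode_with_predicted_weights(defects, weight_fn):
    """MWPM decoding sketch: pair up syndrome defects using a learned
    edge-weight predictor instead of fixed geometric distances."""
    G = nx.Graph()
    for u, v in itertools.combinations(defects, 2):
        # Negate so max_weight_matching returns the minimum-weight pairing.
        G.add_edge(u, v, weight=-weight_fn(u, v))
    return nx.max_weight_matching(G, maxcardinality=True)

# Stand-in for a neural predictor: Manhattan distance on the syndrome grid.
predicted_weight = lambda u, v: abs(u[0] - v[0]) + abs(u[1] - v[1])
defects = [(0, 0), (0, 3), (2, 1), (3, 3)]  # toy syndrome locations
print(decode_with_predicted_weights(defects, predicted_weight))
```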

Fonte: arXiv cs.AI

Theory/Optimization • Score 92

SlingBAG Pro: Accelerating point cloud-based iterative reconstruction for 3D photoacoustic imaging under arbitrary array

arXiv:2601.00551v1 Announce Type: new Abstract: High-quality three-dimensional (3D) photoacoustic imaging (PAI) is gaining increasing attention in clinical applications. To address the challenges of limited space and high costs, irregular geometric transducer arrays that conform to specific imaging regions are promising for achieving high-quality 3D PAI with fewer transducers. However, traditional iterative reconstruction algorithms struggle with irregular array configurations, suffering from high computational complexity, substantial memory requirements, and lengthy reconstruction times. In this work, we introduce SlingBAG Pro, an advanced reconstruction algorithm based on the point cloud iteration concept of the Sliding ball adaptive growth (SlingBAG) method, while extending its compatibility to arbitrary array geometries. SlingBAG Pro maintains high reconstruction quality, reduces the number of required transducers, and employs a hierarchical optimization strategy that combines zero-gradient filtering with progressively increased temporal sampling rates during iteration. This strategy rapidly removes redundant spatial point clouds, accelerates convergence, and significantly shortens overall reconstruction time. Compared to the original SlingBAG algorithm, SlingBAG Pro achieves up to a 2.2-fold speed improvement in point cloud-based 3D PA reconstruction under irregular array geometries. The proposed method is validated through both simulation and in vivo mouse experiments, and the source code is publicly available at https://github.com/JaegerCQ/SlingBAG_Pro.

Fonte: arXiv cs.CV

RL • Score 95

Disentangling Hardness from Noise: An Uncertainty-Driven Model-Agnostic Framework for Long-Tailed Remote Sensing Classification

arXiv:2601.00278v1 Announce Type: new Abstract: Long-Tailed distributions are pervasive in remote sensing due to the inherently imbalanced occurrence of grounded objects. However, a critical challenge remains largely overlooked, i.e., disentangling hard tail data samples from noisy ambiguous ones. Conventional methods often indiscriminately emphasize all low-confidence samples, leading to overfitting on noisy data. To bridge this gap, building upon Evidential Deep Learning, we propose a model-agnostic uncertainty-aware framework termed DUAL, which dynamically disentangles prediction uncertainty into Epistemic Uncertainty (EU) and Aleatoric Uncertainty (AU). Specifically, we introduce EU as an indicator of sample scarcity to guide a reweighting strategy for hard-to-learn tail samples, while leveraging AU to quantify data ambiguity, employing an adaptive label smoothing mechanism to suppress the impact of noise. Extensive experiments on multiple datasets across various backbones demonstrate the effectiveness and generalization of our framework, surpassing strong baselines such as TGN and SADE. Ablation studies provide further insights into the crucial choices of our design.

Fonte: arXiv cs.CV

Vision • Score 95

OmniVaT: Single Domain Generalization for Multimodal Visual-Tactile Learning

arXiv:2601.00352v1 Announce Type: new Abstract: Visual-tactile learning (VTL) enables embodied agents to perceive the physical world by integrating visual (VIS) and tactile (TAC) sensors. However, VTL still suffers from modality discrepancies between VIS and TAC images, as well as domain gaps caused by non-standardized tactile sensors and inconsistent data collection procedures. We formulate these challenges as a new task, termed single domain generalization for multimodal VTL (SDG-VTL). In this paper, we propose an OmniVaT framework that, for the first time, successfully addresses this task. On the one hand, OmniVaT integrates a multimodal fractional Fourier adapter (MFFA) to map VIS and TAC embeddings into a unified embedding-frequency space, thereby effectively mitigating the modality gap without multi-domain training data or careful cross-modal fusion strategies. On the other hand, it also incorporates a discrete tree generation (DTG) module that obtains diverse and reliable multimodal fractional representations through a hierarchical tree structure, thereby enhancing its adaptivity to fluctuating domain shifts in unseen domains. Extensive experiments demonstrate the superior cross-domain generalization performance of OmniVaT on the SDG-VTL task.

Fonte: arXiv cs.CV

NLP/LLMs • Score 96

Ask, Clarify, Optimize: Human-LLM Agent Collaboration for Smarter Inventory Control

Inventory management remains a challenge for many small and medium-sized businesses that lack the expertise to implement advanced optimization methods. This paper investigates whether Large Language Models (LLMs) can help close this gap, proposing a hybrid framework that strictly separates semantic reasoning from mathematical computation.

Fonte: arXiv cs.AI

Evaluation/Benchmarks • Score 95

Categorical Reparametrization with Denoising Diffusion Models

Gradient-based optimization with categorical variables typically relies on score-function estimators, which are unbiased but noisy, or on continuous relaxations that replace the discrete distribution with a smooth surrogate. In this paper, we extend this family of relaxations by introducing a diffusion-based smooth reparametrization of categorical distributions, enabling a training-free diffusion sampler.
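
For contrast, the standard continuous relaxation the abstract mentions is the Gumbel-softmax surrogate, sketched below; the paper's diffusion-based reparametrization replaces this surrogate, so this snippet is the classical baseline, not the proposed method.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5):
    """The standard smooth surrogate for a categorical sample: perturb
    logits with Gumbel noise, then take a temperature-controlled softmax."""
    g = rng.gumbel(size=logits.shape)   # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y -= y.max()                        # numerical stability
    e = np.exp(y)
    return e / e.sum()                  # relaxed one-hot; tau -> 0 recovers argmax

logits = np.log(np.array([0.2, 0.5, 0.3]))
print(gumbel_softmax(logits, tau=0.1))   # near one-hot
print(gumbel_softmax(logits, tau=5.0))   # near uniform
```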

Fonte: arXiv stat.ML

RL • Score 96

Cloud-Native Generative AI for Automated Planogram Synthesis: A Diffusion-Model Approach to Multi-Store Retail Optimization

Planogram creation is a significant challenge in retail, requiring an average of 30 hours per complex layout. This paper presents a cloud-native architecture that uses diffusion models to automatically generate store-specific planograms by learning from successful shelf arrangements across multiple retail locations.

Fonte: arXiv cs.LG

RL • Score 93

GRL-SNAM: Geometric Reinforcement Learning with Path-Differential Hamiltonians for Simultaneous Navigation and Mapping in Unknown Environments

We present GRL-SNAM, a geometric reinforcement learning framework for Simultaneous Navigation and Mapping (SNAM) in unknown environments. The SNAM problem is challenging because it requires designing hierarchical or joint multi-agent policies that steer a robot through a map-free environment while acquiring information through sensors.

Fonte: arXiv cs.LG

NLP/LLMs • Score 96

A Dynamic Bayesian Optimization Framework for Instruction Tuning in Partial Differential Equation Discovery

Large Language Models (LLMs) show promise for equation discovery, but their outputs are highly sensitive to prompt phrasing, a phenomenon we call instruction brittleness. To address this, we propose NeuroSymBO, which recasts prompt engineering as a sequential decision problem.

Fonte: arXiv cs.LG

RL • Score 96

Yahtzee: Reinforcement Learning Techniques for Stochastic Combinatorial Games

Yahtzee is a classic dice game with a stochastic, combinatorial structure and delayed rewards, making it an interesting mid-scale RL benchmark. This work formulates Yahtzee as a Markov Decision Process (MDP) and trains self-play agents with several policy-gradient methods.
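
A minimal policy-gradient loop on a stripped-down reroll decision conveys the flavor of such agents; the single-reroll game, the per-face policy, and the fixed baseline are simplifying assumptions, far smaller than the full Yahtzee MDP.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(6)  # per-face logit: probability of rerolling a die showing that face

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def episode(theta, lr=0.1):
    """One REINFORCE step on a toy Yahtzee sub-problem: after an initial
    roll of 5 dice, decide per die whether to reroll; reward = final sum."""
    dice = rng.integers(1, 7, size=5)
    p = sigmoid(theta[dice - 1])          # reroll probability given face value
    reroll = rng.uniform(size=5) < p
    final = np.where(reroll, rng.integers(1, 7, size=5), dice)
    reward = final.sum()
    baseline = 17.5                       # E[sum of 5 dice], for variance reduction
    grad = np.zeros_like(theta)
    for d, a, pi in zip(dice, reroll, p):
        grad[d - 1] += (a - pi)           # d/dtheta log pi(a) for a Bernoulli policy
    return theta + lr * (reward - baseline) * grad

for _ in range(2000):
    theta = episode(theta)
print(np.round(sigmoid(theta), 2))  # should learn: reroll low faces, keep high ones
```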

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

Reinforcement Learning with Function Approximation for Non-Markovian Processes

We study reinforcement learning methods with linear function approximation under non-Markovian state and cost processes. We first consider the policy evaluation method and show that the algorithm converges under suitable ergodicity conditions. Moreover, we show that the limit corresponds to the fixed point of a joint operator composed of an orthogonal projection and the Bellman operator of an auxiliary Markov decision process.
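
The policy-evaluation iterate under study is, in its standard Markovian form, linear TD(0); a minimal sketch follows. The abstract's contribution concerns when iterates of this style still converge, and to what fixed point, when the data-generating process is not Markovian.

```python
import numpy as np

def td0_linear(features, costs, next_features, gamma=0.9, alpha=0.01, sweeps=50):
    """Linear TD(0) policy evaluation: v(s) ~ phi(s) @ w, updated toward
    the one-step Bellman target along an observed trajectory."""
    w = np.zeros(features.shape[1])
    for _ in range(sweeps):
        for phi, c, phi_next in zip(features, costs, next_features):
            target = c + gamma * phi_next @ w
            w += alpha * (target - phi @ w) * phi   # semi-gradient update
    return w

# Toy trajectory: 3-dim features and per-step costs observed along the chain.
rng = np.random.default_rng(0)
phis = rng.normal(size=(200, 3))
print(td0_linear(phis[:-1], rng.uniform(size=199), phis[1:]))
```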

Fonte: arXiv cs.LG

Evaluation/Benchmarks • Score 96

Robust Graph Fine-Tuning with Adversarial Graph Prompting

Parameter-Efficient Fine-Tuning (PEFT) has emerged as a dominant paradigm for adapting pre-trained GNN models to specific tasks. However, existing PEFT methods typically exhibit significant vulnerability to noise and attacks on graph topology and node attributes. We propose integrating adversarial learning into graph prompting, developing a novel Adversarial Graph Prompting (AGP) framework to achieve robust fine-tuning.

Fonte: arXiv cs.LG

RL • Score 93

Exploration at the Limits

In the fixed-confidence best-arm identification (BAI) problem, the goal is to identify the optimal option quickly while keeping the error probability below a desired threshold. We introduce a relaxed formulation that requires error control to hold only asymptotically beyond a minimum sample size, allowing a better fit to real-world scenarios.

Fonte: arXiv cs.LG

Vision • Score 96

Optimized Hybrid Feature Engineering for Resource-Efficient Arrhythmia Detection in ECG Signals: An Optimization Framework

Cardiovascular diseases, especially arrhythmias, remain a leading cause of global mortality, demanding continuous monitoring via the Internet of Medical Things (IoMT). This study proposes a data-centric, resource-efficient framework that prioritizes feature engineering over model complexity, achieving high diagnostic accuracy with a lightweight model.

Fonte: arXiv cs.LG

RL • Score 96

Sequential Reservoir Computing for Efficient High-Dimensional Spatiotemporal Forecasting

Forecasting high-dimensional spatiotemporal systems remains a computational challenge for recurrent neural networks (RNNs) and long short-term memory (LSTM) models. We introduce a Sequential Reservoir Computing (Sequential RC) architecture that decomposes a large reservoir into a series of smaller, interconnected reservoirs, improving efficiency and reducing computational cost.
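
One plausible reading of the architecture is a chain of small echo-state reservoirs, each driven by the previous one's state, with a readout over all stage states; the sketch below follows that reading, and the actual interconnection pattern in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

class Reservoir:
    """A single echo-state reservoir: fixed random recurrent dynamics."""
    def __init__(self, n_in, n_res, rho=0.9):
        self.Win = rng.normal(scale=0.5, size=(n_res, n_in))
        W = rng.normal(size=(n_res, n_res))
        self.W = rho * W / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
        self.h = np.zeros(n_res)

    def step(self, u):
        self.h = np.tanh(self.W @ self.h + self.Win @ u)
        return self.h

# Chain several small reservoirs instead of one monolithic reservoir of
# size n_res * n_stages; each stage consumes the previous stage's state.
stages = [Reservoir(4, 50)] + [Reservoir(50, 50) for _ in range(2)]

def chained_state(u):
    x = u
    for r in stages:
        x = r.step(x)
    return np.concatenate([r.h for r in stages])  # readout sees all stages

print(chained_state(rng.normal(size=4)).shape)    # (150,)
```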

Fonte: arXiv cs.LG

Vision • Score 96

Unknown-Aware AI-Generated Content Attribution

The rapid advance of photorealistic generative models has made it crucial to attribute the origin of synthetic content, moving from binary real-vs-fake detection to identifying the specific model that produced an image. This study investigates distinguishing the outputs of a target generator model (e.g., OpenAI Dalle 3) from other sources.

Fonte: arXiv cs.LG

RL • Score 95

Sparse Tucker Decomposition with Graph Regularization for High-Dimensional Time-Series Forecasting

Existing vector autoregressive methods for multivariate time-series analysis use low-rank matrix approximation or Tucker decomposition to reduce the dimension of the overparametrization problem. This paper proposes a graph-regularized sparse Tucker decomposition method for high-dimensional vector autoregressive time series.

Fonte: arXiv stat.ML

RL • Score 96

Quantum King-Ring Domination in Chess: A QAOA Approach

The Quantum Approximate Optimization Algorithm (QAOA) is widely tested on synthetic random instances, which lack semantic structure and human interpretability. We present Quantum King-Ring Domination (QKRD), a NISQ-scale benchmark derived from tactical chess positions, offering 5,000 structured instances. Using QKRD, we evaluate QAOA design choices and show that problem-informed techniques reveal advantages hidden on random instances.

Fonte: arXiv cs.LG

RL • Score 93

E-GRPO: High-Entropy Steps Drive Effective Reinforcement Learning for Flow Models

Recent reinforcement learning has improved flow-matching models for human preference alignment. We observe that high-entropy steps enable more efficient exploration, while low-entropy steps yield indistinguishable roll-outs. We propose E-GRPO, an entropy-aware group relative policy optimization that increases the entropy of SDE sampling steps.

Fonte: arXiv cs.LG

RL • Score 96

Can Semantic Methods Enhance Tactics in Team Sports? A Methodology for Football with Broader Applications

This paper explores how semantic-space reasoning, traditionally used in computational linguistics, can be extended to tactical decision-making in team sports. The proposed methodology models tactical configurations as compositional semantic structures, representing each player as a multidimensional vector that integrates technical, physical, and psychological attributes.

Fonte: arXiv cs.AI

NLP/LLMs • Score 95

CPPO: Contrastive Perception for Vision Language Policy Optimization

arXiv:2601.00501v1 Announce Type: new Abstract: We introduce CPPO, a Contrastive Perception Policy Optimization method for finetuning vision-language models (VLMs). While reinforcement learning (RL) has advanced reasoning in language models, extending it to multimodal reasoning requires improving both the perception and reasoning aspects. Prior works tackle this challenge mainly with explicit perception rewards, but disentangling perception tokens from reasoning tokens is difficult, requiring extra LLMs, ground-truth data, forced separation of perception from reasoning by policy model, or applying rewards indiscriminately to all output tokens. CPPO addresses this problem by detecting perception tokens via entropy shifts in the model outputs under perturbed input images. CPPO then extends the RL objective function with a Contrastive Perception Loss (CPL) that enforces consistency under information-preserving perturbations and sensitivity under information-removing ones. Experiments show that CPPO surpasses previous perception-rewarding methods, while avoiding extra models, making training more efficient and scalable.

Fonte: arXiv cs.CV

Vision • Score 94

Task-Oriented Kernel Flows: Label-Rank Compression and Laplacian Spectral Filtering

We present a theory of feature learning in wide, L2-regularized networks, showing that supervised learning is inherently compressive. We derive a kernel ODE that predicts a "water-filling" spectral evolution and prove that, at any stable stationary state, the rank of the kernel is bounded by the number of classes ($C$).

Fonte: arXiv cs.LG

Vision • Score 95

It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models

arXiv:2601.00090v1 Announce Type: new Abstract: Contemporary text-to-image models exhibit a surprising degree of mode collapse, as can be seen when sampling several images given the same text prompt. While previous work has attempted to address this issue by steering the model using guidance mechanisms, or by generating a large pool of candidates and refining them, in this work we take a different direction and aim for diversity in generations via noise optimization. Specifically, we show that a simple noise optimization objective can mitigate mode collapse while preserving the fidelity of the base model. We also analyze the frequency characteristics of the noise and show that alternative noise initializations with different frequency profiles can improve both optimization and search. Our experiments demonstrate that noise optimization yields superior results in terms of generation quality and variety.

Fonte: arXiv cs.CV

NLP/LLMs • Score 95

Toward Better Temporal Structures for Geopolitical Events Forecasting

arXiv:2601.00430v1 Announce Type: new Abstract: Forecasting on geopolitical temporal knowledge graphs (TKGs) through the lens of large language models (LLMs) has recently gained traction. While TKGs and their generalization, hyper-relational temporal knowledge graphs (HTKGs), offer a straightforward structure to represent simple temporal relationships, they lack the expressive power to convey complex facts efficiently. One of the critical limitations of HTKGs is a lack of support for more than two primary entities in temporal facts, which commonly occur in real-world events. To address this limitation, in this work, we study a generalization of HTKGs, Hyper-Relational Temporal Knowledge Generalized Hypergraphs (HTKGHs). We first derive a formalization for HTKGHs, demonstrating their backward compatibility while supporting two complex types of facts commonly found in geopolitical incidents. Then, utilizing this formalization, we introduce the htkgh-polecat dataset, built upon the global event database POLECAT. Finally, we benchmark and analyze popular LLMs on the relation prediction task, providing insights into their adaptability and capabilities in complex forecasting scenarios.

Fonte: arXiv cs.CL

NLP/LLMs • Score 95

Memory Bank Compression for Continual Adaptation of Large Language Models

arXiv:2601.00756v1 Announce Type: cross Abstract: Large Language Models (LLMs) have become a mainstay for many everyday applications. However, as data evolve their knowledge quickly becomes outdated. Continual learning aims to update LLMs with new information without erasing previously acquired knowledge. Although methods such as full fine-tuning can incorporate new data, they are computationally expensive and prone to catastrophic forgetting, where prior knowledge is overwritten. Memory-augmented approaches address this by equipping LLMs with a memory bank, that is an external memory module which stores information for future use. However, these methods face a critical limitation, in particular, the memory bank constantly grows in the real-world scenario when large-scale data streams arrive. In this paper, we propose MBC, a model that compresses the memory bank through a codebook optimization strategy during online adaptation learning. To ensure stable learning, we also introduce an online resetting mechanism that prevents codebook collapse. In addition, we employ Key-Value Low-Rank Adaptation in the attention layers of the LLM, enabling efficient utilization of the compressed memory representations. Experiments with benchmark question-answering datasets demonstrate that MBC reduces the memory bank size to 0.3% when compared against the most competitive baseline, while maintaining high retention accuracy during online adaptation learning. Our code is publicly available at https://github.com/Thomkat/MBC.

Fonte: arXiv cs.CL

Vision • Score 95

ReMA: A Training-Free Plug-and-Play Mixing Augmentation for Video Behavior Recognition

arXiv:2601.00311v1 Announce Type: new Abstract: Video behavior recognition demands stable and discriminative representations under complex spatiotemporal variations. However, prevailing data augmentation strategies for videos remain largely perturbation-driven, often introducing uncontrolled variations that amplify non-discriminative factors, which finally weaken intra-class distributional structure and representation drift with inconsistent gains across temporal scales. To address these problems, we propose Representation-aware Mixing Augmentation (ReMA), a plug-and-play augmentation strategy that formulates mixing as a controlled replacement process to expand representations while preserving class-conditional stability. ReMA integrates two complementary mechanisms. Firstly, the Representation Alignment Mechanism (RAM) performs structured intra-class mixing under distributional alignment constraints, suppressing irrelevant intra-class drift while enhancing statistical reliability. Then, the Dynamic Selection Mechanism (DSM) generates motion-aware spatiotemporal masks to localize perturbations, guiding them away from discrimination-sensitive regions and promoting temporal coherence. By jointly controlling how and where mixing is applied, ReMA improves representation robustness without additional supervision or trainable parameters. Extensive experiments on diverse video behavior benchmarks demonstrate that ReMA consistently enhances generalization and robustness across different spatiotemporal granularities.

Fonte: arXiv cs.CV

Theory/Optimization • Score 89

From Continual Learning to SGD and Back: Better Rates for Continual Linear Models

In this study, we analyze the common continual learning setting in which an overparametrized model is fitted sequentially to a set of jointly realizable tasks. We prove that fitting one task is equivalent to a single step of stochastic gradient descent (SGD) on a modified objective, establishing new universal forgetting rates.

Fonte: arXiv stat.ML

NLP/LLMs • Score 92

Physio-DPO: Aligning Large Language Models with the Protein Energy Landscape to Eliminate Structural Hallucinations

arXiv:2601.00647v1 Announce Type: new Abstract: Large Protein Language Models have shown strong potential for generative protein design, yet they frequently produce structural hallucinations, generating sequences with high linguistic likelihood that fold into thermodynamically unstable conformations. Existing alignment approaches such as Direct Preference Optimization are limited in this setting, as they model preferences as binary labels and ignore the continuous structure of the physical energy landscape. We propose Physio-DPO, a physics informed alignment framework that grounds protein language models in thermodynamic stability. Physio-DPO introduces a magnitude aware objective that scales optimization updates according to the energy gap between native structures and physics perturbed hard negatives. Experiments show that Physio-DPO consistently outperforms strong baselines including SFT, PPO, and standard DPO, reducing self consistency RMSD to 1.28 \AA\ and increasing foldability to 92.8%. Qualitative analysis further demonstrates that Physio-DPO effectively mitigates structural hallucinations by recovering biophysical interactions such as hydrophobic core packing and hydrogen bond networks.

Fonte: arXiv cs.CL

RL • Score 96

Traffic-Aware Optimal Taxi Placement Using Graph Neural Network-Based Reinforcement Learning

In smart-city transportation, efficiently matching taxi supply with passenger demand requires real-time integration of urban traffic-network data and mobility patterns. This paper presents a graph-based reinforcement learning (RL) framework for optimal taxi placement in metropolitan environments.

Fonte: arXiv cs.LG

RL • Score 96

A Multi-Algorithm Approach to Balancing the Operational Workload of Human Resources in an Urban Last-Mile Delivery System

Efficiently assigning workload to the workforce is crucial in last-mile parcel delivery systems. This paper addresses the operational workload-balancing problem in urban delivery systems, proposing a multi-algorithm approach that optimizes delivery time and ensures a balanced distribution of workload across workers.

Fonte: arXiv cs.AI

RL • Score 96

Adversarial Samples Are Not Created Equal

Over the last decade, several theories have been proposed to explain the widespread vulnerability of deep neural networks to adversarial evasion attacks. This work argues that samples exploiting brittle-but-predictive features and those that do not represent two distinct types of adversarial weakness, and should be differentiated when assessing adversarial robustness.

Fonte: arXiv cs.LG

NLP/LLMs • Score 96

HFedMoE: Resource-Aware Heterogeneous Federated Learning with Mixture-of-Experts

While federated learning (FL) enables fine-tuning of large language models (LLMs) without compromising data privacy, the sheer size of an LLM makes on-device training impractical for resource-constrained clients such as mobile devices. Mixture-of-Experts (MoE) models have emerged as a computation-efficient solution by activating only a sparse subset of experts during model training.

Fonte: arXiv cs.LG

Theory/Optimization • Score 93

Interpretability-Guided Bi-Objective Optimization: Aligning Accuracy and Explainability

This paper presents Interpretability-Guided Bi-objective Optimization (IGBO), a framework that trains interpretable models by incorporating structured domain knowledge through a bi-objective formulation. IGBO encodes feature-importance hierarchies as a Directed Acyclic Graph (DAG) and uses Temporal Integrated Gradients (TIG) to measure feature importance.

Fonte: arXiv cs.LG

MLOps/Systems • Score 96

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Talking-head generation creates realistic avatars from static portraits for virtual communication and content creation. However, current models fail to convey truly interactive communication, producing one-sided responses that lack emotional engagement. We propose Avatar Forcing, a novel avatar-generation framework that models real-time interactions between users and avatars through diffusion forcing.

Fonte: arXiv cs.LG

Theory/Optimization • Score 92

Sparse-Input Neural Networks Using Group Concave Regularization

In this paper, we investigate the feature selection problem in neural networks, proposing a sparse-input neural network framework with group concave regularization. The method aims to screen out irrelevant variables by applying a suitable concave penalty to the $l_2$ norm of the input weights, yielding a model that uses only a small subset of the original variables.
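
To make "group concave penalty on the $l_2$ norm of input weights" concrete, here is a sketch using the minimax concave penalty (MCP) as the concave function, with each input feature's outgoing first-layer weights as one group; MCP is one standard concave penalty, not necessarily the paper's exact choice.

```python
import numpy as np

def mcp(t, lam=0.5, gamma=3.0):
    """Minimax concave penalty applied to a nonnegative scalar t,
    here the l2 norm of an input-weight group."""
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    0.5 * gamma * lam**2)

def group_concave_penalty(W_in):
    """Sum of concave penalties over per-input groups: column j of the
    first-layer weight matrix W_in is the group for input feature j."""
    group_norms = np.linalg.norm(W_in, axis=0)
    return mcp(group_norms).sum()

# Columns scaled toward zero (irrelevant inputs) contribute ~0 penalty.
W = np.random.default_rng(0).normal(size=(16, 8)) * [1, 1, 0.01, 0, 1, 0, 0, 1]
print(group_concave_penalty(W))
```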

Fonte: arXiv stat.ML

Vision • Score 95

Spectral Density Estimation of Functional Time Series on Large Domains Using Deep Learning

We derive an estimator of the spectral density of a functional time series that is the output of a multilayer perceptron neural network. The estimator is motivated by the difficulty of computing existing spectral density estimators for time series of functions defined on very large grids, such as in climate models and medical scans.

Fonte: arXiv stat.ML

NLP/LLMs • Score 95

TotalFM: An Organ-Separated Framework for 3D-CT Vision Foundation Models

arXiv:2601.00260v1 Announce Type: new Abstract: While foundation models in radiology are expected to be applied to various clinical tasks, computational cost constraints remain a major challenge when training on 3D-CT volumetric data. In this study, we propose TotalFM, a radiological foundation model that efficiently learns the correspondence between 3D-CT images and linguistic expressions based on the concept of organ separation, utilizing a large-scale dataset of 140,000 series. By automating the creation of organ volume and finding-sentence pairs through segmentation techniques and Large Language Model (LLM)-based radiology report processing, and by combining self-supervised pre-training via VideoMAE with contrastive learning using volume-text pairs, we aimed to balance computational efficiency and representation capability. In zero-shot organ-wise lesion classification tasks, the proposed model achieved higher F1 scores in 83% (5/6) of organs compared to CT-CLIP and 64% (9/14) of organs compared to Merlin. These results suggest that the proposed model exhibits high generalization performance in a clinical evaluation setting using actual radiology report sentences. Furthermore, in zero-shot finding-wise lesion classification tasks, our model achieved a higher AUROC in 83% (25/30) of finding categories compared to Merlin. We also confirmed performance comparable to existing Vision-Language Models (VLMs) in radiology report generation tasks. Our results demonstrate that the organ-separated learning framework can serve as a realistic and effective design guideline for the practical implementation of 3D-CT foundation models.

Fonte: arXiv cs.CV

RL • Score 95

Integrating Multi-Armed Bandits, Active Learning, and Distributed Computing for Scalable Optimization

Modern optimization problems in scientific and engineering domains often rely on expensive black-box evaluations. We propose ALMAB-DC, a modular, unified framework for scalable black-box optimization that integrates active learning, multi-armed bandits, and distributed computing, with optional GPU acceleration. Empirical results show that ALMAB-DC consistently outperforms state-of-the-art black-box optimizers.

Fonte: arXiv stat.ML

Vision • Score 95

Towards Syn-to-Real IQA: A Novel Perspective on Reshaping Synthetic Data Distributions

arXiv:2601.00225v1 Announce Type: new Abstract: Blind Image Quality Assessment (BIQA) has advanced significantly through deep learning, but the scarcity of large-scale labeled datasets remains a challenge. While synthetic data offers a promising solution, models trained on existing synthetic datasets often show limited generalization ability. In this work, we make a key observation that representations learned from synthetic datasets often exhibit a discrete and clustered pattern that hinders regression performance: features of high-quality images cluster around reference images, while those of low-quality images cluster based on distortion types. Our analysis reveals that this issue stems from the distribution of synthetic data rather than model architecture. Consequently, we introduce a novel framework SynDR-IQA, which reshapes synthetic data distribution to enhance BIQA generalization. Based on theoretical derivations of sample diversity and redundancy's impact on generalization error, SynDR-IQA employs two strategies: distribution-aware diverse content upsampling, which enhances visual diversity while preserving content distribution, and density-aware redundant cluster downsampling, which balances samples by reducing the density of densely clustered areas. Extensive experiments across three cross-dataset settings (synthetic-to-authentic, synthetic-to-algorithmic, and synthetic-to-synthetic) demonstrate the effectiveness of our method. The code is available at https://github.com/Li-aobo/SynDR-IQA.

Fonte: arXiv cs.CV

Vision • Score 95

NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos

arXiv:2601.00393v1 Announce Type: new Abstract: In this paper, we propose NeoVerse, a versatile 4D world model that is capable of 4D reconstruction, novel-trajectory video generation, and rich downstream applications. We first identify a common limitation of scalability in current 4D world modeling methods, caused either by expensive and specialized multi-view 4D data or by cumbersome training pre-processing. In contrast, our NeoVerse is built upon a core philosophy that makes the full pipeline scalable to diverse in-the-wild monocular videos. Specifically, NeoVerse features pose-free feed-forward 4D reconstruction, online monocular degradation pattern simulation, and other well-aligned techniques. These designs empower NeoVerse with versatility and generalization to various domains. Meanwhile, NeoVerse achieves state-of-the-art performance in standard reconstruction and generation benchmarks. Our project page is available at https://neoverse-4d.github.io

Fonte: arXiv cs.CV

RL • Score 96

Toward a Physical Theory of Intelligence

We present a physical theory of intelligence grounded in irreversible information processing in systems subject to conservation laws. An intelligent system is modeled as a coupled agent-environment process whose evolution transforms information into goal-directed work. We introduce the Conservation-Congruent Encoding (CCE) framework to connect information to physical state.

Fonte: arXiv cs.AI

Theory/Optimization • Score 89

A Survey of the Data-Driven Newsvendor: Unified Analysis and the Spectrum of Achievable Regrets

In the Newsvendor problem, the goal is to guess the number that will be drawn from a distribution, with asymmetric consequences for guessing too high or too low. This survey analyzes data-driven Newsvendor variants, filling gaps in the literature and simplifying proofs, and shows that the full spectrum of regrets between $1/\sqrt{n}$ and $1/n$ can be achieved.
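
The best-known data-driven variant, sample average approximation, is essentially a one-liner: order at the empirical demand quantile given by the classical critical ratio. A sketch with hypothetical cost values:

```python
import numpy as np

def saa_newsvendor(demand_samples, underage_cost, overage_cost):
    """Sample-average-approximation order quantity: the empirical quantile
    at the critical ratio c_u / (c_u + c_o), the classical optimal fractile."""
    q = underage_cost / (underage_cost + overage_cost)
    return np.quantile(demand_samples, q)

rng = np.random.default_rng(0)
demand = rng.lognormal(mean=3.0, sigma=0.5, size=200)   # n = 200 observations
print(saa_newsvendor(demand, underage_cost=4.0, overage_cost=1.0))  # 0.8-quantile
```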

Fonte: arXiv stat.ML

Vision • Score 96

EIA-SEC: An Improved Actor-Critic Framework for Collaborative Multi-UAV Control in Smart Agriculture

The widespread adoption of wireless communication technology has driven the development of smart agriculture, where unmanned aerial vehicles (UAVs) play a multifunctional role. In this work, we model a Markov decision process to solve the multi-UAV trajectory-planning problem and propose the novel Elite Imitation Actor-Shared Ensemble Critic (EIA-SEC) framework. Experimental results show that EIA-SEC outperforms state-of-the-art baselines in reward performance, training stability, and convergence speed.

Fonte: arXiv cs.LG

Vision • Score 96

Conflict-Driven Clause Learning with VSIDS Heuristics for Discrete Facility Layout

This paper studies the use of Conflict-Driven Clause Learning (CDCL) with VSIDS heuristics as a computational engine for discrete facility layout problems. The layout problem is modeled as a combinatorial assignment problem with a dense logical structure arising from adjacency, separation, and slot-availability constraints.

Fonte: arXiv cs.AI

RL • Score 96

Faithful and Stable Neuron Explanations for Reliable Mechanistic Interpretability

Neuron identification is a popular tool in mechanistic interpretability, aiming to uncover the human-interpretable concepts represented by individual neurons in deep networks. Although algorithms such as Network Dissection and CLIP-Dissect have achieved great empirical success, a rigorous theoretical foundation is still missing, which is crucial for enabling reliable explanations. This work presents the first theoretical analysis of fundamental challenges concerning the faithfulness and stability of neuron explanations.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

MSC-180: A Benchmark for Automated Formal Theorem Proving Drawn from the Mathematics Subject Classification

Automated Theorem Proving (ATP) is a central research direction in artificial intelligence for achieving formal reasoning and verification. We propose MSC-180, a benchmark based on the MSC2020 mathematics subject classification comprising 180 formal verification problems spanning undergraduate and graduate levels, to evaluate and drive the development of AI systems with genuine mathematical reasoning ability.

Fonte: arXiv cs.AI

Evaluation/Benchmarks • Score 93

Assignment-Routing Optimization: Solvers for Constrained Problems

We study the Joint Routing-Assignment (JRA) problem, in which items must be assigned one-to-one to placeholders while simultaneously determining a Hamiltonian cycle that visits all nodes exactly once. We develop a solver tailored to practical packing-planning scenarios with richer constraints.

Fonte: arXiv cs.AI

Vision • Score 93

Few-Shot Learning of a Graph-Based Neural Network Model Without Backpropagation

Proposing a structural-graph approach to classifying contour images in a few-shot regime without backpropagation, this work presents a model in which structure is the carrier of explanations. An image is encoded as an attributed graph, and generalization is achieved through the formation of concept attractors.

Fonte: arXiv cs.AI

Evaluation/Benchmarks • Score 96

Clustering-Based Transfer Learning for a Dynamic Multimodal Multi-Objective Evolutionary Algorithm

Dynamic multimodal multi-objective optimization faces the challenge of simultaneously tracking multiple equivalent Pareto-optimal sets while maintaining population diversity in changing environments. This paper presents a new suite of test functions and a novel algorithm based on a clustering-based autoencoder dynamic response mechanism, aiming to improve diversity and convergence in evolutionary algorithms.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

Population-Evolve: A Parallel, Evolutionary Sampling Method for Mathematical Reasoning in LLMs

Test-time scaling has emerged in recent years as a promising direction for enhancing the reasoning capabilities of Large Language Models. In this work, we propose Population-Evolve, a training-free method inspired by Genetic Algorithms that optimizes LLM reasoning by maintaining a dynamic population of candidate solutions.
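
The loop structure such a method implies can be sketched generically: selection over a scored population plus recombination of survivors. Here `generate`, `score`, and `crossover` are hypothetical LLM-backed callables; the toy numeric stand-ins below only make the loop runnable, and the paper's operators are surely richer.

```python
import random

def population_evolve(problem, generate, score, crossover, pop_size=8, rounds=4):
    """Training-free GA-style test-time search: keep a population of
    candidate solutions, select high scorers, recombine into new ones."""
    population = [generate(problem) for _ in range(pop_size)]
    for _ in range(rounds):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]                     # selection
        children = [crossover(problem, random.sample(parents, 2))
                    for _ in range(pop_size - len(parents))]  # recombination
        population = parents + children
    return max(population, key=score)

# Toy stand-ins so the loop runs without an LLM: "solutions" are numbers.
prob = 42
gen = lambda p: random.uniform(0, 100)
sc = lambda x: -abs(x - prob)
cx = lambda p, pair: sum(pair) / 2 + random.gauss(0, 1)   # average + mutation
print(round(population_evolve(prob, gen, sc, cx), 2))     # approaches 42
```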

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models

We present Gabliteration, a novel neural weight-modification technique that advances beyond traditional abliteration methods by implementing adaptive multi-directional projections with regularized layer selection. Our approach overcomes fundamental limitations of existing methods, preserving model quality while modifying specific behavioral patterns.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

Understanding Chain-of-Thought in Large Language Models via Topological Data Analysis

With the development of large language models (LLMs) and the introduction of long chain-of-thought techniques, the reasoning ability of LLMs on complex problem solving has been significantly improved. This work analyzes the quality of reasoning chains from a structural perspective, using persistent homology from Topological Data Analysis (TDA) to map reasoning steps and extract topological features.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

Vibe Reasoning: Extracting Mathematical Capabilities from Frontier AI -- A Case Study on IMO 2025 Problem 6

We present Vibe Reasoning, a human-AI collaborative paradigm for solving complex mathematical problems. Our key insight is that frontier AI models already possess the knowledge needed to solve challenging problems but do not know how, what, or when to apply it. This work demonstrates the approach on Problem 6 of IMO 2025.

Fonte: arXiv cs.AI

RL • Score 96

SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models

Adversarial Imitation Learning (AIL) is a dominant framework that infers rewards from expert demonstrations to guide policy optimization. Inspired by the success of diffusion models, we propose SD2AIL, which uses synthetic demonstrations to augment expert demonstrations, and introduce a prioritized replay strategy to maximize the effectiveness of the demonstrations.

Fonte: arXiv cs.LG

RL • Score 96

AL-GNN: Privacy-Preserving, Replay-Free Continual Graph Learning via Analytic Learning

Continual graph learning (CGL) enables graph neural networks to learn incrementally from graph-structured data without forgetting previously acquired knowledge. AL-GNN is a novel framework that eliminates the need for backpropagation and replay buffers, drawing on principles from analytic learning theory to optimize learning.

Fonte: arXiv cs.LG

RL • Score 95

Stochastic Optimization with Optimal Importance Sampling

arXiv:2504.03560v2 Announce Type: replace-cross Abstract: Importance Sampling (IS) is a widely used variance reduction technique for enhancing the efficiency of Monte Carlo methods, particularly in rare-event simulation and related applications. Despite its effectiveness, the performance of IS is highly sensitive to the choice of the proposal distribution and often requires stochastic calibration. While the design and analysis of IS have been extensively studied in estimation settings, applying IS within stochastic optimization introduces a fundamental challenge: the decision variable and the importance sampling distribution are mutually dependent, creating a circular optimization structure. This interdependence complicates both convergence analysis and variance control. We consider convex stochastic optimization problems with linear constraints and propose a single-loop stochastic approximation algorithm, based on a joint variant of Nesterov's dual averaging, that jointly updates the decision variable and the importance sampling distribution, without time-scale separation or nested optimization. The method is globally convergent and achieves minimal asymptotic variance among stochastic gradient schemes, matching the performance of an oracle sampler adapted to the optimal solution.

Fonte: arXiv stat.ML

MLOps/Systems • Score 96

Modality-Dependent Memory Mechanisms in Cross-Modal Neuromorphic Computing

Spiking neural networks (SNNs) with memory promise energy-efficient neuromorphic computing, but their generalization across sensory modalities remains unexplored. We present the first comprehensive cross-modal ablation study of memory mechanisms in SNNs, evaluating Hopfield networks, Hierarchical Gated Recurrent Networks (HGRNs), and supervised contrastive learning (SCL) on visual (N-MNIST) and auditory (SHD) neuromorphic datasets.

Fonte: arXiv cs.LG

Theory/Optimization • Score 93

The Interaction Bottleneck of Deep Neural Networks: Discovery, Proof, and Modulation

Understanding what kinds of cooperative structures deep neural networks (DNNs) can represent remains a fundamental but insufficiently understood problem. This work investigates how DNNs encode interactions under different levels of contextual complexity and how these microscopic interaction patterns shape macroscopic representational capacity.

Fonte: arXiv cs.LG

NLP/LLMs • Score 95

Research on a hybrid LSTM-CNN-Attention model for text-based web content classification

arXiv:2512.18475v1 Announce Type: new Abstract: This study presents a hybrid deep learning architecture that integrates LSTM, CNN, and an Attention mechanism to enhance the classification of web content based on text. Pretrained GloVe embeddings are used to represent words as dense vectors that preserve semantic similarity. The CNN layer extracts local n-gram patterns and lexical features, while the LSTM layer models long-range dependencies and sequential structure. The integrated Attention mechanism enables the model to focus selectively on the most informative parts of the input sequence. A 5-fold cross-validation setup was used to assess the robustness and generalizability of the proposed solution. Experimental results show that the hybrid LSTM-CNN-Attention model achieved outstanding performance, with an accuracy of 0.98, precision of 0.94, recall of 0.92, and F1-score of 0.93. These results surpass the performance of baseline models based solely on CNNs, LSTMs, or transformer-based classifiers such as BERT. The combination of neural network components enabled the model to effectively capture both fine-grained text structures and broader semantic context. Furthermore, the use of GloVe embeddings provided an efficient and effective representation of textual data, making the model suitable for integration into systems with real-time or near-real-time requirements. The proposed hybrid architecture demonstrates high effectiveness in text-based web content classification, particularly in tasks requiring both syntactic feature extraction and semantic interpretation. By combining presented mechanisms, the model addresses the limitations of individual architectures and achieves improved generalization. These findings support the broader use of hybrid deep learning approaches in NLP applications, especially where complex, unstructured textual data must be processed and classified with high reliability.
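
For orientation, a minimal Keras realization of the described CNN + LSTM + attention stack might look as follows; layer widths and the pooling head are illustrative choices, and the Embedding layer would be initialized from pretrained GloVe vectors in the full setup.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_hybrid(vocab_size=20000, seq_len=200, emb_dim=100, n_classes=5):
    """A minimal LSTM-CNN-Attention text classifier in the spirit of the
    abstract; sizes are illustrative, not the paper's configuration."""
    inp = layers.Input(shape=(seq_len,))
    x = layers.Embedding(vocab_size, emb_dim)(inp)      # GloVe-style embeddings
    x = layers.Conv1D(128, 5, activation="relu", padding="same")(x)  # n-gram features
    x = layers.LSTM(64, return_sequences=True)(x)       # sequential dependencies
    x = layers.Attention()([x, x])                      # attend over time steps
    x = layers.GlobalAveragePooling1D()(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

build_hybrid().summary()
```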

Fonte: arXiv cs.CL

Theory/Optimization • Score 89

Optimal Source Training Is Suboptimal for Transfer

We prove that training a source model optimally for its own task is generically suboptimal when the objective is transfer to a downstream task. We study the source-side optimization problem in L2-SP ridge regression and show a fundamental mismatch between the regularization that is optimal for the source and the regularization that is optimal for transfer.

Fonte: arXiv stat.ML

NLP/LLMs • Score 95

Solver-Independent Automatic Problem Formulation via LLMs for Expensive Simulation-Driven Design

In expensive simulation-driven design, translating ambiguous design requirements into a mathematical optimization formulation is a bottleneck for optimizing product performance. We propose APF, a framework for solver-independent automated problem formulation via LLMs that automatically converts engineers' natural-language requirements into executable optimization models.

Fonte: arXiv cs.CL

Theory/Optimization • Score 89

Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences

arXiv:2512.13123v3 Announce Type: replace-cross Abstract: We study stopping rules for stochastic gradient descent (SGD) for convex optimization from the perspective of anytime-valid confidence sequences. Classical analyses of SGD provide convergence guarantees in expectation or at a fixed horizon, but offer no statistically valid way to assess, at an arbitrary time, how close the current iterate is to the optimum. We develop an anytime-valid, data-dependent upper confidence sequence for the weighted average suboptimality of projected SGD, constructed via nonnegative supermartingales and requiring no smoothness or strong convexity. This confidence sequence yields a simple stopping rule that is provably $\varepsilon$-optimal with probability at least $1-\alpha$, with explicit bounds on the stopping time under standard stochastic approximation stepsizes. To the best of our knowledge, these are the first rigorous, time-uniform performance guarantees and finite-time $\varepsilon$-optimality certificates for projected SGD with general convex objectives, based solely on observable trajectory quantities.

Fonte: arXiv stat.ML

RL • Score 96

LeJOT: An Intelligent Job-Cost Orchestration Solution for the Databricks Platform

With rapid advances in big-data technologies, the Databricks platform has become fundamental for enterprises and research institutions. However, managing the growing operational costs associated with running jobs is a critical challenge. We present LeJOT, a job-cost orchestration framework that uses machine learning for runtime prediction and a solver-based optimization model for real-time resource allocation.

Fonte: arXiv cs.LG

MLOps/Systems • Score 96

The Procrustean Bed of Time Series: The Optimization Bias of Pointwise Loss Functions

Optimizing time-series models with pointwise loss functions (e.g., MSE) rests on a flawed assumption of pointwise independence and identical distribution (i.i.d.) that disregards causal temporal structure. This paper analyzes the Expectation of Optimization Bias (EOB) and reveals that the more deterministic and structured the time series, the more severe the bias caused by the pointwise loss.

Fonte: arXiv cs.LG

RL • Score 92

Why Most Optimism-Based Bandit Algorithms Share the Same Regret Analysis: A Simple Unifying Theorem

Several optimism-based stochastic bandit algorithms -- including UCB, UCB-V, linear UCB, and finite-arm GP-UCB -- achieve logarithmic regret via proofs that, despite superficial differences, follow essentially the same structure. This paper isolates the minimal ingredients behind these analyses.
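
The shared skeleton is the optimistic index: empirical mean plus a confidence radius. A UCB1 sketch makes the structure explicit; the variants the abstract lists swap in different radii around the same loop.

```python
import math
import numpy as np

def ucb1(pull, n_arms, horizon):
    """Generic optimistic index policy: play the arm maximizing
    empirical mean + confidence radius."""
    counts, sums = np.zeros(n_arms), np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1                                    # play each arm once
        else:
            bonus = np.sqrt(2 * math.log(t) / counts)    # confidence radius
            a = int(np.argmax(sums / counts + bonus))    # optimism
        counts[a] += 1
        sums[a] += pull(a)
    return sums.sum()

rng = np.random.default_rng(0)
means = [0.3, 0.5, 0.7]
print(ucb1(lambda a: rng.binomial(1, means[a]), 3, 5000))  # near 0.7 * 5000
```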

Fonte: arXiv cs.LG

RL • Score 95

ReGal: A First Look at PPO-based Legal AI for Judgment Prediction and Summarization in India

arXiv:2512.18014v1 Announce Type: new Abstract: This paper presents an early exploration of reinforcement learning methodologies for legal AI in the Indian context. We introduce Reinforcement Learning-based Legal Reasoning (ReGal), a framework that integrates Multi-Task Instruction Tuning with Reinforcement Learning from AI Feedback (RLAIF) using Proximal Policy Optimization (PPO). Our approach is evaluated across two critical legal tasks: (i) Court Judgment Prediction and Explanation (CJPE), and (ii) Legal Document Summarization. Although the framework underperforms on standard evaluation metrics compared to supervised and proprietary models, it provides valuable insights into the challenges of applying RL to legal texts. These challenges include reward model alignment, legal language complexity, and domain-specific adaptation. Through empirical and qualitative analysis, we demonstrate how RL can be repurposed for high-stakes, long-document tasks in law. Our findings establish a foundation for future work on optimizing legal reasoning pipelines using reinforcement learning, with broader implications for building interpretable and adaptive legal AI systems.

Fonte: arXiv cs.CL

NLP/LLMs • Score 95

Learning to Prioritize IT Tickets: A Comparative Evaluation of Embedding-based Approaches and Fine-Tuned Transformer Models

arXiv:2512.17916v1 Announce Type: new Abstract: Prioritizing service tickets in IT Service Management (ITSM) is critical for operational efficiency but remains challenging due to noisy textual inputs, subjective writing styles, and pronounced class imbalance. We evaluate two families of approaches for ticket prioritization: embedding-based pipelines that combine dimensionality reduction, clustering, and classical classifiers, and a fine-tuned multilingual transformer that processes both textual and numerical features. Embedding-based methods exhibit limited generalization across a wide range of thirty configurations, with clustering failing to uncover meaningful structures and supervised models highly sensitive to embedding quality. In contrast, the proposed transformer model achieves substantially higher performance, with an average F1-score of 78.5% and weighted Cohen's kappa values of nearly 0.80, indicating strong alignment with true labels. These results highlight the limitations of generic embeddings for ITSM data and demonstrate the effectiveness of domain-adapted transformer architectures for operational ticket prioritization.

Fonte: arXiv cs.CL

NLP/LLMs • Score 95

Toward Efficient Agents: A Co-Design of Inference Architecture and System

The rapid development of agents based on large language models (LLMs) has opened new possibilities for multi-turn autonomous reasoning and tool-based decision-making. However, their real-world deployment is hindered by severe inefficiencies that arise not from isolated model inference, but from systemic latency accumulated across reasoning cycles, context growth, and heterogeneous tool interactions.

Fonte: arXiv cs.CL

RL • Score 96

ARC: Leveraging Compositional Representations for Cross-Problem Learning in VRPs

Vehicle Routing Problems (VRPs) with diverse real-world attributes have spurred recent interest in cross-problem learning approaches that generalize efficiently across variants. We propose ARC (Attribute Representation via Compositional Learning), a cross-problem learning framework that learns disentangled attribute representations by decomposing them into two complementary components.

Fonte: arXiv cs.LG

RL • Score 96

Demonstration-Guided Continual Reinforcement Learning in Dynamic Environments

Reinforcement learning (RL) excels in many applications but struggles in dynamic environments. Continual reinforcement learning (CRL) allows RL agents to keep learning and adapting, but balancing stability and plasticity remains a challenge. We propose demonstration-guided continual reinforcement learning (DGCRL), which uses a repository of external demonstrations to guide RL exploration and adaptation.

Fonte: arXiv cs.LG

NLP/LLMs • Score 96

Stable and Efficient Single-Rollout Reinforcement Learning for Multimodal Reasoning

Reinforcement Learning with Verifiable Rewards (RLVR) has become a key paradigm for improving the reasoning capabilities of Multimodal Large Language Models (MLLMs). We introduce $\textbf{MSSR}$ (Multimodal Stabilized Single-Rollout), a group-free RLVR framework that achieves stable optimization and effective performance on multimodal reasoning.

Fonte: arXiv cs.LG

RL • Score 95

Central Limit Theorem for ergodic averages of Markov chains and the comparison of sampling algorithms for heavy-tailed distributions

Establishing central limit theorems (CLTs) for ergodic averages of Markov chains is a fundamental problem in probability and its applications. This paper provides verifiable necessary conditions for CLTs of ergodic averages on general state spaces, focusing on drift conditions that also yield lower bounds on the rates of convergence to stationarity.

Fonte: arXiv stat.ML

Theory/Optimization • Score 89

CTTA-T: Continual Test-Time Adaptation for Text Understanding via Teacher-Student with a Domain-aware and Generalized Teacher

arXiv:2512.18321v1 Announce Type: new Abstract: Text understanding often suffers from domain shifts. To handle testing domains, domain adaptation (DA) is trained to adapt to a fixed and observed testing domain; a more challenging paradigm, test-time adaptation (TTA), cannot access the testing domain during training and adapts online to the testing samples during testing, where the samples are from a fixed domain. We aim to explore a more practical and underexplored scenario, continual test-time adaptation (CTTA) for text understanding, which involves a sequence of testing (unobserved) domains in testing. Current CTTA methods struggle to reduce error accumulation over domains and to enhance generalization to handle unobserved domains: 1) noise-filtering reduces accumulated errors but discards useful information, and 2) accumulating historical domains enhances generalization, but it is hard to achieve adaptive accumulation. In this paper, we propose a CTTA-T (continual test-time adaptation for text understanding) framework adaptable to evolving target domains: it adopts a teacher-student framework, where the teacher is domain-aware and generalized for evolving domains. To improve teacher predictions, we propose a refine-then-filter strategy based on dropout-driven consistency, which calibrates predictions and removes unreliable guidance. For the adaptation-generalization trade-off, we construct a domain-aware teacher by dynamically accumulating cross-domain semantics via incremental PCA, which continuously tracks domain shifts. Experiments show that CTTA-T outperforms the baselines.

Fonte: arXiv cs.CL

Theory/Optimization • Score 92

Ensuring Calibration Robustness in Split Conformal Prediction Under Adversarial Attacks

Conformal prediction (CP) offers distribution-free, finite-sample coverage guarantees but relies critically on exchangeability, a condition often violated under distribution shift. We study the robustness of split conformal prediction under adversarial perturbations at test time, focusing on coverage validity and the size of the resulting prediction sets.
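
The split conformal procedure the paper stress-tests is standard; here is a minimal regression sketch. The linear toy model and noise level are illustrative, and the paper's adversarial-perturbation analysis is not reproduced.

```python
import numpy as np

def conformal_quantile(residuals, alpha=0.1):
    """Split CP: the (1 - alpha) empirical quantile of calibration residuals,
    with the finite-sample (n + 1) correction."""
    n = len(residuals)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(residuals, level, method="higher")

rng = np.random.default_rng(0)
x_cal = rng.normal(size=500)
y_cal = 2 * x_cal + rng.normal(scale=0.5, size=500)   # held-out calibration split
residuals = np.abs(y_cal - 2 * x_cal)                 # toy model: y_hat = 2x
q = conformal_quantile(residuals, alpha=0.1)

x_test = 1.3                                          # 90% coverage interval
print(f"[{2 * x_test - q:.2f}, {2 * x_test + q:.2f}]")
```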

Fonte: arXiv stat.ML

NLP/LLMs • Score 96

From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers

Transformers can implement both generalizable algorithms (e.g., induction heads) and simple positional shortcuts (e.g., memorizing fixed output positions). In this work, we study how the choice of pretraining data distribution steers a shallow transformer toward one behavior or the other, analyzing gradient-based training of a single-layer transformer.

Fonte: arXiv cs.LG

NLP/LLMs • Score 96

Rethinking Multi-Agent Intelligence Through the Lens of Small-World Networks

Large language models (LLMs) have enabled multi-agent systems (MAS) in which multiple agents argue, critique, and coordinate to solve complex tasks, making the communication topology a fundamental design choice. In this work, we revisit classical theory on small-world (SW) networks and investigate how SW connectivity can serve as a design principle for MAS.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

NEURO-GUARD: Neuro-Symbolic Generalization and Unbiased Adaptive Routing for Diagnostics -- Explainable Medical AI

Accurate, interpretable image-based diagnosis remains a central challenge in medical AI, especially in settings with limited data and critical clinical decisions. We present NEURO-GUARD, a novel knowledge-guided framework that integrates Vision Transformers (ViTs) with language-driven reasoning, improving performance and robustness across domains.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning

Multimodal Large Language Models (MLLMs) have empowered embodied agents with remarkable planning and reasoning abilities. However, when facing ambiguous natural-language instructions, current agents often fail to balance the high cost of physical exploration against the cognitive cost of human interaction. To bridge this gap, we propose ESearch-R1, a cost-aware embodied reasoning framework.

Fonte: arXiv cs.AI

NLP/LLMs • Score 93

External Hippocampus: Topological Cognitive Maps for Guiding Large Language Model Reasoning

This paper proposes the External Hippocampus framework, which models language-model reasoning from a cognitive-dynamics perspective as the flow of information energy through semantic space. The framework builds topological cognitive maps via dimensionality-reduction projection, enabling precise navigation and test-time intervention on the energy flow without substantial computational requirements.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

MEEA: Mere-Exposure-Effect-Based Adversarial Optimization for Jailbreaking LLMs

The rapid advance of large language models (LLMs) has intensified concerns about the robustness of their safety alignment. We propose MEEA (Mere Exposure Effect Attack), a psychology-inspired automated framework that evaluates safety robustness in multi-turn interactions by exploiting the mere exposure effect. Our experiments show that MEEA consistently achieves higher attack success rates against models such as GPT-4 and Claude-3.5.

Fonte: arXiv cs.AI

RL • Score 96

Can We Test Theories of Consciousness in AI? Ablations, Markers, and Robustness

The search for reliable indicators of consciousness has fragmented into competing theoretical camps (Global Workspace Theory (GWT), Integrated Information Theory (IIT), and Higher-Order Theories (HOT)), each proposing distinct neural signatures. We take a synthetic neurophenomenology approach, building artificial agents to test the functional consequences of these theories through precise architectural ablations. We report dissociations suggesting that these theories describe complementary functional layers.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

Observer, Not Player: Simulating Theory of Mind in LLMs through Game Observation

We present an interactive framework for evaluating whether large language models (LLMs) exhibit genuine 'understanding' in a simple yet strategic environment. Using the game Rock-Paper-Scissors (RPS) as an example, our system casts the LLM as an Observer whose task is to identify the strategies in play and articulate the reasoning behind that judgment.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

Large Language Models as Discounted Bayesian Filters

Large Language Models (LLMs) demonstrate strong few-shot generalization through in-context learning, but their reasoning in dynamic, stochastic environments remains opaque. We introduce a Bayesian filtering framework for evaluating online inference in LLMs, revealing how their belief updates behave like exponential-forgetting filters.
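
To illustrate the comparison object, here is a minimal exponential-forgetting filter: a Beta-Bernoulli update whose pseudo-counts are discounted before each observation. The discount value and the data stream are illustrative; the paper's evaluation protocol for LLMs is not shown.

```python
import numpy as np

def discounted_bernoulli_filter(observations, gamma=0.8, a0=1.0, b0=1.0):
    """Beta-Bernoulli filtering with exponential forgetting: pseudo-counts
    shrink by gamma before each update, so old evidence decays geometrically.
    gamma = 1 recovers the standard stationary Bayesian update."""
    a, b = a0, b0
    means = []
    for y in observations:
        a = gamma * a + y            # discounted success count
        b = gamma * b + (1 - y)      # discounted failure count
        means.append(a / (a + b))
    return np.array(means)

obs = [1, 1, 1, 0, 0, 0, 0, 0]       # the underlying rate shifts mid-stream
print(discounted_bernoulli_filter(obs).round(2))  # tracks the shift quickly
```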

Fonte: arXiv cs.AI

RL • Score 93

Toward Guided Descent: Optimization Algorithms for Large-Scale Neural Network Training

Neural network optimization remains one of the most significant and poorly understood challenges in modern AI research. Improvements in training algorithms can lead to better feature learning in foundation models, significant reductions in training time, and better interpretability of how networks learn. This thesis investigates the evolution of optimization algorithms, showing how principled algorithmic design can demystify the training process.

Fonte: arXiv cs.LG

Evaluation/Benchmarks • Score 95

Online Variational Mirror Descent for Robust Learning in the Schrödinger Bridge

The Schrödinger Bridge (SB) has evolved into a universal class of probabilistic generative models. However, the estimated learning signals are inherently uncertain, and the reliability promised by existing methods often rests on speculative best-case scenarios. In this work, we propose a Variational Online Mirror Descent (OMD) framework for SB problems, which provides greater stability for SB solvers.

Fonte: arXiv stat.ML

RL • Score 93

On the Convergence Rate of LoRA Gradient Descent

The low-rank adaptation (LoRA) algorithm for fine-tuning large models has gained popularity in recent years due to its remarkable performance and low computational requirements. This work presents the first non-asymptotic convergence analysis of the original LoRA gradient descent algorithm, without assumptions that obscure the understanding of its convergence.
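
The analyzed object is plain gradient descent on the LoRA factors; the toy sketch below runs that update on a least-squares fine-tuning loss. Dimensions, learning rate, and data are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 16, 16, 2                      # weight shape d x k, LoRA rank r
W0 = rng.normal(size=(d, k))             # frozen pretrained weights
A = 0.01 * rng.normal(size=(r, k))       # trainable low-rank factor
B = np.zeros((d, r))                     # standard LoRA init: B = 0

X = rng.normal(size=(100, k))            # fine-tuning inputs
Y = X @ (W0 + 0.1 * rng.normal(size=(d, k))).T   # targets from a shifted model

lr = 1e-2
for _ in range(500):
    W = W0 + B @ A                       # adapted weights; W0 stays frozen
    E = X @ W.T - Y                      # least-squares residuals
    grad_W = E.T @ X / len(X)            # dL/dW
    gB, gA = grad_W @ A.T, B.T @ grad_W  # chain rule through W = W0 + B A
    B -= lr * gB
    A -= lr * gA

rel_err = np.linalg.norm(X @ (W0 + B @ A).T - Y) / np.linalg.norm(Y)
print(rel_err)                           # shrinks as B A fits the shift
```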

Fonte: arXiv cs.LG

NLP/LLMs • Score 95

Generalization Gaps in Political Fake News Detection: An Empirical Study on the LIAR Dataset

arXiv:2512.18533v1 Announce Type: new Abstract: The proliferation of linguistically subtle political disinformation poses a significant challenge to automated fact-checking systems. Despite increasing emphasis on complex neural architectures, the empirical limits of text-only linguistic modeling remain underexplored. We present a systematic diagnostic evaluation of nine machine learning algorithms on the LIAR benchmark. By isolating lexical features (Bag-of-Words, TF-IDF) and semantic embeddings (GloVe), we uncover a hard "Performance Ceiling", with fine-grained classification not exceeding a Weighted F1-score of 0.32 across models. Crucially, a simple linear SVM (Accuracy: 0.624) matches the performance of pre-trained Transformers such as RoBERTa (Accuracy: 0.620), suggesting that model capacity is not the primary bottleneck. We further diagnose a massive "Generalization Gap" in tree-based ensembles, which achieve more than 99% training accuracy but collapse to approximately 25% on test data, indicating reliance on lexical memorization rather than semantic inference. Synthetic data augmentation via SMOTE yields no meaningful gains, confirming that the limitation is semantic (feature ambiguity) rather than distributional. These findings indicate that for political fact-checking, increasing model complexity without incorporating external knowledge yields diminishing returns.
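
The linear SVM baseline that matches RoBERTa in this study is a standard pipeline; a minimal scikit-learn sketch follows. The placeholder statements stand in for the LIAR splits, which are not loaded here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder statements and labels; the real task uses the six-class LIAR splits.
train_texts = ["taxes were cut for everyone last year",
               "the crime rate doubled under this governor"]
train_labels = ["half-true", "false"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)
print(clf.predict(["the deficit shrank under this administration"]))
# On the real task, this family of models plateaus around 0.62 accuracy.
```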

Fonte: arXiv cs.CL

NLP/LLMs • Score 95

The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units

This paper explores the relationship between the condition number of a neural network's weight tensor and the amount of information encoded by the associated processing unit, from an information-theoretic perspective. It argues that a high condition number may indicate that the unit has learned to selectively amplify and compress information.
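
A minimal sketch of the diagnostic itself: the condition number of each layer's weight matrix, computed from its singular values. Layer names and shapes are illustrative; the paper's information-theoretic argument is not reproduced.

```python
import numpy as np

def layer_condition_numbers(weights):
    """Largest over smallest singular value per weight matrix; invariant to
    rescaling W -> c * W, hence a scale-invariant diagnostic."""
    cond = {}
    for name, W in weights.items():
        s = np.linalg.svd(W, compute_uv=False)
        cond[name] = s[0] / s[-1]
    return cond

rng = np.random.default_rng(0)
weights = {"layer1": rng.normal(size=(64, 32)),
           "layer2": rng.normal(size=(32, 10))}
print({k: round(v, 1) for k, v in layer_condition_numbers(weights).items()})
```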

Fonte: arXiv stat.ML

Vision • Score 96

NOVA: Discovering Well-Conditioned Winograd Transformations through Numerical Optimization of Vandermonde Arithmetic

Winograd convolution is the standard algorithm for efficient inference, reducing arithmetic complexity by 2.25x for 3x3 kernels. However, it faces a critical barrier in the modern era of low-precision computing: numerical instability. We present NOVA, a discovery framework that treats the selection of Winograd points as a continuous optimization problem.
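
For context, the classical F(2,3) Winograd transform that NOVA seeks to improve is built from the fixed interpolation points 0, 1, -1, and infinity; the sketch below checks those baseline matrices against direct convolution. NOVA's optimized point selection is not reproduced here.

```python
import numpy as np

# Standard Winograd F(2,3) transforms (interpolation points 0, 1, -1, inf).
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G = np.array([[1,    0,   0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0,    0,   1]], dtype=float)
AT = np.array([[1, 1, 1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs of a 3-tap correlation over 4 inputs using 4 multiplies."""
    return AT @ ((G @ g) * (BT @ d))

rng = np.random.default_rng(0)
d, g = rng.normal(size=4), rng.normal(size=3)
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
print(np.max(np.abs(winograd_f23(d, g) - direct)))  # ~1e-16 in float64
```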

Fonte: arXiv cs.LG

RL • Score 95

Unsupervised Feature Selection via Robust Autoencoder and Adaptive Graph Learning

Effective feature selection is essential for high-dimensional data analysis and machine learning. Unsupervised feature selection (UFS) aims to cluster data and identify the most discriminative features. We propose the Robust Autoencoder-based Unsupervised Feature Selection (RAEUFS) model, which uses a deep autoencoder to learn nonlinear feature representations, improving robustness against outliers.

Fonte: arXiv stat.ML

RL • Score 96

Embedded Safety-Aligned Intelligence via Differentiable Internal Alignment Embeddings

We present Embedded Safety-Aligned Intelligence (ESAI), a theoretical framework for multi-agent reinforcement learning that incorporates alignment constraints directly into agents' internal representations using differentiable internal alignment embeddings. This work analyzes stability conditions and theoretical properties, positioning ESAI as a conceptual contribution to differentiable alignment mechanisms in multi-agent systems.

Fonte: arXiv cs.LG

Theory/Optimization • Score 87

Variance Reduction and Low Sample Complexity in Stochastic Optimization via the Proximal Point Method

High-probability guarantees in stochastic optimization are often obtained only under strong noise assumptions, such as sub-Gaussian tails. We show that such guarantees can also be achieved under the weaker assumption of bounded variance, developing a stochastic proximal point method that combines a proximal subproblem solver with a probability booster.

Fonte: arXiv stat.ML

NLP/LLMs • Score 95

MemEvolve: Meta-Evolution of Agent Memory Systems

arXiv:2512.18746v1 Announce Type: new Abstract: Self-evolving memory systems are unprecedentedly reshaping the evolutionary paradigm of large language model (LLM)-based agents. Prior work has predominantly relied on manually engineered memory architectures to store trajectories, distill experience, and synthesize reusable tools, enabling agents to evolve on the fly within environment interactions. However, this paradigm is fundamentally constrained by the staticity of the memory system itself: while memory facilitates agent-level evolving, the underlying memory architecture cannot be meta-adapted to diverse task contexts. To address this gap, we propose MemEvolve, a meta-evolutionary framework that jointly evolves agents' experiential knowledge and their memory architecture, allowing agent systems not only to accumulate experience but also to progressively refine how they learn from it. To ground MemEvolve in prior research and foster openness in future self-evolving systems, we introduce EvolveLab, a unified self-evolving memory codebase that distills twelve representative memory systems into a modular design space (encode, store, retrieve, manage), providing both a standardized implementation substrate and a fair experimental arena. Extensive evaluations on four challenging agentic benchmarks demonstrate that MemEvolve achieves (I) substantial performance gains, improving frameworks such as SmolAgent and Flash-Searcher by up to $17.06\%$; and (II) strong cross-task and cross-LLM generalization, designing memory architectures that transfer effectively across diverse benchmarks and backbone models.

Fonte: arXiv cs.CL

RL • Score 95

From Natural Language to Control Signals: A Conceptual Framework for Semantic Channel Finding in Complex Experimental Infrastructure

arXiv:2512.18779v1 Announce Type: new Abstract: Modern experimental platforms such as particle accelerators, fusion devices, telescopes, and industrial process control systems expose tens to hundreds of thousands of control and diagnostic channels accumulated over decades of evolution. Operators and AI systems rely on informal expert knowledge, inconsistent naming conventions, and fragmented documentation to locate signals for monitoring, troubleshooting, and automated control, creating a persistent bottleneck for reliability, scalability, and language-model-driven interfaces. We formalize semantic channel finding -- mapping natural-language intent to concrete control-system signals -- as a general problem in complex experimental infrastructure, and introduce a four-paradigm framework to guide architecture selection across facility-specific data regimes. The paradigms span (i) direct in-context lookup over curated channel dictionaries, (ii) constrained hierarchical navigation through structured trees, (iii) interactive agent exploration using iterative reasoning and tool-based database queries, and (iv) ontology-grounded semantic search that decouples channel meaning from facility-specific naming conventions. We demonstrate each paradigm through proof-of-concept implementations at four operational facilities spanning two orders of magnitude in scale -- from compact free-electron lasers to large synchrotron light sources -- and diverse control-system architectures, from clean hierarchies to legacy environments. These implementations achieve 90-97% accuracy on expert-curated operational queries.

Fonte: arXiv cs.CL

NLP/LLMs • Score 95

A Large Language Model Based Method for Complex Logical Reasoning over Knowledge Graphs

arXiv:2512.19092v1 Announce Type: new Abstract: Reasoning over knowledge graphs (KGs) with first-order logic (FOL) queries is challenging due to the inherent incompleteness of real-world KGs and the compositional complexity of logical query structures. Most existing methods rely on embedding entities and relations into continuous geometric spaces and answer queries via differentiable set operations. While effective for simple query patterns, these approaches often struggle to generalize to complex queries involving multiple operators, deeper reasoning chains, or heterogeneous KG schemas. We propose ROG (Reasoning Over knowledge Graphs with large language models), an ensemble-style framework that combines query-aware KG neighborhood retrieval with large language model (LLM)-based chain-of-thought reasoning. ROG decomposes complex FOL queries into sequences of simpler sub-queries, retrieves compact, query-relevant subgraphs as contextual evidence, and performs step-by-step logical inference using an LLM, avoiding the need for task-specific embedding optimization. Experiments on standard KG reasoning benchmarks demonstrate that ROG consistently outperforms strong embedding-based baselines in terms of mean reciprocal rank (MRR), with particularly notable gains on high-complexity query types. These results suggest that integrating structured KG retrieval with LLM-driven logical reasoning offers a robust and effective alternative for complex KG reasoning tasks.

Fonte: arXiv cs.CL

RL • Score 95

Sampling multimodal distributions with warm starting points: Non-asymptotic bounds for the Reweighted Annealed Leap-Point Sampler

Sampling from multimodal distributions is a central challenge in Bayesian inference and machine learning. This work introduces the Reweighted ALPS (Re-ALPS), a modified version of the Annealed Leap-Point Sampler (ALPS) that removes the Gaussian-approximation assumption and enjoys a polynomial-time bound in a general setting.

Fonte: arXiv stat.ML

RL • Score 96

Monitoring Monitorability

Observability into the decision making of modern AI systems may be necessary for safely deploying increasingly capable agents. Monitoring the chain of thought (CoT) of today's reasoning models has proven effective for detecting misbehavior. However, this 'monitorability' can be fragile under different training procedures and data sources.

Fonte: arXiv cs.AI

RL • Score 93

The FedSUM Family: Efficient Federated Learning Methods under Arbitrary Client Participation

Federated Learning (FL) methods are often designed for specific client-participation patterns, limiting their applicability in practical deployments. We present the FedSUM family of algorithms, which supports arbitrary client participation without additional assumptions on data heterogeneity. Our framework models participation variability with two delay metrics: the maximum delay $\tau_{\text{max}}$ and the average delay $\tau_{\text{avg}}$.
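
The FedSUM update rules are not reproduced here; this sketch only illustrates the setting the abstract describes: a FedAvg-style loop in which an arbitrary subset of clients shows up each round, with the two delay metrics tracked on the side. The participation probability and the toy heterogeneous objective are assumptions.

```python
import numpy as np

def fedavg_arbitrary_participation(grads_fn, n_clients, rounds, lr=0.1, seed=0):
    """FedAvg-style loop under arbitrary client participation. Tracks each
    client's staleness to illustrate the tau_max / tau_avg delay metrics."""
    rng = np.random.default_rng(seed)
    w = np.zeros(2)
    last_seen = np.zeros(n_clients)
    delays = []
    for t in range(rounds):
        active = np.flatnonzero(rng.random(n_clients) < 0.3)  # arbitrary subset
        if active.size == 0:
            continue
        delays.extend(t - last_seen[active])
        last_seen[active] = t
        mean_grad = np.mean([grads_fn(c, w) for c in active], axis=0)
        w -= lr * mean_grad
    return w, max(delays), np.mean(delays)   # (weights, tau_max, tau_avg)

# Toy heterogeneous objective: client c minimizes ||w - target_c||^2.
def grads_fn(c, w):
    target = np.array([c % 3, -(c % 2)], dtype=float)
    return 2 * (w - target)

w, tau_max, tau_avg = fedavg_arbitrary_participation(grads_fn, 10, 200)
print(w.round(2), tau_max, round(tau_avg, 1))
```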

Fonte: arXiv cs.LG

RL • Score 96

Unifying Causal Reinforcement Learning: Review, Taxonomy, Algorithms, and Applications

Integrating causal inference (CI) with reinforcement learning (RL) has become a powerful paradigm for addressing critical limitations of classical RL, such as poor explainability and lack of robustness. This work reviews recent advances at the intersection of CI and RL, categorizing existing approaches and discussing challenges, empirical successes, and future research directions.

Fonte: arXiv cs.AI

MLOps/Systems • Score 92

A Riemannian Optimization Perspective on the Gauss-Newton Method for Feedforward Neural Networks

In this work, we establish non-asymptotic convergence bounds for the Gauss-Newton method in training neural networks with smooth activations. In the underparameterized regime, the Gauss-Newton gradient flow induces a Riemannian gradient flow on a low-dimensional embedded submanifold of function space.

Fonte: arXiv stat.ML

MLOps/Systems • Score 95

GenUQ: Predictive Uncertainty Estimates via Generative Hyper-Networks

Operator learning is a recent generalization of regression to mappings between functions, promising to replace expensive numerical integration of PDEs with fast evaluations of maps between functional states of a system. In this paper, we present GenUQ, a measure-theoretic approach to UQ that avoids constructing a likelihood by introducing a generative hyper-network model.

Fonte: arXiv stat.ML

RL • Score 96

Counterfactual Basis Extension and Representational Geometry: A Model of Conceptual Growth under MDL Constraints

Concept learning becomes possible only when existing representations fail to account for experience. This paper proposes a geometric framework in which conceptual growth is modeled as admissible basis extension evaluated under a Minimum Description Length (MDL) criterion. Experience is represented as vectors relative to a current conceptual subspace.

Fonte: arXiv cs.AI

Theory/Optimization • Score 92

Causal Inference as Distribution Adaptation: Optimizing ATE Risk under Propensity Uncertainty

Standard approaches to causal inference, such as Outcome Regression and Inverse Probability Weighted Regression Adjustment (IPWRA), are usually derived through the lens of missing-data imputation and identification theory. In this work, we unify these methods under a machine learning perspective, reframing ATE estimation as a domain adaptation problem under distribution shift.

Fonte: arXiv stat.ML

NLP/LLMs • Score 95

Toward Scalable and Valid Conditional Independence Testing with Spectral Representations

Conditional independence (CI) is central to causal inference, feature selection, and graphical modeling, yet it is often impossible to test without additional assumptions. Existing CI tests rely on restrictive structural conditions, limiting their validity on real-world data. This work explores whether representation learning can help overcome these limitations.

Fonte: arXiv stat.ML

Evaluation/Benchmarks • Score 92

A Convex Loss Function for Set Prediction with Optimal Trade-offs Between Size and Conditional Coverage

We consider supervised learning problems in which set-valued predictions provide explicit uncertainty estimates. Using Choquet integrals (also known as Lovász extensions), we propose a convex loss function for nondecreasing subset functions obtained as level sets of a real-valued function.

Fonte: arXiv stat.ML

RL • Score 92

A Single-Loop First-Order Algorithm for Linearly Constrained Bilevel Optimization

arXiv:2510.24710v2 Announce Type: replace-cross Abstract: We study bilevel optimization problems where the lower-level problems are strongly convex and have coupled linear constraints. To overcome the potential non-smoothness of the hyper-objective and the computational challenges associated with the Hessian matrix, we utilize penalty and augmented Lagrangian methods to reformulate the original problem as a single-level one. In particular, we establish a strong theoretical connection between the reformulated function and the original hyper-objective by characterizing the closeness of their values and derivatives. Based on this reformulation, we propose a single-loop, first-order algorithm for linearly constrained bilevel optimization (SFLCB). We provide rigorous analyses of its non-asymptotic convergence rates, showing an improvement over prior double-loop algorithms -- from $O(\epsilon^{-3}\log(\epsilon^{-1}))$ to $O(\epsilon^{-3})$. The experiments corroborate our theoretical findings and demonstrate the practical efficiency of the proposed SFLCB algorithm. Simulation code is provided at https://github.com/ShenGroup/SFLCB.

Fonte: arXiv stat.ML

RL • Score 92

Theoretical Convergence Guarantees for Variational Autoencoders

Variational Autoencoders (VAEs) are popular generative models used to sample from complex data distributions. This paper seeks to fill gaps in the understanding of the theoretical properties of VAEs by providing non-asymptotic convergence guarantees for VAEs trained with the Stochastic Gradient Descent and Adam algorithms, deriving a convergence rate of $O(\frac{\log n}{\sqrt{n}})$.

Fonte: arXiv stat.ML

Theory/Optimization • Score 92

Generalized Data Thinning Using Sufficient Statistics

Our goal is to develop a general strategy for decomposing a random variable $X$ into multiple independent random variables without sacrificing information about unknown parameters. This work generalizes a recent procedure, allowing exact reconstruction of $X$ from known functions of the independent variables.

Fonte: arXiv stat.ML

Theory/Optimization • Score 89

Nonasymptotic Convergence Rates for Plug-and-Play Methods With MMSE Denoisers

arXiv:2510.27211v4 Announce Type: replace-cross Abstract: It is known that the minimum-mean-squared-error (MMSE) denoiser under Gaussian noise can be written as a proximal operator, which suffices for asymptotic convergence of plug-and-play (PnP) methods but does not reveal the structure of the induced regularizer or give convergence rates. We show that the MMSE denoiser corresponds to a regularizer that can be written explicitly as an upper Moreau envelope of the negative log-marginal density, which in turn implies that the regularizer is 1-weakly convex. Using this property, we derive (to the best of our knowledge) the first sublinear convergence guarantee for PnP proximal gradient descent with an MMSE denoiser. We validate the theory with a one-dimensional synthetic study that recovers the implicit regularizer. We also validate the theory with imaging experiments (deblurring and computed tomography), which exhibit the predicted sublinear behavior.
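
A minimal PnP proximal gradient sketch in the spirit of the paper, with a crude moving-average smoother standing in for the MMSE denoiser the theory assumes; the forward model, signal, and step size are illustrative.

```python
import numpy as np

def pnp_pgd(y, A, denoise, step, iters=20):
    """Plug-and-play proximal gradient: a gradient step on the data term
    0.5 * ||Ax - y||^2 followed by a denoiser in place of the prox operator."""
    x = A.T @ y
    for _ in range(iters):
        x = denoise(x - step * (A.T @ (A @ x - y)))
    return x

rng = np.random.default_rng(0)
n = 64
A = np.eye(n)                         # identity forward model: pure denoising
x_true = np.sin(np.arange(n) / 8.0)
y = x_true + 0.3 * rng.normal(size=n)

def denoise(x, k=5):                  # crude smoother standing in for MMSE
    return np.convolve(x, np.ones(k) / k, mode="same")

x_hat = pnp_pgd(y, A, denoise, step=1.0)
print(np.linalg.norm(y - x_true), np.linalg.norm(x_hat - x_true))
```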

Fonte: arXiv stat.ML

Theory/Optimization • Score 89

DGH: Dynamic Gaussian Hair

arXiv:2512.17094v1 Announce Type: new Abstract: The creation of photorealistic dynamic hair remains a major challenge in digital human modeling because of the complex motions, occlusions, and light scattering. Existing methods often resort to static capture and physics-based models that do not scale as they require manual parameter fine-tuning to handle the diversity of hairstyles and motions, and heavy computation to obtain high-quality appearance. In this paper, we present Dynamic Gaussian Hair (DGH), a novel framework that efficiently learns hair dynamics and appearance. We propose: (1) a coarse-to-fine model that learns temporally coherent hair motion dynamics across diverse hairstyles; (2) a strand-guided optimization module that learns a dynamic 3D Gaussian representation for hair appearance with support for differentiable rendering, enabling gradient-based learning of view-consistent appearance under motion. Unlike prior simulation-based pipelines, our approach is fully data-driven, scales with training data, and generalizes across various hairstyles and head motion sequences. Additionally, DGH can be seamlessly integrated into a 3D Gaussian avatar framework, enabling realistic, animatable hair for high-fidelity avatar representation. DGH achieves promising geometry and appearance results, providing a scalable, data-driven alternative to physics-based simulation and rendering.

Fonte: arXiv cs.CV

RL • Score 93

Value Under Ignorance in Universal Artificial Intelligence

In this work, we generalize the AIXI reinforcement learning agent to admit a broader class of utility functions. Assigning a utility to every possible interaction history forces us to confront the ambiguity that some hypotheses in the agent's belief distribution predict only a finite prefix of the history, which is interpreted as implying a chance of death equal to a quantity called the semimeasure loss.

Fonte: arXiv cs.AI

Vision • Score 96

Dialectics for Artificial Intelligence

This paper investigates whether artificial intelligence can discover concepts from raw experience, without human supervision. It proposes a definition of 'concept' that goes beyond a dictionary label, considering the structural relationship to an agent's total experience. Reversibility is a central aspect, making the existence of concepts a verifiable structural claim.

Fonte: arXiv cs.AI

Evaluation/Benchmarks • Score 93

Optimizing Text Search: A New Pattern-Matching Algorithm Based on Ukkonen's Approach

In computer science, the efficiency of text-search algorithms is crucial for processing large volumes of data in areas such as natural language processing and bioinformatics. This study investigates text-search algorithms, focusing on optimizing suffix trees through methods such as splitting and Ukkonen's algorithm, demonstrating linear time and space efficiency.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

On the Role of Contextual Information and Ego States in the Behavior of LLM Agents for Transactional Analysis Dialogues

LLM-driven agents are used in many areas, from customer support to education, with growing interest in their ability to act in a more human-like way. This paper proposes a Multi-Agent System (MAS) inspired by Transactional Analysis (TA) theory, in which each agent is split into three ego states: Parent, Adult, and Child, enriching the response process with a contextual information retrieval mechanism.

Fonte: arXiv cs.AI

RL • Score 96

A Privacy-Preserving Synthetic Dataset of Individual Daily Trajectories for City-Scale Mobility Analytics

Urban mobility data are essential for urban planning, transport demand forecasting, and pandemic modeling. This study presents a privacy-preserving synthetic dataset that reconstructs daily trajectories from aggregated inputs, without requiring personal identifiers.

Fonte: arXiv cs.AI

NLP/LLMs • Score 96

QSMOTE-PGM/kPGM: Classification of Imbalanced Datasets Based on QSMOTE and kPGM

Quantum-inspired machine learning (QiML) uses mathematical structures from quantum theory to enhance classical algorithms, focusing on inner-product structures in high-dimensional feature spaces. This work presents a unified theoretical and empirical comparison of PGM- and kPGM-based classifiers, analyzing their performance in synthetic oversampling scenarios using variants of Quantum SMOTE (QSMOTE).

Fonte: arXiv cs.LG

NLP/LLMs • Score 96

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Reinforcement learning (RL) has reemerged as a natural approach for training interactive LLM agents in real-world environments. However, directly applying the Group Relative Policy Optimization (GRPO) algorithm to multi-turn tasks reveals significant limitations. To overcome these challenges, we investigate more stable and effective advantage-estimation strategies, introducing turn-PPO, a variant that operates on a turn-level MDP formulation.

Fonte: arXiv cs.LG

RL • Score 92

Convergence Guarantees for Federated SARSA with Local Training and Heterogeneous Agents

arXiv:2512.17688v1 Announce Type: cross Abstract: We present a novel theoretical analysis of Federated SARSA (FedSARSA) with linear function approximation and local training. We establish convergence guarantees for FedSARSA in the presence of heterogeneity, both in local transitions and rewards, providing the first sample and communication complexity bounds in this setting. At the core of our analysis is a new, exact multi-step error expansion for single-agent SARSA, which is of independent interest. Our analysis precisely quantifies the impact of heterogeneity, demonstrating the convergence of FedSARSA with multiple local updates. Crucially, we show that FedSARSA achieves linear speed-up with respect to the number of agents, up to higher-order terms due to Markovian sampling. Numerical experiments support our theoretical findings.
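
The building block of the analysis is the single-agent SARSA(0) update with linear function approximation, sketched below on synthetic features. FedSARSA would additionally average such weight vectors across heterogeneous agents after local training; that outer loop is not shown.

```python
import numpy as np

def sarsa_linear_update(w, phi_sa, r, phi_sa_next, gamma=0.99, alpha=0.05):
    """One SARSA(0) step with linear function approximation, Q(s,a) = w . phi(s,a);
    the TD target uses the next state-action pair actually taken."""
    td_error = r + gamma * w @ phi_sa_next - w @ phi_sa
    return w + alpha * td_error * phi_sa

# Toy usage: i.i.d. zero-mean features standing in for an MDP trajectory.
rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5, 0.2, 0.0])      # reward weights
w = np.zeros(4)
phi = rng.normal(size=4)
for _ in range(1000):
    phi_next = rng.normal(size=4)
    r = phi @ theta + 0.1 * rng.normal()
    w = sarsa_linear_update(w, phi, r, phi_next)
    phi = phi_next
# For zero-mean i.i.d. features the TD fixed point equals theta.
print(w.round(2))
```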

Fonte: arXiv stat.ML

NLP/LLMs • Score 96

Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs

arXiv:2512.17131v1 Announce Type: cross Abstract: We propose Generalized Primal Averaging (GPA), an extension of Nesterov's method in its primal averaging formulation that addresses key limitations of recent averaging-based optimizers such as single-worker DiLoCo and Schedule-Free (SF) in the non-distributed setting. These two recent algorithmic approaches improve the performance of base optimizers, such as AdamW, through different iterate averaging strategies. Schedule-Free explicitly maintains a uniform average of past weights, while single-worker DiLoCo performs implicit averaging by periodically aggregating trajectories, called pseudo-gradients, to update the model parameters. However, single-worker DiLoCo's periodic averaging introduces a two-loop structure, increasing its memory requirements and number of hyperparameters. GPA overcomes these limitations by decoupling the interpolation constant in the primal averaging formulation of Nesterov. This decoupling enables GPA to smoothly average iterates at every step, generalizing and improving upon single-worker DiLoCo. Empirically, GPA consistently outperforms single-worker DiLoCo while removing the two-loop structure, simplifying hyperparameter tuning, and reducing its memory overhead to a single additional buffer. On the Llama-160M model, GPA provides a 24.22% speedup in terms of steps to reach the baseline (AdamW's) validation loss. Likewise, GPA achieves speedups of 12% and 27% on small and large batch setups, respectively, to attain AdamW's validation accuracy on the ImageNet ViT workload. Furthermore, we prove that for any base optimizer with regret bounded by $O(\sqrt{T})$, where $T$ is the number of iterations, GPA can match or exceed the convergence guarantee of the original optimizer, depending on the choice of interpolation constants.
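
A hedged sketch of the primal-averaging pattern the abstract describes: a base SGD sequence plus an every-step averaged sequence, with the two interpolation constants decoupled. The constants and the quadratic objective are illustrative; GPA's actual schedules and its AdamW base optimizer are not reproduced.

```python
import numpy as np

def gpa_sgd(grad, x0, lr=0.1, beta=0.9, c=0.05, iters=200):
    """Primal-averaging sketch: base iterates z_t plus a smoothly averaged
    sequence x_t updated every step, with decoupled constants beta (gradient
    evaluation point) and c (averaging rate)."""
    z = x0.copy()                        # base optimizer iterates
    x = x0.copy()                        # averaged iterates (the model weights)
    for _ in range(iters):
        y = (1 - beta) * x + beta * z    # evaluation point interpolates x and z
        z = z - lr * grad(y)             # base optimizer step
        x = (1 - c) * x + c * z          # every-step averaging, no outer loop
    return x

# Toy quadratic: minimize 0.5 * ||x - 3||^2.
grad = lambda v: v - 3.0
print(gpa_sgd(grad, np.zeros(2)).round(3))   # -> close to [3, 3]
```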

Fonte: arXiv cs.AI

RL • Score 96

SHARP-QoS: Sparsely-gated Hierarchical Adaptive Routing for joint Prediction of QoS

arXiv:2512.17262v1 Announce Type: new Abstract: Dependable service-oriented computing relies on multiple Quality of Service (QoS) parameters that are essential to assess service optimality. However, real-world QoS data are extremely sparse, noisy, and shaped by hierarchical dependencies arising from QoS interactions, and geographical and network-level factors, making accurate QoS prediction challenging. Existing methods often predict each QoS parameter separately, requiring multiple similar models, which increases computational cost and leads to poor generalization. Although recent joint QoS prediction studies have explored shared architectures, they suffer from negative transfer due to loss-scaling caused by inconsistent numerical ranges across QoS parameters and further struggle with inadequate representation learning, resulting in degraded accuracy. This paper presents a unified strategy for joint QoS prediction, called SHARP-QoS, that addresses these issues using three components. First, we introduce a dual mechanism to extract the hierarchical features from both QoS and contextual structures via hyperbolic convolution formulated in the Poincaré ball. Second, we propose an adaptive feature-sharing mechanism that allows feature exchange across informative QoS and contextual signals. A gated feature fusion module is employed to support dynamic feature selection among structural and shared representations. Third, we design an EMA-based loss balancing strategy that allows stable joint optimization, thereby mitigating the negative transfer. Evaluations on three datasets with two, three, and four QoS parameters demonstrate that SHARP-QoS outperforms both single- and multi-task baselines. Extensive study shows that our model effectively addresses major challenges, including sparsity, robustness to outliers, and cold-start, while maintaining moderate computational overhead, underscoring its capability for reliable joint QoS prediction.

Fonte: arXiv cs.LG

Evaluation/Benchmarks • Score 96

Bridging Training and Merging Through Momentum-Aware Optimization

arXiv:2512.17109v1 Announce Type: new Abstract: Training large neural networks and merging task-specific models both exploit low-rank structure and require parameter importance estimation, yet these challenges have been pursued in isolation. Current workflows compute curvature information during training, discard it, then recompute similar information for merging -- wasting computation and discarding valuable trajectory data. We introduce a unified framework that maintains factorized momentum and curvature statistics during training, then reuses this information for geometry-aware model composition. The proposed method achieves memory efficiency comparable to state-of-the-art approaches while accumulating task saliency scores that enable curvature-aware merging without post-hoc Fisher computation. We establish convergence guarantees for non-convex objectives with approximation error bounded by gradient singular value decay. On natural language understanding benchmarks, curvature-aware parameter selection outperforms magnitude-only baselines across all sparsity levels, with multi-task merging improving over strong baselines. The proposed framework exhibits rank-invariant convergence and superior hyperparameter robustness compared to existing low-rank optimizers. By treating the optimization trajectory as a reusable asset rather than discarding it, our approach eliminates redundant computation while enabling more principled model composition.

Fonte: arXiv cs.LG

MLOps/Systems • Score 95

Regularized Random Fourier Features and Finite Element Reconstruction for Operator Learning in Sobolev Space

arXiv:2512.17884v1 Announce Type: cross Abstract: Operator learning is a data-driven approximation of mappings between infinite-dimensional function spaces, such as the solution operators of partial differential equations. Kernel-based operator learning can offer accurate, theoretically justified approximations that require less training than standard methods. However, they can become computationally prohibitive for large training sets and can be sensitive to noise. We propose a regularized random Fourier feature (RRFF) approach, coupled with a finite element reconstruction map (RRFF-FEM), for learning operators from noisy data. The method uses random features drawn from multivariate Student's $t$ distributions, together with frequency-weighted Tikhonov regularization that suppresses high-frequency noise. We establish high-probability bounds on the extreme singular values of the associated random feature matrix and show that when the number of features $N$ scales like $m \log m$ with the number of training samples $m$, the system is well-conditioned, which yields estimation and generalization guarantees. Detailed numerical experiments on benchmark PDE problems, including advection, Burgers', Darcy flow, Helmholtz, Navier-Stokes, and structural mechanics, demonstrate that RRFF and RRFF-FEM are robust to noise and achieve improved performance with reduced training time compared to the unregularized random feature model, while maintaining competitive accuracy relative to kernel and neural operator tests.
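
A scalar-regression sketch of the ingredients named in the abstract: Student's $t$ frequency sampling, roughly $m \log m$ features, and frequency-weighted Tikhonov regularization. The target function, degrees of freedom, and penalty scale are assumptions; the operator-learning and finite element components are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a 1-D target function.
m = 200
x = rng.uniform(-np.pi, np.pi, size=m)
y = np.sin(2 * x) + 0.1 * rng.normal(size=m)

# Random Fourier features with Student's t frequencies (heavier tails than
# Gaussian), with N ~ m log m features as in the abstract's scaling.
N = int(m * np.log(m))
omega = rng.standard_t(df=3, size=N)
phase = rng.uniform(0, 2 * np.pi, size=N)
Phi = np.cos(np.outer(x, omega) + phase) / np.sqrt(N)

# Frequency-weighted Tikhonov: penalize high-frequency features more.
lam = 1e-3 * (1.0 + omega**2)
w = np.linalg.solve(Phi.T @ Phi + np.diag(lam), Phi.T @ y)

x_test = np.linspace(-3, 3, 5)
pred = np.cos(np.outer(x_test, omega) + phase) / np.sqrt(N) @ w
print(np.round(pred, 2), np.round(np.sin(2 * x_test), 2))
```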

Fonte: arXiv stat.ML

RL • Score 96

Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making

arXiv:2512.17091v1 Announce Type: cross Abstract: We propose a new approach for solving planning problems with a hierarchical structure, fusing reinforcement learning and MPC planning. Our formulation tightly and elegantly couples the two planning paradigms. It leverages reinforcement learning actions to inform the MPPI sampler, and adaptively aggregates MPPI samples to inform the value estimation. The resulting adaptive process leverages further MPPI exploration where value estimates are uncertain, and improves training robustness and the overall resulting policies. This results in a robust planning approach that can handle complex planning problems and easily adapts to different applications, as demonstrated over several domains, including race driving, modified Acrobot, and Lunar Lander with added obstacles. Our results in these domains show better data efficiency and overall performance in terms of both rewards and task success, with up to a 72% increase in success rate compared to existing approaches, as well as accelerated convergence (x2.1) compared to non-adaptive sampling.

Fonte: arXiv cs.AI

Theory/Optimization • Score 90

Universal consistency of the $k$-NN rule in metric spaces and Nagata dimension. III

arXiv:2512.17058v1 Announce Type: new Abstract: We prove the last remaining implication needed to claim the equivalence of the following conditions for a complete separable metric space $X$: (1) The $k$-nearest neighbour classifier is (weakly) universally consistent in $X$, (2) The strong Lebesgue--Besicovitch differentiation property holds in $X$ for every locally finite Borel measure, (3) $X$ is sigma-finite dimensional in the sense of Nagata. The equivalence (2)$\iff$(3) was announced by Preiss (1983), while a detailed proof of the implication (3)$\Rightarrow$(2) has appeared in Assouad and Quentin de Gromard (2006). The implication (2)$\Rightarrow$(1) was established by Cérou and Guyader (2006). We prove the implication (1)$\Rightarrow$(3). The result was conjectured in the first article in the series (Collins, Kumari, Pestov 2020), and here we also correct a wrong claim made in the second article (Kumari and Pestov 2024).

Fonte: arXiv cs.LG

RL • Score 96

Alzheimer's Disease Brain Network Mining

arXiv:2512.17276v1 Announce Type: new Abstract: Machine learning approaches for Alzheimer's disease (AD) diagnosis face a fundamental challenge. Clinical assessments are expensive and invasive, leaving ground truth labels available for only a fraction of neuroimaging datasets. We introduce Multi-view Adaptive Transport Clustering for Heterogeneous Alzheimer's Disease (MATCH-AD), a semi-supervised framework that integrates deep representation learning, graph-based label propagation, and optimal transport theory to address this limitation. The framework leverages manifold structure in neuroimaging data to propagate diagnostic information from limited labeled samples to larger unlabeled populations, while using Wasserstein distances to quantify disease progression between cognitive states. Evaluated on nearly five thousand subjects from the National Alzheimer's Coordinating Center, encompassing structural MRI measurements from hundreds of brain regions, cerebrospinal fluid biomarkers, and clinical variables, MATCH-AD achieves near-perfect diagnostic accuracy despite ground truth labels for less than one-third of subjects. The framework substantially outperforms all baseline methods, achieving a kappa indicating almost perfect agreement compared to weak agreement for the best baseline, a qualitative transformation in diagnostic reliability. Performance remains clinically useful even under severe label scarcity, and we provide theoretical convergence guarantees with proven bounds on label propagation error and transport stability. These results demonstrate that principled semi-supervised learning can unlock the diagnostic potential of the vast repositories of partially annotated neuroimaging data accumulating worldwide, substantially reducing annotation burden while maintaining accuracy suitable for clinical deployment.

Fonte: arXiv cs.LG

Theory/Optimization • Score 90

Explanation Beyond Intuition: A Testable Criterion for Inherent Explainability

arXiv:2512.17316v1 Announce Type: new Abstract: Inherent explainability is the gold standard in Explainable Artificial Intelligence (XAI). However, there is not a consistent definition or test to demonstrate inherent explainability. Work to date either characterises explainability through metrics, or appeals to intuition -- "we know it when we see it". We propose a globally applicable criterion for inherent explainability. The criterion uses graph theory for representing and decomposing models for structure-local explanation, and recomposing them into global explanations. We form the structure-local explanations as annotations, a verifiable hypothesis-evidence structure that allows for a range of explanatory methods to be used. This criterion matches existing intuitions on inherent explainability, and provides justifications why a large regression model may not be explainable but a sparse neural network could be. We differentiate explainable -- a model that allows for explanation -- and explained -- one that has a verified explanation. Finally, we provide a full explanation of PREDICT -- a Cox proportional hazards model of cardiovascular disease risk, which is in active clinical use in New Zealand. It follows that PREDICT is inherently explainable. This work provides structure to formalise other work on explainability, and allows regulators a flexible but rigorous test that can be used in compliance frameworks.

Fonte: arXiv cs.LG

NLP/LLMs • Score 96

GreedySnake: Accelerating SSD-Offloaded LLM Training with Efficient Scheduling and Optimizer Step Overlapping

arXiv:2512.17570v1 Announce Type: new Abstract: SSD-offloaded training offers a practical and promising approach to making LLM training cost-effective. Building on gradient accumulation with micro-batches, this paper introduces GreedySnake, a new SSD-offloaded training system that employs vertical scheduling, which executes all microbatches of a layer before proceeding to the next. Compared to existing systems that use horizontal scheduling (i.e., executing micro-batches sequentially), GreedySnake achieves higher training throughput with smaller batch sizes, bringing the system much closer to the ideal scenario predicted by the roofline model. To further mitigate the I/O bottleneck, GreedySnake overlaps part of the optimization step with the forward pass of the next iteration. Experimental results on A100 GPUs show that GreedySnake achieves saturated training throughput improvements over ZeRO-Infinity: 1.96x on 1 GPU and 1.93x on 4 GPUs for GPT-65B, and 2.53x on 1 GPU for GPT-175B. The code is open-sourced at https://github.com/npz7yyk/GreedySnake

Fonte: arXiv cs.LG

NLP/LLMs • Score 96

Deep Learning-Based Surrogate Creep Modelling in Inconel 625: A High-Temperature Alloy Study

arXiv:2512.17477v1 Announce Type: new Abstract: Time-dependent deformation, particularly creep, in high-temperature alloys such as Inconel 625 is a key factor in the long-term reliability of components used in aerospace and energy systems. Although Inconel 625 shows excellent creep resistance, finite-element creep simulations in tools such as ANSYS remain computationally expensive, often requiring tens of minutes for a single 10,000-hour run. This work proposes deep learning based surrogate models to provide fast and accurate replacements for such simulations. Creep strain data was generated in ANSYS using the Norton law under uniaxial stresses of 50 to 150 MPa and temperatures of 700 to 1000 $^\circ$C, and this temporal dataset was used to train two architectures: a BiLSTM Variational Autoencoder for uncertainty-aware and generative predictions, and a BiLSTM Transformer hybrid that employs self-attention to capture long-range temporal behavior. Both models act as surrogate predictors, with the BiLSTM-VAE offering probabilistic output and the BiLSTM-Transformer delivering high deterministic accuracy. Performance is evaluated using RMSE, MAE, and $R^2$. Results show that the BiLSTM-VAE provides stable and reliable creep strain forecasts, while the BiLSTM-Transformer achieves strong accuracy across the full time range. Latency tests indicate substantial speedup: while each ANSYS simulation requires 30 to 40 minutes for a given stress-temperature condition, the surrogate models produce predictions within seconds. The proposed framework enables rapid creep assessment for design optimization and structural health monitoring, and provides a scalable solution for high-temperature alloy applications.

Fonte: arXiv cs.LG

Theory/Optimization • Score 90

Machine Learning for Static and Single-Event Dynamic Complex Network Analysis

arXiv:2512.17577v1 Announce Type: new Abstract: The primary objective of this thesis is to develop novel algorithmic approaches for Graph Representation Learning of static and single-event dynamic networks. In such a direction, we focus on the family of Latent Space Models, and more specifically on the Latent Distance Model which naturally conveys important network characteristics such as homophily, transitivity, and the balance theory. Furthermore, this thesis aims to create structural-aware network representations, which lead to hierarchical expressions of network structure, community characterization, the identification of extreme profiles in networks, and impact dynamics quantification in temporal networks. Crucially, the methods presented are designed to define unified learning processes, eliminating the need for heuristics and multi-stage processes like post-processing steps. Our aim is to delve into a journey towards unified network embeddings that are both comprehensive and powerful, capable of characterizing network structures and adeptly handling the diverse tasks that graph analysis offers.

Fonte: arXiv cs.LG

RL • Score 96

Learning Safe Autonomous Driving Policies Using Predictive Safety Representations

arXiv:2512.17586v1 Announce Type: new Abstract: Safe reinforcement learning (SafeRL) is a prominent paradigm for autonomous driving, where agents are required to optimize performance under strict safety requirements. This dual objective creates a fundamental tension, as overly conservative policies limit driving efficiency while aggressive exploration risks safety violations. The Safety Representations for Safer Policy Learning (SRPL) framework addresses this challenge by equipping agents with a predictive model of future constraint violations and has shown promise in controlled environments. This paper investigates whether SRPL extends to real-world autonomous driving scenarios. Systematic experiments on the Waymo Open Motion Dataset (WOMD) and NuPlan demonstrate that SRPL can improve the reward-safety tradeoff, achieving statistically significant improvements in success rate (effect sizes r = 0.65-0.86) and cost reduction (effect sizes r = 0.70-0.83), with p < 0.05 for observed improvements. However, its effectiveness depends on the underlying policy optimizer and the dataset distribution. The results further show that predictive safety representations play a critical role in improving robustness to observation noise. Additionally, in zero-shot cross-dataset evaluation, SRPL-augmented agents demonstrate improved generalization compared to non-SRPL methods. These findings collectively demonstrate the potential of predictive safety representations to strengthen SafeRL for autonomous driving.

Fonte: arXiv cs.LG

NLP/LLMs • Score 96

Understanding Generalization in Role-Playing Models via Information Theory

arXiv:2512.17270v1 Announce Type: new Abstract: Role-playing models (RPMs) are widely used in real-world applications but underperform when deployed in the wild. This degradation can be attributed to distribution shifts, including user, character, and dialogue compositional shifts. Existing methods like LLM-as-a-judge fall short in providing a fine-grained diagnosis of how these shifts affect RPM generalization, and thus there lack formal frameworks to characterize RPM generalization behaviors. To bridge these gaps, we introduce an information-theoretic metric, named reasoning-based effective mutual information difference (R-EMID), to measure RPM performance degradation in an interpretable way. We also derive an upper bound on R-EMID to predict the worst-case generalization performance of RPMs and theoretically reveal how various shifts contribute to the RPM performance degradation. Moreover, we propose a co-evolving reinforcement learning framework to adaptively model the connection among user, character, and dialogue context and thus enhance the estimation of dialogue response generation probability, which is critical for calculating R-EMID. Finally, we evaluate the generalization performance of various RPMs using R-EMID, finding that user shift poses the highest risk among all shifts and reinforcement learning is the most effective approach for enhancing RPM generalization.

Fonte: arXiv cs.LG

RL • Score 96

Assessing Long-Term Electricity Market Design for Ambitious Decarbonization Targets using Multi-Agent Reinforcement Learning

arXiv:2512.17444v1 Announce Type: new Abstract: Electricity systems are key to transforming today's society into a carbon-free economy. Long-term electricity market mechanisms, including auctions, support schemes, and other policy instruments, are critical in shaping the electricity generation mix. In light of the need for more advanced tools to support policymakers and other stakeholders in designing, testing, and evaluating long-term markets, this work presents a multi-agent reinforcement learning model capable of capturing the key features of decarbonizing energy systems. Profit-maximizing generation companies make investment decisions in the wholesale electricity market, responding to system needs, competitive dynamics, and policy signals. The model employs independent proximal policy optimization, which was selected for suitability to the decentralized and competitive environment. Nevertheless, given the inherent challenges of independent learning in multi-agent settings, an extensive hyperparameter search ensures that decentralized training yields market outcomes consistent with competitive behavior. The model is applied to a stylized version of the Italian electricity system and tested under varying levels of competition, market designs, and policy scenarios. Results highlight the critical role of market design for decarbonizing the electricity sector and avoiding price volatility. The proposed framework allows assessing long-term electricity markets in which multiple policy and market mechanisms interact simultaneously, with market participants responding and adapting to decarbonization pathways.

Fonte: arXiv cs.LG

MLOps/Systems • Score 96

NetworkFF: Unified Layer Optimization in Forward-Only Neural Networks

arXiv:2512.17531v1 Announce Type: new Abstract: The Forward-Forward algorithm eliminates backpropagation's memory constraints and biological implausibility through dual forward passes with positive and negative data. However, conventional implementations suffer from critical inter-layer isolation, where layers optimize goodness functions independently without leveraging collective learning dynamics. This isolation constrains representational coordination and limits convergence efficiency in deeper architectures. This paper introduces Collaborative Forward-Forward (CFF) learning, extending the original algorithm through inter-layer cooperation mechanisms that preserve forward-only computation while enabling global context integration. Our framework implements two collaborative paradigms: Fixed CFF (F-CFF) with constant inter-layer coupling and Adaptive CFF (A-CFF) with learnable collaboration parameters that evolve during training. The collaborative goodness function incorporates weighted contributions from all layers, enabling coordinated feature learning while maintaining memory efficiency and biological plausibility. Comprehensive evaluation on MNIST and Fashion-MNIST demonstrates significant performance improvements over baseline Forward-Forward implementations. These findings establish inter-layer collaboration as a fundamental enhancement to Forward-Forward learning, with immediate applicability to neuromorphic computing architectures and energy-constrained AI systems.
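
To make the collaborative goodness idea concrete, here is a minimal sketch of the fixed-coupling (F-CFF) variant, assuming the standard mean-squared-activation goodness; the coupling constant `alpha` and the equal-weight layer average are illustrative choices, not the paper's exact weighting.

```python
import torch

def goodness(h):
    # Standard Forward-Forward goodness: mean squared activation per sample.
    return h.pow(2).mean(dim=1)

def collaborative_goodness(layer_activations, alpha=0.3):
    # F-CFF-style fixed coupling (alpha is illustrative): each layer's
    # objective mixes its own goodness with the average goodness of all
    # layers, injecting global context while staying forward-only.
    per_layer = [goodness(h) for h in layer_activations]   # each (batch,)
    global_g = torch.stack(per_layer).mean(dim=0)          # (batch,)
    return [(1 - alpha) * g + alpha * global_g for g in per_layer]
```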

Fonte: arXiv cs.LG

RL • Score 96

Sharing Knowledge without Sharing Data: Stitches can improve ensembles of disjointly trained models

arXiv:2512.17592v1 Announce Type: new Abstract: Deep learning has been shown to be very capable at performing many real-world tasks. However, this performance often depends on the presence of large and varied datasets. In some settings, like the medical domain, data is often fragmented across parties and cannot be readily shared. While federated learning addresses this situation, it requires the parties to train a single model together synchronously, exchanging information about model weights. We investigate how asynchronous collaboration, where only already-trained models are shared (e.g. as part of a publication), affects performance, and propose stitching as a method for combining models. Taking a multi-objective perspective, where performance on each party's data is viewed independently, we find that a model trained solely on a single party's data performs, on that party's own data, about as well as a model trained on that data merged with another party's, while its performance on other parties' data is notably worse. Moreover, while an ensemble of such individually trained networks generalizes better, performance on each party's own dataset suffers. We find that combining intermediate representations of individually trained models with a well-placed pair of stitching layers allows this performance to recover to a competitive degree while maintaining the improved generalization, showing that asynchronous collaboration can yield competitive results.
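
A stitching layer of this kind is typically just a small trainable map placed between a frozen front model and a frozen back model. The sketch below, with hypothetical feature dimensions `d_front` and `d_back`, shows the general pattern; the paper's exact placement and layer type may differ.

```python
import torch
import torch.nn as nn

class StitchedModel(nn.Module):
    """Run model `front` up to its cut point, translate the features with
    a small trainable stitch, then finish with model `back`. Both donor
    models stay frozen; only the stitch is trained."""
    def __init__(self, front, back, d_front, d_back):
        super().__init__()
        self.front, self.back = front, back
        for p in list(front.parameters()) + list(back.parameters()):
            p.requires_grad = False
        self.stitch = nn.Linear(d_front, d_back)  # the only trainable part

    def forward(self, x):
        with torch.no_grad():
            h = self.front(x)            # frozen features from party A's model
        return self.back(self.stitch(h))
```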

Fonte: arXiv cs.LG

RL • Score 95

Generative Multi-Objective Bayesian Optimization with Scalable Batch Evaluations for Sample-Efficient De Novo Molecular Design

arXiv:2512.17659v1 Announce Type: new Abstract: Designing molecules that must satisfy multiple, often conflicting objectives is a central challenge in molecular discovery. The enormous size of chemical space and the cost of high-fidelity simulations have driven the development of machine learning-guided strategies for accelerating design with limited data. Among these, Bayesian optimization (BO) offers a principled framework for sample-efficient search, while generative models provide a mechanism to propose novel, diverse candidates beyond fixed libraries. However, existing methods that couple the two often rely on continuous latent spaces, which introduces both architectural entanglement and scalability challenges. This work introduces an alternative, modular "generate-then-optimize" framework for de novo multi-objective molecular design/discovery. At each iteration, a generative model is used to construct a large, diverse pool of candidate molecules, after which a novel acquisition function, qPMHI (multi-point Probability of Maximum Hypervolume Improvement), is used to optimally select a batch of candidates most likely to induce the largest Pareto front expansion. The key insight is that qPMHI decomposes additively, enabling exact, scalable batch selection via only simple ranking of probabilities that can be easily estimated with Monte Carlo sampling. We benchmark the framework against state-of-the-art latent-space and discrete molecular optimization methods, demonstrating significant improvements across synthetic benchmarks and application-driven tasks. Specifically, in a case study related to sustainable energy storage, we show that our approach quickly uncovers novel, diverse, and high-performing organic (quinone-based) cathode materials for aqueous redox flow battery applications.
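
The additive decomposition means batch selection reduces to ranking each candidate's estimated probability of being the hypervolume-improvement argmax under posterior samples. A minimal sketch, assuming the per-sample hypervolume improvements have already been computed from a surrogate posterior (the array name is ours):

```python
import numpy as np

def qpmhi_select(sampled_hvi, q):
    """sampled_hvi: (S, N) array; entry [s, i] is the hypervolume
    improvement of candidate i under posterior sample s. Estimate each
    candidate's probability of achieving the maximum improvement, then
    keep the top-q candidates by that probability."""
    S, N = sampled_hvi.shape
    winners = sampled_hvi.argmax(axis=1)            # best candidate per sample
    p_max = np.bincount(winners, minlength=N) / S   # Monte Carlo estimate
    return np.argsort(-p_max)[:q]
```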

Fonte: arXiv stat.ML

NLP/LLMs • Score 95

Generalized infinite dimensional Alpha-Procrustes based geometries

arXiv:2511.09801v2 Announce Type: replace Abstract: This work extends the recently introduced Alpha-Procrustes family of Riemannian metrics for symmetric positive definite (SPD) matrices by incorporating generalized versions of the Bures-Wasserstein (GBW), Log-Euclidean, and Wasserstein distances. While the Alpha-Procrustes framework has unified many classical metrics in both finite- and infinite- dimensional settings, it previously lacked the structural components necessary to realize these generalized forms. We introduce a formalism based on unitized Hilbert-Schmidt operators and an extended Mahalanobis norm that allows the construction of robust, infinite-dimensional generalizations of GBW and Log-Hilbert-Schmidt distances. Our approach also incorporates a learnable regularization parameter that enhances geometric stability in high-dimensional comparisons. Preliminary experiments reproducing benchmarks from the literature demonstrate the improved performance of our generalized metrics, particularly in scenarios involving comparisons between datasets of varying dimension and scale. This work lays a theoretical and computational foundation for advancing robust geometric methods in machine learning, statistical inference, and functional data analysis.
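
For reference, the finite-dimensional Bures-Wasserstein distance being generalized here has a closed form on SPD matrices; a small numerical check (scipy's sqrtm handles the matrix square roots):

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein(A, B):
    """Bures-Wasserstein distance between SPD matrices A and B:
    d^2 = tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2})."""
    rA = sqrtm(A)
    cross = sqrtm(rA @ B @ rA)
    d2 = np.trace(A) + np.trace(B) - 2 * np.trace(cross)
    return float(np.sqrt(max(np.real(d2), 0.0)))
```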

Fonte: arXiv stat.ML

NLP/LLMs • Score 92

Towards Sharp Minimax Risk Bounds for Operator Learning

arXiv:2512.17805v1 Announce Type: cross Abstract: We develop a minimax theory for operator learning, where the goal is to estimate an unknown operator between separable Hilbert spaces from finitely many noisy input-output samples. For uniformly bounded Lipschitz operators, we prove information-theoretic lower bounds together with matching or near-matching upper bounds, covering both fixed and random designs under Hilbert-valued Gaussian noise and Gaussian white noise errors. The rates are controlled by the spectrum of the covariance operator of the measure that defines the error metric. Our setup is very general and allows for measures with unbounded support. A key implication is a curse of sample complexity which shows that the minimax risk for generic Lipschitz operators cannot decay at any algebraic rate in the sample size. We obtain essentially sharp characterizations when the covariance spectrum decays exponentially and provide general upper and lower bounds in slower-decay regimes.

Fonte: arXiv stat.ML

MLOps/Systems • Score 95

A Systems-Theoretic View on the Convergence of Algorithms under Disturbances

arXiv:2512.17598v1 Announce Type: cross Abstract: Algorithms increasingly operate within complex physical, social, and engineering systems where they are exposed to disturbances, noise, and interconnections with other dynamical systems. This article extends known convergence guarantees of an algorithm operating in isolation (i.e., without disturbances) and systematically derives stability bounds and convergence rates in the presence of such disturbances. By leveraging converse Lyapunov theorems, we derive key inequalities that quantify the impact of disturbances. We further demonstrate how our result can be utilized to assess the effects of disturbances on algorithmic performance in a wide variety of applications, including communication constraints in distributed learning, sensitivity in machine learning generalization, and intentional noise injection for privacy. This underscores the role of our result as a unifying tool for algorithm analysis in the presence of noise, disturbances, and interconnections with other dynamical systems.

Fonte: arXiv stat.ML

Theory/Optimization • Score 92

Weighted Stochastic Differential Equation to Implement Wasserstein-Fisher-Rao Gradient Flow

arXiv:2512.17878v1 Announce Type: cross Abstract: Score-based diffusion models currently constitute the state of the art in continuous generative modeling. These methods are typically formulated via overdamped or underdamped Ornstein--Uhlenbeck-type stochastic differential equations, in which sampling is driven by a combination of deterministic drift and Brownian diffusion, resulting in continuous particle trajectories in the ambient space. While such dynamics enjoy exponential convergence guarantees for strongly log-concave target distributions, it is well known that their mixing rates deteriorate exponentially in the presence of nonconvex or multimodal landscapes, such as double-well potentials. Since many practical generative modeling tasks involve highly non-log-concave target distributions, considerable recent effort has been devoted to developing sampling schemes that improve exploration beyond classical diffusion dynamics. A promising line of work leverages tools from information geometry to augment diffusion-based samplers with controlled mass reweighting mechanisms. This perspective leads naturally to Wasserstein--Fisher--Rao (WFR) geometries, which couple transport in the sample space with vertical (reaction) dynamics on the space of probability measures. In this work, we formulate such reweighting mechanisms through the introduction of explicit correction terms and show how they can be implemented via weighted stochastic differential equations using the Feynman--Kac representation. Our study provides a preliminary but rigorous investigation of WFR-based sampling dynamics, and aims to clarify their geometric and operator-theoretic structure as a foundation for future theoretical and algorithmic developments.
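
In particle form, a Feynman-Kac weighting can be simulated by evolving each particle with the usual Langevin drift while its log-weight accumulates a reaction term; resampling then realizes the birth-death mass dynamics. The sketch below uses a caller-supplied potential gradient and reaction rate; it illustrates the mechanism, not the paper's specific correction terms.

```python
import numpy as np

def weighted_langevin(grad_logp, reaction, x0, dt=0.01, steps=1000, seed=0):
    """Weighted-SDE sketch: particles follow overdamped Langevin dynamics
    while log-weights accumulate -reaction(x)*dt (Feynman-Kac); occasional
    multinomial resampling converts weights back into particle mass."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)            # (n_particles, dim)
    logw = np.zeros(len(x))
    for _ in range(steps):
        x += grad_logp(x) * dt + np.sqrt(2 * dt) * rng.standard_normal(x.shape)
        logw += -reaction(x) * dt            # vertical (reaction) dynamics
        if logw.max() - logw.min() > 5.0:    # resample when weights degenerate
            w = np.exp(logw - logw.max()); w /= w.sum()
            x = x[rng.choice(len(x), size=len(x), p=w)]
            logw[:] = 0.0
    return x
```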

Fonte: arXiv stat.ML

Theory/Optimization • Score 92

Unifying Distributionally Robust Optimization via Optimal Transport Theory

arXiv:2308.05414v2 Announce Type: replace-cross Abstract: In recent years, two prominent paradigms have shaped distributionally robust optimization (DRO), modeling distributional ambiguity through $\phi$-divergences and Wasserstein distances, respectively. While the former focuses on ambiguity in likelihood ratios, the latter emphasizes ambiguity in outcomes and uses a transportation cost function to capture geometric structure in the outcome space. This paper proposes a unified framework that bridges these approaches by leveraging optimal transport (OT) with conditional moment constraints. Our formulation enables adversarial distributions to jointly perturb likelihood ratios and outcomes, yielding a generalized OT coupling between the nominal and perturbed distributions. We further establish key duality results and develop tractable reformulations that highlight the practical power of our unified approach.

Fonte: arXiv stat.ML

RL • Score 92

Regularized Langevin Dynamics for Combinatorial Optimization

arXiv:2502.00277v4 Announce Type: replace-cross Abstract: This work proposes a simple yet effective sampling framework for combinatorial optimization (CO). Our method builds on discrete Langevin dynamics (LD), an efficient gradient-guided generative paradigm. However, we observe that directly applying LD often leads to limited exploration. To overcome this limitation, we propose the Regularized Langevin Dynamics (RLD), which enforces an expected distance between the sampled and current solutions, effectively avoiding local minima. We develop two CO solvers on top of RLD, one based on simulated annealing (SA), and the other one based on neural network (NN). Empirical results on three classic CO problems demonstrate that both of our methods can achieve comparable or better performance against the previous state-of-the-art (SOTA) SA- and NN-based solvers. In particular, our SA algorithm reduces the runtime of the previous SOTA SA method by up to 80\%, while achieving equal or superior performance. In summary, RLD offers a promising framework for enhancing both traditional heuristics and NN models to solve CO problems. Our code is available at https://github.com/Shengyu-Feng/RLD4CO.
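
As a rough illustration of the regularization idea, the sketch below adds a constant flip bonus `lam` to a first-order discrete-Langevin proposal over binary variables, nudging samples away from the current solution; this is a stand-in for RLD's expected-distance constraint, and the paper's exact formulation differs.

```python
import torch

def rld_step(x, energy_grad, temp=1.0, lam=0.5):
    """One regularized discrete-Langevin step on binary variables.
    x: (N,) tensor of 0/1 values; energy_grad: dE/dx evaluated at x."""
    delta = energy_grad * (1 - 2 * x)     # first-order energy change per flip
    flip_logits = (-delta + lam) / temp   # `lam` rewards moving away from x
    flips = torch.bernoulli(torch.sigmoid(flip_logits))
    return (x + flips) % 2
```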

Fonte: arXiv stat.ML

RL • Score 95

Fairness via Independence: A (Conditional) Distance Covariance Framework

arXiv:2412.00720v2 Announce Type: replace-cross Abstract: We explore fairness from a statistical perspective by selectively utilizing either conditional distance covariance or distance covariance statistics as measures to assess the independence between predictions and sensitive attributes. We boost fairness with independence by adding a distance covariance-based penalty to the model's training. Additionally, we present the matrix form of empirical (conditional) distance covariance for parallel calculations to enhance computational efficiency. Theoretically, we provide a proof for the convergence between empirical and population (conditional) distance covariance, establishing necessary guarantees for batch computations. Through experiments conducted on a range of real-world datasets, we have demonstrated that our method effectively bridges the fairness gap in machine learning. Our code is available at \url{https://github.com/liuhaixias1/Fair_dc/}.
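
The matrix form of the empirical (squared) distance covariance is standard: double-center the pairwise distance matrices of the two samples and average their elementwise product. A self-contained version, usable as the fairness penalty the abstract describes:

```python
import torch

def distance_covariance(x, y):
    """Empirical squared distance covariance between batches x (n, p)
    and y (n, q), computed in matrix form for parallel evaluation."""
    def double_centered(z):
        d = torch.cdist(z, z)  # (n, n) pairwise Euclidean distances
        return d - d.mean(0, keepdim=True) - d.mean(1, keepdim=True) + d.mean()
    A, B = double_centered(x), double_centered(y)
    return (A * B).mean()

# Fairness-regularized training objective (mu is a penalty weight):
# loss = task_loss + mu * distance_covariance(predictions, sensitive_attrs)
```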

Fonte: arXiv stat.ML

RL • Score 95

A Survey on Archetypal Analysis

arXiv:2504.12392v2 Announce Type: replace-cross Abstract: Archetypal analysis (AA) was originally proposed in 1994 by Adele Cutler and Leo Breiman as a computational procedure for extracting distinct aspects, so-called archetypes, from observations, with each observational record approximated as a mixture (i.e., convex combination) of these archetypes. AA thereby provides straightforward, interpretable, and explainable representations for feature extraction and dimensionality reduction, facilitating the understanding of the structure of high-dimensional data and enabling wide applications across the sciences. However, AA also faces challenges, particularly as the associated optimization problem is non-convex. This is the first survey that provides researchers and data mining practitioners with an overview of the methodologies and opportunities that AA offers, surveying the many applications of AA across disparate fields of science, as well as best practices for modeling data with AA and its limitations. The survey concludes by explaining crucial future research directions concerning AA.
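
For readers new to the model: AA approximates X ≈ A B X with row-stochastic A (mixture weights) and B (archetype weights), so the k archetypes B X are convex combinations of observations. A toy alternating projected-gradient solver, where the crude clip-and-renormalize stands in for an exact simplex projection:

```python
import numpy as np

def archetypal_analysis(X, k, iters=500, lr=0.01, seed=0):
    """Toy AA solver: X (n, d) is approximated as A @ B @ X, with the
    rows of A (n, k) and B (k, n) kept on the probability simplex.
    Scale X to roughly unit norm for the fixed learning rate to behave."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    def project(M):                      # clip + renormalize rows
        M = np.maximum(M, 1e-12)
        return M / M.sum(axis=1, keepdims=True)

    A, B = project(rng.random((n, k))), project(rng.random((k, n)))
    for _ in range(iters):
        Z = B @ X                                    # current archetypes
        A = project(A - lr * (A @ Z - X) @ Z.T)      # descend in A
        R = A @ Z - X                                # refreshed residual
        B = project(B - lr * A.T @ R @ X.T)          # descend in B
    return A, B @ X                                  # weights, archetypes
```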

Fonte: arXiv stat.ML

RL • Score 95

Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients

arXiv:2512.02342v2 Announce Type: replace-cross Abstract: The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, including deep neural network training. However, extensions of this approach to non-smooth settings remain in their early stages, often relying on interpolation assumptions or requiring knowledge of the optimal solution. In this work, we propose a novel SPS variant, Safeguarded SPS (SPS$_{safe}$), for the stochastic subgradient method, and provide rigorous convergence guarantees for non-smooth convex optimization with no need for strong assumptions. We further incorporate momentum into the update rule, yielding equally tight theoretical results. On non-smooth convex benchmarks, our experiments are consistent with the theoretical predictions on how the safeguard affects the convergence neighborhood. On deep neural networks the proposed step size achieves competitive performance to existing adaptive baselines and exhibits stable behavior across a wide range of problem settings. Moreover, in these experiments, the gradient norms under our step size do not collapse to (near) zero, indicating robustness to vanishing gradients.
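
For orientation, the classical SPS template sets the step size to (f_i(x) - l_i*)/(c ||g_i||^2); a safeguarded variant additionally caps it and keeps the ratio finite when subgradients are small. The sketch below follows that template; the paper's exact SPS_safe rule may differ in its safeguard.

```python
import torch

def sps_safe_step_size(loss_value, params, lower_bound=0.0,
                       c=0.5, gamma_max=1.0, eps=1e-8):
    """Polyak-style step size with a cap (gamma_max) and a small eps
    guard against vanishing subgradient norms. `params` are tensors
    whose .grad fields hold the current stochastic subgradient."""
    g_norm_sq = sum((p.grad * p.grad).sum() for p in params if p.grad is not None)
    gamma = (loss_value - lower_bound) / (c * g_norm_sq + eps)
    return min(float(gamma), gamma_max)
```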

Fonte: arXiv stat.ML

NLP/LLMs • Score 95

Sample, Don't Search: Rethinking Test-Time Alignment for Language Models

arXiv:2504.03790v2 Announce Type: replace-cross Abstract: Increasing test-time computation has emerged as a promising direction for improving language model performance, particularly in scenarios where model finetuning is impractical or impossible due to computational constraints or private model weights. However, existing test-time search methods using a reward model (RM) often degrade in quality as compute scales, due to the over-optimization of what are inherently imperfect reward proxies. We introduce QAlign, a new test-time alignment approach. As we scale test-time compute, QAlign converges to sampling from the optimal aligned distribution for each individual prompt. By adopting recent advances in Markov chain Monte Carlo for text generation, our method enables better-aligned outputs without modifying the underlying model or even requiring logit access. We demonstrate the effectiveness of QAlign on mathematical reasoning benchmarks (GSM8K and GSM-Symbolic) using a task-specific RM, showing consistent improvements over existing test-time compute methods like best-of-n and majority voting. Furthermore, when applied with more realistic RMs trained on the Tulu 3 preference dataset, QAlign outperforms direct preference optimization (DPO), best-of-n, majority voting, and weighted majority voting on a diverse range of datasets (GSM8K, MATH500, IFEval, MMLU-Redux, and TruthfulQA). A practical solution to aligning language models at test time using additional computation without degradation, our approach expands the limits of the capability that can be obtained from off-the-shelf language models without further training.
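
One way to see the MCMC view: with independence proposals drawn from the base model, the base-model probabilities cancel in the Metropolis ratio, leaving acceptance driven by the reward difference alone. A minimal sketch under that simplification (`sample_fn` and `reward_fn` are user-supplied stand-ins; QAlign's actual proposal is more sophisticated):

```python
import math, random

def mcmc_align(sample_fn, reward_fn, steps=32, beta=1.0):
    """Target distribution: p(y|x) * exp(r(y) / beta). With independent
    proposals from p(y|x), the acceptance test reduces to comparing
    rewards, so no logit access is required."""
    y = sample_fn()
    for _ in range(steps):
        y_new = sample_fn()
        log_accept = (reward_fn(y_new) - reward_fn(y)) / beta
        if math.log(random.random()) < log_accept:
            y = y_new
    return y
```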

Fonte: arXiv stat.ML

MLOps/Systems • Score 92

Look-Ahead Reasoning on Learning Platforms

arXiv:2511.14745v2 Announce Type: replace-cross Abstract: On many learning platforms, the optimization criteria guiding model training reflect the priorities of the designer rather than those of the individuals they affect. Consequently, users may act strategically to obtain more favorable outcomes. While past work has studied strategic user behavior on learning platforms, the focus has largely been on strategic responses to a deployed model, without considering the behavior of other users. In contrast, look-ahead reasoning takes into account that user actions are coupled, and -- at scale -- impact future predictions. Within this framework, we first formalize level-k thinking, a concept from behavioral economics, where users aim to outsmart their peers by looking one step ahead. We show that, while convergence to an equilibrium is accelerated, the equilibrium remains the same, providing no benefit of higher-level reasoning for individuals in the long run. Then, we focus on collective reasoning, where users take coordinated actions by optimizing through their joint impact on the model. By contrasting collective with selfish behavior, we characterize the benefits and limits of coordination; a new notion of alignment between the learner's and the users' utilities emerges as a key concept. Look-ahead reasoning can be seen as a generalization of algorithmic collective action; we thus offer the first results characterizing the utility trade-offs of coordination when contesting algorithmic systems.

Fonte: arXiv stat.ML

Vision • Score 95

Diagnostic Performance of Universal-Learning Ultrasound AI Across Multiple Organs and Tasks: the UUSIC25 Challenge

arXiv:2512.17279v1 Announce Type: new Abstract: IMPORTANCE: Current ultrasound AI remains fragmented into single-task tools, limiting clinical utility compared to versatile modern ultrasound systems. OBJECTIVE: To evaluate the diagnostic accuracy and efficiency of single general-purpose deep learning models for multi-organ classification and segmentation. DESIGN: The Universal UltraSound Image Challenge 2025 (UUSIC25) involved developing algorithms on 11,644 images (public/private). Evaluation used an independent, multi-center test set of 2,479 images, including data from a center completely unseen during training to assess generalization. OUTCOMES: Diagnostic performance (Dice Similarity Coefficient [DSC]; Area Under the Receiver Operating Characteristic Curve [AUC]) and computational efficiency (inference time, GPU memory). RESULTS: Of 15 valid algorithms, the top model (SMART) achieved a macro-averaged DSC of 0.854 across 5 segmentation tasks and AUC of 0.766 for binary classification. Models showed high capability in segmentation (e.g., fetal head DSC: 0.942) but variability in complex tasks subject to domain shift. Notably, in breast cancer molecular subtyping, the top model's performance dropped from AUC 0.571 (internal) to 0.508 (unseen external center), highlighting generalization challenges. CONCLUSIONS: General-purpose AI models achieve high accuracy and efficiency across multiple tasks using a single architecture. However, performance degradation on unseen data suggests domain generalization is critical for future clinical deployment.

Fonte: arXiv cs.CV

NLP/LLMs • Score 95

AVM: Towards Structure-Preserving Neural Response Modeling in the Visual Cortex Across Stimuli and Individuals

arXiv:2512.16948v1 Announce Type: new Abstract: While deep learning models have shown strong performance in simulating neural responses, they often fail to clearly separate stable visual encoding from condition-specific adaptation, which limits their ability to generalize across stimuli and individuals. We introduce the Adaptive Visual Model (AVM), a structure-preserving framework that enables condition-aware adaptation through modular subnetworks, without modifying the core representation. AVM keeps a Vision Transformer-based encoder frozen to capture consistent visual features, while independently trained modulation paths account for neural response variations driven by stimulus content and subject identity. We evaluate AVM in three experimental settings, including stimulus-level variation, cross-subject generalization, and cross-dataset adaptation, all of which involve structured changes in inputs and individuals. Across two large-scale mouse V1 datasets, AVM outperforms the state-of-the-art V1T model by approximately 2% in predictive correlation, demonstrating robust generalization, interpretable condition-wise modulation, and high architectural efficiency. Specifically, AVM achieves a 9.1% improvement in explained variance (FEVE) under the cross-dataset adaptation setting. These results suggest that AVM provides a unified framework for adaptive neural modeling across biological and experimental conditions, offering a scalable solution under structural constraints. Its design may inform future approaches to cortical modeling in both neuroscience and biologically inspired AI systems.

Fonte: arXiv cs.CV

NLP/LLMs • Score 95

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

arXiv:2512.17260v1 Announce Type: new Abstract: Large language models have recently made significant progress to generate rigorous mathematical proofs. In contrast, utilizing LLMs for theorem proving in formal languages (such as Lean) remains challenging and computationally expensive, particularly when addressing problems at the undergraduate level and beyond. In this work, we present \textbf{Seed-Prover 1.5}, a formal theorem-proving model trained via large-scale agentic reinforcement learning, alongside an efficient test-time scaling (TTS) workflow. Through extensive interactions with Lean and other tools, the model continuously accumulates experience during the RL process, substantially enhancing the capability and efficiency of formal theorem proving. Furthermore, leveraging recent advancements in natural language proving, our TTS workflow efficiently bridges the gap between natural and formal languages. Compared to state-of-the-art methods, Seed-Prover 1.5 achieves superior performance with a smaller compute budget. It solves \textbf{88\% of PutnamBench} (undergraduate-level), \textbf{80\% of Fate-H} (graduate-level), and \textbf{33\% of Fate-X} (PhD-level) problems. Notably, using our system, we solved \textbf{11 out of 12 problems} from Putnam 2025 within 9 hours. Our findings suggest that scaling learning from experience, driven by high-quality formal feedback, holds immense potential for the future of formal mathematical reasoning.

Fonte: arXiv cs.CL

NLP/LLMs • Score 92

Governance-Aware Hybrid Fine-Tuning for Multilingual Large Language Models

arXiv:2512.17344v1 Announce Type: new Abstract: We present a governance-aware hybrid fine-tuning framework for multilingual, low-resource adaptation of large language models. The core algorithm combines gradient-aligned low-rank updates with structured orthogonal transformations through layer-wise mixing and introduces unitary constraints in selected sub-layers to stabilize deep optimization. In tandem with lightweight, label-free data governance steps, including language identification, near-duplicate removal, and quality filtering, the framework targets accuracy, calibration, and cross-language parity under tight compute budgets. Across XNLI and FLORES, the hybrid approach delivers consistent gains over strong PEFT baselines while maintaining directional balance and improving probability calibration, as shown in Tables II and III. It is more resilient to lightweight orthographic variants, as shown in Table IV, and benefits additively from simple governance steps, as shown in Table V. Training footprint measurements indicate modest overhead and a favorable cost-quality frontier, as shown in Table VI and Figure 2. Together, these results show that hybrid and unitary PEFT provide a stable and accessible path to resource-efficient multilingual adaptation when paired with practical data governance.
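
A rough sketch of what a hybrid update of this shape could look like: a frozen linear layer augmented with a LoRA-style low-rank path and an orthogonal transform obtained from a Cayley map of a skew-symmetric generator, mixed by a learnable scalar. The Cayley parameterization, the mixing rule, and the full-size generator are all illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class HybridAdapter(nn.Module):
    def __init__(self, base: nn.Linear, rank=8):
        super().__init__()
        d_out, d_in = base.weight.shape
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # frozen pretrained weight
        self.A = nn.Parameter(torch.zeros(d_out, rank))   # LoRA up-projection (zero init)
        self.B = nn.Parameter(0.01 * torch.randn(rank, d_in))
        self.S = nn.Parameter(torch.zeros(d_in, d_in))    # skew-symmetric generator
        self.mix = nn.Parameter(torch.tensor(0.0))        # layer-wise mixing scalar

    def forward(self, x):
        skew = self.S - self.S.T
        eye = torch.eye(skew.shape[0], device=x.device)
        Q = torch.linalg.solve(eye + skew, eye - skew)    # Cayley map -> orthogonal
        m = torch.sigmoid(self.mix)
        mixed = m * (x @ Q.T) + (1 - m) * x               # structured rotation path
        return self.base(mixed) + (x @ self.B.T) @ self.A.T  # + low-rank update
```

At initialization S = 0 and A = 0, so the adapter reproduces the frozen base layer exactly and only departs from it as training proceeds.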

Fonte: arXiv cs.CL

NLP/LLMs • Score 95

Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion Detection

arXiv:2512.17630v1 Announce Type: new Abstract: This paper introduces a confidence-weighted, credibility-aware ensemble framework for text-based emotion detection, inspired by Condorcet's Jury Theorem (CJT). Unlike conventional ensembles that often rely on homogeneous architectures, our approach combines architecturally diverse small transformer-based large language models (sLLMs) - BERT, RoBERTa, DistilBERT, DeBERTa, and ELECTRA, each fully fine-tuned for emotion classification. To preserve error diversity, we minimize parameter convergence while taking advantage of the unique biases of each model. A dual-weighted voting mechanism integrates both global credibility (validation F1 score) and local confidence (instance-level probability) to dynamically weight model contributions. Experiments on the DAIR-AI dataset demonstrate that our credibility-confidence ensemble achieves a macro F1 score of 93.5 percent, surpassing state-of-the-art benchmarks and significantly outperforming large-scale LLMs, including Falcon, Mistral, Qwen, and Phi, even after task-specific Low-Rank Adaptation (LoRA). With only 595M parameters in total, our small LLMs ensemble proves more parameter-efficient and robust than models up to 7B parameters, establishing that carefully designed ensembles of small, fine-tuned models can outperform much larger LLMs in specialized natural language processing (NLP) tasks such as emotion detection.
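
The dual-weighted vote itself is simple to state: each model's class probabilities are scaled by the product of its global credibility (validation F1) and its local confidence on the instance. A sketch with that product weighting (the exact functional form in the paper may differ):

```python
import numpy as np

def dual_weighted_vote(probs, val_f1):
    """probs: (M, C) class probabilities from M fine-tuned models;
    val_f1: (M,) validation F1 per model. Returns the predicted class."""
    probs, val_f1 = np.asarray(probs), np.asarray(val_f1)
    local_conf = probs.max(axis=1)          # instance-level confidence
    w = val_f1 * local_conf                 # credibility x confidence
    w = w / w.sum()
    return int((w[:, None] * probs).sum(axis=0).argmax())
```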

Fonte: arXiv cs.CL

NLP/LLMs • Score 95

AutoMetrics: Approximate Human Judgements with Automatically Generated Evaluators

arXiv:2512.17267v1 Announce Type: new Abstract: Evaluating user-facing AI applications remains a central challenge, especially in open-ended domains such as travel planning, clinical note generation, or dialogue. The gold standard is user feedback (e.g., thumbs up/down) or behavioral signals (e.g., retention), but these are often scarce in prototypes and research projects, or too slow to use for system optimization. We present AutoMetrics, a framework for synthesizing evaluation metrics under low-data constraints. AutoMetrics combines retrieval from MetricBank, a collection of 48 metrics we curate, with automatically generated LLM-as-a-Judge criteria informed by lightweight human feedback. These metrics are composed via regression to maximize correlation with human signal. AutoMetrics takes you from expensive measures to interpretable automatic metrics. Across 5 diverse tasks, AutoMetrics improves Kendall correlation with human ratings by up to 33.4% over LLM-as-a-Judge while requiring fewer than 100 feedback points. We show that AutoMetrics can be used as a proxy reward to equal effect as a verifiable reward. We release the full AutoMetrics toolkit and MetricBank to accelerate adaptive evaluation of LLM applications.
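
The composition step is plain supervised regression from metric outputs to the scarce human signal. A minimal sketch (Ridge is an assumed choice; the abstract only says the metrics are composed via regression to maximize correlation):

```python
import numpy as np
from sklearn.linear_model import Ridge

def compose_metrics(metric_scores, human_labels):
    """metric_scores: (n_examples, n_metrics) outputs of retrieved and
    generated evaluators; human_labels: (n_examples,) feedback signal.
    Returns a callable composite metric for new examples."""
    model = Ridge(alpha=1.0).fit(metric_scores, human_labels)
    return lambda new_scores: model.predict(np.atleast_2d(new_scores))
```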

Fonte: arXiv cs.CL

NLP/LLMs • Score 95

Are Vision Language Models Cross-Cultural Theory of Mind Reasoners?

arXiv:2512.17394v1 Announce Type: new Abstract: Theory of Mind (ToM) -- the ability to attribute beliefs, desires, and emotions to others -- is fundamental for human social intelligence, yet remains a major challenge for artificial agents. Existing Vision-Language Models (VLMs) are increasingly applied in socially grounded tasks, but their capacity for cross-cultural ToM reasoning is largely unexplored. In this work, we introduce CulturalToM-VQA, a new evaluation benchmark containing 5095 questions designed to probe ToM reasoning across diverse cultural contexts through visual question answering. The dataset captures culturally grounded cues such as rituals, attire, gestures, and interpersonal dynamics, enabling systematic evaluation of ToM reasoning beyond Western-centric benchmarks. Our dataset is built through a VLM-assisted human-in-the-loop pipeline, where human experts first curate culturally rich images across traditions, rituals, and social interactions; a VLM then assists in generating structured ToM-focused scene descriptions, which are refined into question-answer pairs spanning a taxonomy of six ToM tasks and four graded complexity levels. The resulting dataset covers diverse theory of mind facets such as mental state attribution, false belief reasoning, non-literal communication, social norm violations, perspective coordination, and multi-agent reasoning.

Fonte: arXiv cs.CL

Vision • Score 95

WDFFU-Mamba: A Wavelet-guided Dual-attention Feature Fusion Mamba for Breast Tumor Segmentation in Ultrasound Images

arXiv:2512.17278v1 Announce Type: new Abstract: Breast ultrasound (BUS) image segmentation plays a vital role in assisting clinical diagnosis and early tumor screening. However, challenges such as speckle noise, imaging artifacts, irregular lesion morphology, and blurred boundaries severely hinder accurate segmentation. To address these challenges, this work aims to design a robust and efficient model capable of automatically segmenting breast tumors in BUS images. We propose a novel segmentation network named WDFFU-Mamba, which integrates wavelet-guided enhancement and dual-attention feature fusion within a U-shaped Mamba architecture. A Wavelet-denoised High-Frequency-guided Feature (WHF) module is employed to enhance low-level representations through noise-suppressed high-frequency cues. A Dual Attention Feature Fusion (DAFF) module is also introduced to effectively merge skip-connected and semantic features, improving contextual consistency. Extensive experiments on two public BUS datasets demonstrate that WDFFU-Mamba achieves superior segmentation accuracy, significantly outperforming existing methods in terms of Dice coefficient and 95th percentile Hausdorff Distance (HD95). The combination of wavelet-domain enhancement and attention-based fusion greatly improves both the accuracy and robustness of BUS image segmentation, while maintaining computational efficiency. The proposed WDFFU-Mamba model not only delivers strong segmentation performance but also exhibits desirable generalization ability across datasets, making it a promising solution for real-world clinical applications in breast tumor ultrasound analysis.

Fonte: arXiv cs.CV

MLOps/Systems • Score 93

SDUM: A Scalable Deep Unrolled Model for Universal MRI Reconstruction

SDUM is a universal framework that combines a Restormer-based reconstructor, a learned coil sensitivity map estimator (CSME), and sampling-aware weighted data consistency (SWDC). It exhibits scaling behavior similar to that of foundation models, achieving state-of-the-art results in MRI reconstruction challenges without task-specific fine-tuning.

Fonte: arXiv cs.AI

NLP/LLMs • Score 95

CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency

arXiv:2512.17213v1 Announce Type: new Abstract: Medical Vision-Language Models (VLMs) are prone to hallucinations, compromising clinical reliability. While reinforcement learning methods like Group Relative Policy Optimization (GRPO) offer a low-cost alignment solution, their reliance on sparse, outcome-based rewards inadvertently encourages models to "overthink" -- generating verbose, convoluted, and unverifiable Chain-of-Thought reasoning to justify answers. This focus on outcomes obscures factual errors and poses significant safety risks. To address this, we propose CheXPO-v2, a novel alignment framework that shifts from outcome to process supervision. Our core innovation is a Knowledge Graph Consistency Reward mechanism driven by Entity-Relation Matching. By explicitly parsing reasoning steps into structured "Disease, Relation, Anatomy" triplets, we provide fine-grained supervision that penalizes incoherent logic and hallucinations at the atomic level. Integrating this with a hard-example mining strategy, our approach significantly outperforms GRPO and state-of-the-art models on benchmarks like MIMIC-CXR-VQA. Crucially, CheXPO-v2 achieves new state-of-the-art accuracy using only 5k samples, demonstrating exceptional data efficiency while producing clinically sound and verifiable reasoning. The project source code is publicly available at: https://github.com/ecoxial2007/CheX-Phi4MM.
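
Once reasoning steps are parsed into (disease, relation, anatomy) triplets, a consistency reward can be as simple as set-level F1 against reference triplets. A sketch of that scoring step only, with the triplet parser assumed given:

```python
def kg_consistency_reward(pred_triplets, ref_triplets):
    """Score parsed (disease, relation, anatomy) triplets against a
    reference set with F1; hallucinated or missing relations lower the
    reward at the atomic level."""
    pred, ref = set(pred_triplets), set(ref_triplets)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)

# kg_consistency_reward({("effusion", "located_in", "left pleura")},
#                       {("effusion", "located_in", "left pleura")})  # -> 1.0
```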

Fonte: arXiv cs.CV

NLP/LLMs • Score 95

Deep But Reliable: Advancing Multi-turn Reasoning for Thinking with Images

arXiv:2512.17306v1 Announce Type: new Abstract: Recent advances in large Vision-Language Models (VLMs) have exhibited strong reasoning capabilities on complex visual tasks by thinking with images in their Chain-of-Thought (CoT), which is achieved by actively invoking tools to analyze visual inputs rather than merely perceiving them. However, existing models often struggle to reflect on and correct themselves when attempting incorrect reasoning trajectories. To address this limitation, we propose DRIM, a model that enables deep but reliable multi-turn reasoning when thinking with images in its multimodal CoT. Our pipeline comprises three stages: data construction, cold-start SFT and RL. Based on a high-resolution image dataset, we construct high-difficulty and verifiable visual question-answer pairs, where solving each task requires multi-turn tool calls to reach the correct answer. In the SFT stage, we collect tool trajectories as cold-start data, guiding a multi-turn reasoning pattern. In the RL stage, we introduce redundancy-penalized policy optimization, which incentivizes the model to develop a self-reflective reasoning pattern. The basic idea is to impose judgment on reasoning trajectories and penalize those that produce incorrect answers without sufficient multi-scale exploration. Extensive experiments demonstrate that DRIM achieves superior performance on visual understanding benchmarks.

Fonte: arXiv cs.CV

Vision • Score 95

AnyCXR: Human Anatomy Segmentation in Chest Radiographs at Any Acquisition Position Using Multi-Stage Domain-Randomized Synthetic Data with Imperfect Annotations and Conditional Joint Annotation Regularization Learning

Robust anatomical segmentation of chest radiographs (CXRs) remains challenging due to the scarcity of comprehensive annotations and the substantial variability of real-world acquisition conditions. We propose AnyCXR, a unified framework that enables generalizable multi-organ segmentation at arbitrary CXR projection angles using only synthetic supervision.

Fonte: arXiv cs.CV

Vision • Score 95

Any-Optical-Model: A Universal Foundation Model for Optical Remote Sensing

Optical satellites, with their diverse band configurations and ground sampling distances, provide indispensable evidence for tasks ranging from ecosystem surveillance to emergency response. However, significant discrepancies in band composition and spatial resolution across different optical sensors pose major challenges for existing Remote Sensing Foundation Models (RSFMs).

Fonte: arXiv cs.CV

NLP/LLMs • Score 95

Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

arXiv:2512.17206v1 Announce Type: new Abstract: Exploration capacity shapes both inference-time performance and reinforcement learning (RL) training for large (vision-) language models, as stochastic sampling often yields redundant reasoning paths with little high-level diversity. This paper proposes Reasoning Palette, a novel latent-modulation framework that endows the model with a stochastic latent variable for strategic contextualization, guiding its internal planning prior to token generation. This latent context is inferred from the mean-pooled embedding of a question-answer pair via a variational autoencoder (VAE), where each sampled latent potentially encodes a distinct reasoning context. During inference, a sampled latent is decoded into learnable token prefixes and prepended to the input prompt, modulating the model's internal reasoning trajectory. In this way, the model performs internal sampling over reasoning strategies prior to output generation, which shapes the style and structure of the entire response sequence. A brief supervised fine-tuning (SFT) warm-up phase allows the model to adapt to this latent conditioning. Within RL optimization, Reasoning Palette facilitates structured exploration by enabling on-demand injection of diverse reasoning modes, significantly enhancing exploration efficiency and sustained learning capability. Experiments across multiple reasoning benchmarks demonstrate that our method enables interpretable and controllable steering of the (vision-) language model's strategic behavior, thereby achieving consistent performance gains over standard RL methods.

Fonte: arXiv cs.CV

Theory/Optimization • Score 88

Spectral Concentration at the Edge of Stability: Information Geometry of Kernel Associative Memory

arXiv:2511.23083v4 Announce Type: replace-cross Abstract: High-capacity kernel Hopfield networks exhibit a \textit{Ridge of Optimization} characterized by extreme stability. While previously linked to \textit{Spectral Concentration}, its origin remains elusive. Here, we analyze the network dynamics on a statistical manifold, revealing that the Ridge corresponds to the Edge of Stability, a critical boundary where the Fisher Information Matrix becomes singular. We demonstrate that the apparent Euclidean force antagonism is a manifestation of \textit{Dual Equilibrium} in the Riemannian space. This unifies learning dynamics and capacity via the Minimum Description Length principle, offering a geometric theory of self-organized criticality.

Fonte: arXiv stat.ML

NLP/LLMs • Score 96

InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression

Accurate and efficient discrete video tokenization is essential for processing long video sequences. This paper presents InfoTok, a principled framework for adaptive video tokenization, proving that existing training methods are suboptimal and introducing a new ELBO-based algorithm that approaches the theoretical optimum.

Fonte: arXiv cs.AI

RL • Score 92

Text-Conditioned Background Generation for Editable Multi-Layer Documents

arXiv:2512.17151v1 Announce Type: new Abstract: We present a framework for document-centric background generation with multi-page editing and thematic continuity. To ensure text regions remain readable, we employ a \emph{latent masking} formulation that softly attenuates updates in the diffusion space, inspired by smooth barrier functions in physics and numerical optimization. In addition, we introduce \emph{Automated Readability Optimization (ARO)}, which automatically places semi-transparent, rounded backing shapes behind text regions. ARO determines the minimal opacity needed to satisfy perceptual contrast standards (WCAG 2.2) relative to the underlying background, ensuring readability while maintaining aesthetic harmony without human intervention. Multi-page consistency is maintained through a summarization-and-instruction process, where each page is distilled into a compact representation that recursively guides subsequent generations. This design reflects how humans build continuity by retaining prior context, ensuring that visual motifs evolve coherently across an entire document. Our method further treats a document as a structured composition in which text, figures, and backgrounds are preserved or regenerated as separate layers, allowing targeted background editing without compromising readability. Finally, user-provided prompts allow stylistic adjustments in color and texture, balancing automated consistency with flexible customization. Our training-free framework produces visually coherent, text-preserving, and thematically aligned documents, bridging generative modeling with natural design workflows.
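
The readability step is checkable with the published WCAG 2.x formulas: linearize sRGB, compute relative luminance, and search for the smallest backing opacity whose composite passes the contrast threshold. The search loop below is a reconstruction of ARO's minimal-opacity idea, not the paper's code; the WCAG luminance and contrast formulas themselves are standard.

```python
def rel_luminance(rgb):
    """WCAG 2.x relative luminance; rgb components in [0, 1]."""
    def lin(c):
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (lin(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast(c1, c2):
    hi, lo = sorted((rel_luminance(c1), rel_luminance(c2)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)

def minimal_opacity(text, backing, background, target=4.5, step=0.01):
    """Smallest opacity for a backing shape, alpha-composited over the
    generated background, that gives the text WCAG-AA contrast (4.5:1)."""
    n = int(round(1 / step))
    for i in range(n + 1):
        a = i * step
        mixed = tuple(a * s + (1 - a) * d for s, d in zip(backing, background))
        if contrast(text, mixed) >= target:
            return a
    return 1.0

# e.g. minimal_opacity(text=(0, 0, 0), backing=(1, 1, 1),
#                      background=(0.2, 0.4, 0.9))
```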

Fonte: arXiv cs.CV

Vision • Score 95

ABE-CLIP: Training-Free Attribute Binding Enhancement for Compositional Image-Text Matching

arXiv:2512.17178v1 Announce Type: new Abstract: Contrastive Language-Image Pretraining (CLIP) has achieved remarkable performance in various multimodal tasks. However, it still struggles with compositional image-text matching, particularly in accurately associating objects with their corresponding attributes, because its inherent global representation often overlooks fine-grained semantics for attribute binding. Existing methods often require additional training or extensive hard negative sampling, yet they frequently show limited generalization to novel compositional concepts and fail to fundamentally address the drawbacks of global representations. In this paper, we propose ABE-CLIP, a novel training-free Attribute Binding Enhancement method designed to strengthen attribute-object binding in CLIP-like models. Specifically, we employ a Semantic Refinement Mechanism to refine token embeddings for both object and attribute phrases in the text, thereby mitigating attribute confusion and improving semantic precision. We further introduce a Local Token-Patch Alignment strategy that computes similarity scores between refined textual tokens and their most relevant image patches. By aggregating localized similarity scores, ABE-CLIP computes the final image-text similarity. Experiments on multiple datasets demonstrate that ABE-CLIP significantly improves attribute-object binding performance, even surpassing methods that require extensive training.
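
The local token-patch alignment reduces to a max-over-patches, mean-over-tokens pooling of cosine similarities between refined token embeddings and patch embeddings. A sketch of that aggregation (the mean-of-max pooling is an assumption about the exact reduction):

```python
import torch
import torch.nn.functional as F

def local_token_patch_score(token_emb, patch_emb):
    """token_emb: (T, D) refined text-token embeddings;
    patch_emb: (P, D) image-patch embeddings. Each token is matched to
    its most similar patch; scores are averaged into one image-text score."""
    t = F.normalize(token_emb, dim=-1)
    p = F.normalize(patch_emb, dim=-1)
    sim = t @ p.T                        # (T, P) cosine similarities
    return sim.max(dim=1).values.mean()
```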

Fonte: arXiv cs.CV

NLP/LLMs • Score 95

Mitty: Diffusion-Based Human-to-Robot Video Generation

Learning directly from human demonstration videos is an important milestone for scalable, generalizable robot learning. We present Mitty, a Diffusion Transformer that enables video In-Context Learning for Human2Robot video generation, building on a pretrained video diffusion model and requiring no action labels.

Fonte: arXiv cs.CV

NLP/LLMs • Score 95

Globally Optimal Solution to the Generalized Relative Pose Estimation Problem using Affine Correspondences

arXiv:2512.17188v1 Announce Type: new Abstract: Mobile platforms equipped with a multi-camera system and an inertial measurement unit (IMU), such as self-driving cars, are widely used nowadays. The task of relative pose estimation using visual and inertial information has important applications in various fields. To improve the accuracy of relative pose estimation of multi-camera systems, we propose a globally optimal solver using affine correspondences to estimate the generalized relative pose with a known vertical direction. First, a cost function in the relative rotation angle is established after decoupling the rotation matrix and translation vector; it minimizes the algebraic error of the geometric constraints from affine correspondences. Then, the global optimization problem is converted into two polynomials in two unknowns, based on the characteristic equation and the condition that its first derivative is zero. Finally, the relative rotation angle can be solved using a polynomial eigenvalue solver, and the translation vector can be obtained from the corresponding eigenvector. In addition, a new linear solution is proposed for the case of small relative rotation. The proposed solver is evaluated on synthetic data and real-world datasets. The experimental results demonstrate that our method outperforms comparable state-of-the-art methods in accuracy.

Fonte: arXiv cs.CV

RL • Score 96

Navigating Knowledge-Base-Driven Taxonomic Expansions of Entity Sets

Recognizing similarities between entities is central to both human cognition and computational intelligence. Entity Set Expansion is the task of identifying additional entities that share relevant semantic properties. A new logic-based framework introduces the concept of an expansion graph, which supports taxonomic expansions of entity sets, although the sheer size of these graphs can make full materialization impractical.

Fonte: arXiv cs.AI