MOHAMMED.

Research / 03

Preprints, papers,
and ongoing work.

Three preprints spanning interpretable clinical NLP, mechanistic safety of tool-using LLMs, and few-shot fault diagnosis under data scarcity.

01 / Themes

/01

Healthcare AI

Interpretable clinical NLP, concept-grounded diagnosis.

/02

LLM Safety

Channel-specific vulnerability, mechanistic interpretability.

/03

Applied Generative ML

Few-shot diagnosis, augmentation under data scarcity.

02 / Preprints (3)
Preprint · 2025

Healthcare AI · Interpretability

ShifaMind: A Multiplicative Concept Bottleneck for Interpretable ICD-10 Coding

Mohammed Sameer Syed, Xuan Lu · University of Arizona

Automated ICD-10 coding from clinical discharge summaries requires models that are both accurate on long-tailed multi-label classification tasks and interpretable to clinicians. We present ShifaMind, a concept-grounded architecture built around a Multiplicative Concept Bottleneck (MCB), which changes the form, rather than the width, of the bottleneck. Instead of projecting through a narrow concept layer, ShifaMind uses a learned multiplicative gate over a concept-grounded representation while retaining a scalar concept interface for inspection. On MIMIC-IV top-50 ICD-10 coding, ShifaMind achieves performance competitive with the strongest baseline, LAAT, across F1, AUC, and ranking metrics, while outperforming five additional ICD-coding baselines and providing concept-mediated explanations.
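One possible reading of a multiplicative gate over a concept-grounded representation can be sketched as follows. This is an illustrative assumption, not the ShifaMind implementation: the function `mcb_forward`, all weight names, and the choice of sigmoid activations are hypothetical, and the abstract does not specify these details.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def mcb_forward(h, W_concept, E_concept, W_gate, W_out):
    """Hypothetical multiplicative concept bottleneck (illustration only).

    h         : (d_in,)   document encoding
    W_concept : (d_in, K) maps the encoding to K scalar concept scores
    E_concept : (K, d)    one embedding per concept
    W_gate    : (d_in, d) produces the multiplicative gate
    W_out     : (d, L)    maps the gated representation to L label logits
    """
    # Scalar concept interface: one inspectable score in (0, 1) per concept.
    concept_scores = sigmoid(h @ W_concept)
    # Concept-grounded representation: score-weighted sum of concept embeddings.
    grounded = concept_scores @ E_concept
    # Learned gate, applied elementwise (the "multiplicative" part).
    gate = sigmoid(h @ W_gate)
    logits = (gate * grounded) @ W_out
    return concept_scores, logits


# Toy dimensions, random weights: shapes only, no trained behaviour implied.
rng = np.random.default_rng(0)
d_in, K, d, L = 16, 8, 12, 50
scores, logits = mcb_forward(
    rng.normal(size=d_in),
    rng.normal(size=(d_in, K)),
    rng.normal(size=(K, d)),
    rng.normal(size=(d_in, d)),
    rng.normal(size=(d, L)),
)
```

In this sketch the label logits depend on the input only through the gated, concept-weighted vector, which has full width `d` rather than one unit per concept, while `concept_scores` remains available for inspection.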

0.712

Macro-F1

MIMIC-IV top-50

4.3×

over Vanilla CBM

0.704

CSTPR

Concept Bottleneck · Clinical NLP · ICD-10 · MIMIC-IV · Interpretability
Preprint #01
Preprint · 2025

LLM Safety · Mechanistic Interpretability

The Safety Asymmetry Score: Channel-Specific Vulnerability in Tool-Using Language Models

Mohammed Sameer Syed · University of Arizona

Tool-using language models face a larger attack surface than chatbots: adversarial content can arrive not only in a user's message but also in tool descriptions, tool outputs, or cross-tool instructions. We introduce the Safety Asymmetry Score (SAS), defined as a model's attack success rate on the tool channel minus its rate on the chat channel, measured over matched-payload pairs that hold the malicious instruction byte-identical across channels. Across five production LLMs and 98 cases, two agent-native models carry SAS ≈ +23.5 pp while three general-purpose models average −5.6 pp, a +29.1 pp gap driven by tool poisoning. On Llama 3.3 70B, causal activation patching at layers 48 and 64 localises a representation that is necessary and sufficient yet encoded non-linearly.
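As a minimal sketch of the definition above, the snippet below computes SAS in percentage points from per-pair attack outcomes. The `MatchedPair` type and the function name are hypothetical, and the toy outcomes are made up for illustration; they are not data from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MatchedPair:
    """Outcome of one payload delivered through both channels (hypothetical type)."""
    tool_success: bool  # attack succeeded when the payload arrived via a tool
    chat_success: bool  # attack succeeded when the same payload arrived via chat


def safety_asymmetry_score(pairs: Iterable[MatchedPair]) -> float:
    """SAS in percentage points: tool-channel ASR minus chat-channel ASR."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one matched pair is required")
    n = len(pairs)
    asr_tool = sum(p.tool_success for p in pairs) / n
    asr_chat = sum(p.chat_success for p in pairs) / n
    return 100.0 * (asr_tool - asr_chat)


# Toy outcomes (illustrative only): 3 of 4 tool attacks and 1 of 4 chat attacks succeed.
toy_pairs = [
    MatchedPair(tool_success=True, chat_success=False),
    MatchedPair(tool_success=True, chat_success=True),
    MatchedPair(tool_success=True, chat_success=False),
    MatchedPair(tool_success=False, chat_success=False),
]
toy_sas = safety_asymmetry_score(toy_pairs)  # 50.0 pp

# Group gap from the two group-level figures quoted in the abstract.
group_gap = round(23.5 - (-5.6), 1)  # 29.1 pp
```

A positive score means the model is more vulnerable on the tool channel than on the chat channel for the same payload; a negative score means the reverse.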

+29.1 pp

Group SAS gap

5 / 98

Models · cases

ρ = 0.70

vs MCPTox

LLM Safety · Tool Use · MCP · Activation Patching · Llama 3.3
Preprint #02
Submitted · 2025

Power Systems · Generative Models

SpectralGAN-Augmented Transformer Neural Network for Power Transformer Winding Fault Diagnosis via Frequency Response Analysis

Mohammed Sameer Syed, Mohammed Sohail Syed · IEEE Transactions on Power Delivery · Department of Electrical Engineering

Accurate classification of power transformer winding deformation faults from frequency response analysis (FRA) measurements is constrained by the fundamental scarcity of labelled fault data. We present a two-stage diagnostic pipeline: SpectralGAN, a conditional WGAN-GP with spectral normalisation on every linear layer, synthesises realistic 48-dimensional FRA indicator vectors from only 19 training samples per fold; FRATransformer, a lightweight multi-head self-attention network, classifies the mixed corpus of real, jittered, and synthetic samples. Under strict Leave-One-Out Cross-Validation on a 20-sample real dataset spanning healthy, axial displacement, and radial deformation classes, the pipeline achieves 80.0% accuracy and macro F1 = 0.800.
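The evaluation protocol can be sketched as a plain Leave-One-Out loop: with 20 real samples, each fold trains on the other 19 and tests on the single held-out one. The sketch below uses a nearest-centroid classifier as a stand-in, not the FRATransformer, and omits augmentation; the function names and the toy 2-dimensional data are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def fit_centroids(train: List[Sample]) -> Dict[str, List[float]]:
    """Stand-in model: per-class mean of the training feature vectors."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=sq_dist)


def loocv_accuracy(data: List[Sample]) -> float:
    """Leave-One-Out CV: one fold per sample, trained on all the others."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        # Any augmentation (jitter, GAN synthesis) would have to be fitted on
        # `train` only, so the held-out sample never influences it.
        model = fit_centroids(train)
        correct += predict(model, features) == label
    return correct / len(data)


# Toy, well-separated data (illustrative only, not FRA measurements).
toy_data: List[Sample] = [
    ((0.0, 0.0), "healthy"), ((0.1, 0.0), "healthy"),
    ((5.0, 5.0), "axial"), ((5.1, 5.0), "axial"),
    ((10.0, 0.0), "radial"), ((10.0, 0.1), "radial"),
]
toy_accuracy = loocv_accuracy(toy_data)
```

On a 20-sample dataset, accuracy under this protocol moves in steps of 5 percentage points, so the reported 80.0% corresponds to 16 of 20 held-out samples classified correctly.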

80.0%

Accuracy

LOOCV

+10 pp

over SVM baseline

13.7 min

20-fold runtime

WGAN-GP · Spectral Normalisation · Self-Attention · FRA · Few-Shot
Preprint #03