Research Directions

  • Kronecker hull: 3D network shape in (a,b,d) space

    Interpretable / Spectral Network Representations

    New network representations that are (I) easy to visualize and (II) interpretable (i.e., structurally informative).

    Traditionally, a network is represented by an adjacency matrix, which records which pairs of nodes are connected. Adjacency matrices can be massive for large graphs, even sparse ones; they are not interpretable (e.g., they do not directly capture complex relationships such as paths or cuts); and they are hard to visualize, appearing as "hairballs": dense, tangled structures of nodes and edges that often carry no insight. To address these challenges, we have developed new network representations that are (I) easy to visualize and (II) interpretable (i.e., structurally informative). See these examples (Spectral Zoo, KDD'20; Spectral Paths, KDD'22; and Network Shapes, ICDM'18) and their applications in network identification & authentication (also TKDE'22) and in network robustness assessment (ICKG'22). The WebShapes demo (WSDM'20) shows how these spectral representations enable 3D network visualization.
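
    As a generic illustration of what a spectral representation looks like (an assumption for illustration, not the specific construction used in Spectral Zoo, Spectral Paths, or Network Shapes), the sketch below summarizes a graph by the eigenvalues of its normalized Laplacian, which always lie in [0, 2], and bins them into a fixed-length histogram. Unlike an adjacency matrix, this signature has the same length for every graph and does not change when nodes are relabeled. The function names and the bin count are hypothetical choices.

```python
import numpy as np

def normalized_laplacian_spectrum(adj):
    """Eigenvalues of L = I - D^{-1/2} A D^{-1/2} for an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("this sketch assumes no isolated nodes")
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # eigvalsh: symmetric input, eigenvalues returned in ascending order
    return np.linalg.eigvalsh(lap)

def spectral_signature(adj, bins=10):
    """Fixed-length histogram of the spectrum over [0, 2], summing to 1."""
    eig = normalized_laplacian_spectrum(adj)
    # Clip floating-point round-off (e.g. -1e-16 or 2 + 1e-16) so that
    # np.histogram does not silently drop eigenvalues outside the range.
    eig = np.clip(eig, 0.0, 2.0)
    hist, _ = np.histogram(eig, bins=bins, range=(0.0, 2.0))
    return hist / len(eig)

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(normalized_laplacian_spectrum(triangle))
print(spectral_signature(triangle))
print(spectral_signature(path4))
```

    Graphs of different sizes (here 3 and 4 nodes) map to vectors of the same length, which is what makes such signatures easy to compare or plot; the number of zero eigenvalues also equals the number of connected components.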

  • Higher-order network / hypergraph illustration

    Higher-Order Networks

    Beyond pairwise edges: representing, learning from, and predicting links in networks where interactions naturally involve groups of nodes (hyperedges).

    Many real-world systems — co-authorship, group conversations, biochemical reactions, multi-party transactions — involve interactions among groups of entities, not just pairs. We study how to faithfully represent and learn from such higher-order networks.

    Our survey in SIGKDD Explorations (2024) reviews higher-order network representations and learning. We introduce spectral-moment representations of higher-order networks (PAKDD'25), exploit cross-order patterns for link prediction in higher-order networks (ICDMW'22), and propose a dedicated common-neighbor approach to link prediction (CIKM'20). This direction was the focus of Hao Tian's 2024 dissertation, Exploring Higher-order Networks.
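
    For intuition about common-neighbor link prediction when interactions come in groups, here is a minimal sketch (a textbook baseline, not the CIKM'20 method): project the hyperedges onto an ordinary graph by connecting every pair inside each group (the clique expansion), count common neighbors for a node pair, and score a candidate group by its average pairwise count. The function names and the toy hyperedges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def clique_expansion(hyperedges):
    """Map each node to the set of nodes it shares at least one hyperedge with."""
    nbrs = defaultdict(set)
    for edge in hyperedges:
        for u, v in combinations(edge, 2):
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs

def common_neighbors(nbrs, u, v):
    """Classic pairwise common-neighbor count on the projected graph."""
    return len(nbrs[u] & nbrs[v])

def group_score(nbrs, candidate):
    """Average pairwise common-neighbor count over a candidate hyperedge."""
    pairs = list(combinations(candidate, 2))
    return sum(common_neighbors(nbrs, u, v) for u, v in pairs) / len(pairs)

hyperedges = [("a", "b", "c"), ("b", "c", "d"), ("c", "d", "e"), ("a", "e")]
nbrs = clique_expansion(hyperedges)
print(common_neighbors(nbrs, "a", "d"))
print(group_score(nbrs, ("a", "b", "d")))
```

    The clique expansion discards which nodes interacted together as a group (a 3-node hyperedge and three separate pairwise edges project to the same triangle), which is one motivation for representations that model higher-order structure directly.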

  • Noise-enhanced signal illustration

    Noise-Enhanced Learning & Network Science

    When can adding noise — rather than removing it — actually improve learning, link prediction, and community detection?

    Conventional wisdom treats noise as something to denoise away. We take the opposite view, building on stochastic-resonance ideas from physics: in many problems, carefully injected noise can improve learning algorithms, especially when data is limited or models are overparameterized.
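
    To make the stochastic-resonance intuition concrete, here is a textbook toy (our illustration, not a method from the papers below): a weak periodic signal that never crosses a detector's threshold on its own. With no noise, the detector output is constant and carries no information; moderate Gaussian noise occasionally pushes the signal over the threshold, mostly near its peaks, so the output becomes correlated with the signal; very large noise swamps it again. The amplitude, threshold, period, and noise levels are arbitrary assumptions.

```python
import numpy as np

def threshold_detector_correlation(noise_std, n=10_000, threshold=1.0,
                                   amplitude=0.5, seed=0):
    """Correlation between a subthreshold sine signal and a noisy threshold detector."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    signal = amplitude * np.sin(2 * np.pi * t / 50)  # peak 0.5 < threshold 1.0
    noisy = signal + noise_std * rng.standard_normal(n)
    output = (noisy > threshold).astype(float)        # binary detector
    if output.std() == 0:
        return 0.0  # constant output: no information about the signal
    return float(np.corrcoef(signal, output)[0, 1])

for sigma in (0.0, 0.5, 10.0):
    print(sigma, round(threshold_detector_correlation(sigma), 3))
```

    The correlation is zero without noise, clearly positive at the intermediate level, and small again at the largest level: the non-monotone curve that stochastic resonance refers to.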

    Our survey on harnessing the power of noise (2024) maps the techniques and applications. Specific results include noise-enhanced community detection (Hypertext'20, Best Paper Candidate) and noise-enhanced unsupervised link prediction (PAKDD'21). We have given tutorials on this material at SDM'22 and TheWebConf'23. The line of work was the centerpiece of Reyhaneh Abdolazimi's 2024 dissertation, Noise-Enhanced Network Science.

  • Graph sparsification: removing redundant edges

    Graph Neural Networks: Sparsification, Editing, and Robustness

    Editing the graph — sparsifying it, adversarially perturbing it, or augmenting it — to make graph neural networks faster, more robust, and more interpretable.

    Graph neural networks (GNNs) inherit whatever structural noise lives in the input graph: redundant edges, adversarial perturbations, distributional shift. We study a family of techniques that edit the graph itself — sparsifying, augmenting, or attacking it — to improve downstream learning.

    Highlights include SGCN: a graph sparsifier based on graph convolutional networks (PAKDD'20) and its extended JDSA journal version; semi-supervised graph ultra-sparsification via reweighted ℓ1 optimization (ICASSP'23); and AdverSparse: an adversarial-attack framework for spatio-temporal GNNs (ICASSP'22). This program of work underlies Jiayu Li's 2024 dissertation, Enhancing Graph Neural Networks by Editing Graphs.
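
    As a minimal sketch of "editing the graph before learning", the code below prunes edges whose endpoints have dissimilar binary features (Jaccard similarity below a threshold, a simple heuristic sometimes used as a defense against adversarially inserted edges) and then applies one standard GCN propagation step, ReLU(D^{-1/2}(A+I)D^{-1/2} H W). This is neither SGCN nor the reweighted ℓ1 ultra-sparsifier; the threshold, function names, and toy data are assumptions.

```python
import numpy as np

def jaccard_sparsify(adj, features, threshold=0.1):
    """Drop undirected edges whose endpoints' binary features have Jaccard similarity < threshold."""
    adj = np.asarray(adj, dtype=float).copy()
    rows, cols = np.nonzero(np.triu(adj, k=1))  # each undirected edge once (i < j)
    for i, j in zip(rows, cols):
        inter = np.logical_and(features[i], features[j]).sum()
        union = np.logical_or(features[i], features[j]).sum()
        sim = inter / union if union > 0 else 0.0
        if sim < threshold:
            adj[i, j] = adj[j, i] = 0.0
    return adj

def gcn_layer(adj, h, w):
    """One GCN layer with self-loops and symmetric degree normalization."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ h @ w, 0.0)  # ReLU

adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [1, 0, 0, 0]])
features = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 1, 1],
                     [1, 0, 0, 0]])
sparse = jaccard_sparsify(adj, features, threshold=0.1)
rng = np.random.default_rng(0)
out = gcn_layer(sparse, features.astype(float), rng.standard_normal((4, 2)))
print(sparse)
print(out.shape)
```

    Here the edge between nodes 1 and 2 (no shared features) is removed before message passing, so node 2's representation is no longer mixed with node 1's; the other two edges survive.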

  • Hate speech context, culture, and multimodality

    Hate Speech & Hateful Memes: Multimodal, Cultural, Contextual

    Going beyond surface text to understand the cultural context, presupposed claims, and multimodal cues that make hate hard to detect.

    Hate speech detection systems are routinely brittle: they miss hate that is implied rather than stated, conflate dialect with toxicity, and fail to transfer across cultural contexts. Our recent work tackles each of these failure modes.

    For hateful memes, we unpack the presupposed context and false claims that text-only systems miss (2025). For text, we propose hate-subspace modeling for culture-aware hate speech detection (2025), recognizing that what counts as hate depends on the speech community. This research began with Weibin Cai's MS thesis, Harnessing LLMs to Detect Hate Speech (2025).

  • LLM memory and cognitive processes

    LLMs Through the Lens of Cognitive Psychology

    Borrowing methods from cognitive psychology to characterize how large language models remember, forget, and generalize.

    Do large language models exhibit the classical memory effects studied for decades in human cognition — the list-length effect, list-strength effect, fan effect, and the other "sins" of memory? Or do their failures follow an entirely different structure?

    Our 2025 paper "Analyzing Memory Effects in Large Language Models through the lens of Cognitive Psychology" systematically tests seven classical memory phenomena in state-of-the-art LLMs using paradigms drawn from psychological research, comparing human and model behavior side by side. The work is part of a broader effort to evaluate AI systems with the same care we apply to human subjects.

  • Misinformation and fake news headline illustration

    Misinformation, Disinformation, and "Fake News" Research

    Understanding and characterizing fake news and misinformation, and designing techniques to detect it — even with limited ground truth, limited text, and unknown intent.

    Our CSUR survey summarizes fake news research. Our work spans detecting fake news using content, link/network information, and theory-guided early detection. We have built multimodal news credibility datasets such as ReCOVery (CIKM'20) for COVID-19 news, and Chinese-language resources such as CHECKED.

    Recent work pushes detection toward the realistic regime of limited information: our 2025 SIGKDD Explorations paper "Is Less Really More? Fake News Detection with Limited Information" studies what is recoverable when text and labels are scarce, and our HERO model learns the hierarchical linguistic style of fake news, drawing on psychological theories of how deception manifests in writing. We also introduced the first techniques to assess the intent of fake-news spreaders (TheWebConf'22), and we are now investigating AI-generated fake news across multiple domains. For more, see our KDD/WSDM Tutorials here.

  • Mining across social media sites

    Mining across Sites

    Users are often active on multiple social media sites. To systematically study users, we need their information on all sites.

    To mine across social media sites, we focus on two problems. First, how does user behavior vary across sites (e.g., how do LinkedIn friends differ from Facebook friends)? In addition to designing new techniques, we investigate ways to scale and adapt traditional models of single-site user behavior to multiple sites. For recent results on this question, see my papers in Information Fusion'16 and ICWSM'14 and this book chapter. Second, I study user behaviors that can only be observed across sites; one example is our study of user migration across sites.

  • Graph isomorphism: matching users across sites

    Identifying Users across Sites

    Investigating means to identify the same user across social media sites, enabling a comprehensive understanding of users online.

    I investigate identifying the same user across multiple sites using link (friendship) information [TKDD'15] and content information [ICWSM'09, KDD'13]. User identification using link information is closely related to the graph isomorphism problem...
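
    To see why link-based identification resembles graph isomorphism, consider a generic seed-and-expand heuristic (an illustration only, not the TKDD'15 method): given a few users already known to be the same person on both sites ("seeds"), repeatedly match the unmatched pair of accounts that share the most already-matched friends. The function name, the `min_score` parameter, and the toy friendship graphs are hypothetical.

```python
def match_by_seeds(g1, g2, seeds, min_score=1):
    """Greedy seed-and-expand matching between two friendship graphs.

    g1, g2: dict mapping each user to the set of that user's friends.
    seeds:  dict mapping known g1 users to their g2 accounts.
    """
    mapping = dict(seeds)
    while True:
        used = set(mapping.values())
        best = None
        for u in g1:
            if u in mapping:
                continue
            for v in g2:
                if v in used:
                    continue
                # Friends of u whose matched counterpart is a friend of v.
                score = sum(1 for x in g1[u] if x in mapping and mapping[x] in g2[v])
                if score >= min_score and (best is None or score > best[0]):
                    best = (score, u, v)
        if best is None:
            return mapping  # no pair has enough matched friends in common
        _, u, v = best
        mapping[u] = v

site_a = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
site_b = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(match_by_seeds(site_a, site_b, seeds={1: "a"}))
```

    On these two path-shaped graphs, a single seed propagates a consistent matching along the path. The greedy choice is simple but fragile (ties and noisy friend lists cause mismatches), which is part of why identifying users from links alone is hard.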

  • Analyzing human behavior online

    Analyzing Human Behavior Using Online Traces

    Realistically model, predict, or mine human behavior.

    My research has investigated ways to realistically analyze human behavior online by exploiting the information redundancies that user behavior generates. The methodology has been used to detect sarcasm on Twitter and to identify users across sites, among other tasks. For more on the topic, see this article or this textbook chapter. As a by-product, my research on human behavior modeling has had implications for information verification, privacy, and security.

  • Evaluation without ground truth

    Evaluation in Social Media Research

    With no face-to-face access to users on social media, how can we guarantee that the patterns we identify online represent the true intentions of online users?

    In data mining terms, ground truth is rarely available online. I have recently begun investigating this problem and have identified several ways to tackle it. For a succinct review of the topic, see my recent Communications of the ACM (CACM) paper on this issue.

  • Mining with minimum information

    Mining with Absolute Minimum Information

    What is the minimum information required to perform data mining tasks on social media?

    I have looked at how to use minimum information to identify users, detect malicious users, or recommend friends on social media sites with high accuracy. Because these methods use only minimum information, they scale easily to millions of users. Recently, I have been investigating the theoretical limits of using minimum information.
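
    As one illustration of how little information can go a long way, the sketch below compares two accounts using nothing but their usernames, scoring the overlap of character bigrams with Jaccard similarity. The bigram features, function names, and example usernames are assumptions for illustration, not the feature set used in the papers above.

```python
def char_ngrams(name, n=2):
    """Set of lowercase character n-grams; very short names fall back to the whole string."""
    name = name.lower()
    if len(name) < n:
        return {name} if name else set()
    return {name[i:i + n] for i in range(len(name) - n + 1)}

def username_similarity(a, b, n=2):
    """Jaccard similarity between the character n-gram sets of two usernames."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0

print(username_similarity("jsmith", "j_smith88"))
print(username_similarity("jsmith", "maria_k"))
```

    Each comparison needs only two short strings and a few set operations, which is the kind of cost profile that lets minimum-information methods scale to millions of users; in practice such a score would be one signal among several.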

  • Privacy and information theory

    Theoretical and Empirical Limits of Privacy

    How much user privacy is violated by mining users' content?

    I have recently investigated the balance between privacy and mining user-generated content by drawing on ideas from algorithmic information theory, specifically Kolmogorov complexity. See this paper for some (very!) preliminary results.
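
    Kolmogorov complexity itself is uncomputable, so a common practical proxy is the length of a string after compression. The sketch below uses that proxy to compute the normalized compression distance (NCD) of Cilibrasi and Vitányi, which is small when one piece of content is largely "explained" by another. This is a generic illustration under that assumption, not the analysis in the paper above; the example strings are made up.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Compressed length in bytes: a crude, computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 if y adds little beyond x, near 1 if unrelated."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

text = b"the quick brown fox jumps over the lazy dog " * 20
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))
print(round(ncd(text, text), 3), round(ncd(text, noise), 3))
```

    Read in privacy terms, a low distance between a user's new content and what an observer already holds suggests the new content reveals little additional information, while a distance near 1 suggests it is largely new.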

  • Sentiment and emotion in social networks

    Psychological and Affective States of Online Users (e.g., sentiments and emotions)

    Understanding the role emotions play in social interactions has been a central research question in the social sciences. However, the difficulty of obtaining large-scale data on human emotions has left the most fundamental questions about emotions underexplored.

    Previous research has shown that a person's sentiment and mental state depend on those of their friends and family. I have investigated how sentiment and information propagate in large-scale networks. See some recent results in our CIKM and ICDM papers and an older paper here.
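
    One simple, well-known way to model sentiment spreading over a friendship network is the DeGroot model of opinion dynamics, used here only as an illustrative stand-in (it is not the model from the CIKM or ICDM papers). Each user repeatedly moves their sentiment score a fraction `alpha` of the way toward the average of their friends' scores; `alpha`, the step count, and the toy network are assumptions.

```python
import numpy as np

def degroot(adj, x0, alpha=0.5, steps=100):
    """Iterate x <- (1 - alpha) * x + alpha * (average of friends' x)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    # Row-normalized adjacency; rows of users with no friends stay all-zero.
    avg_op = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    has_friends = deg[:, 0] > 0
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        neighbor_avg = avg_op @ x
        # Users without friends keep their own score.
        x = np.where(has_friends, (1 - alpha) * x + alpha * neighbor_avg, x)
    return x

path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(degroot(path3, [1.0, 0.0, -1.0]))
```

    On a connected network with `alpha < 1`, the scores converge to a consensus equal to the degree-weighted average of the initial sentiments (here 0.25·1 + 0.5·0 + 0.25·(-1) = 0). Richer models add stubborn users, negative ties, or external events.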

  • Crisis and disaster management from online data

    Online Crisis and Disaster Management

    How can we identify areas impacted by natural disasters and provide assistance to individuals impacted by natural disasters using online data?

    My research has focused on (1) online means to map areas impacted by natural disasters in real time [ICDM'15], (2) identifying the users who provide the most useful information during crises [HT'14], and (3) systematic approaches to crowdsourcing user-generated content during disasters [CMOT'12].