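Basic Information

Wensheng Dou (窦文生), Research Professor, Doctoral Supervisor, Institute of Software, Chinese Academy of Sciences

Email: wensheng@iscas.ac.cn

Address: 6th Floor, Building 5, No. 4 South Fourth Street, Zhongguancun, Beijing

Postal code: 100190

Homepage: https://wsdou.github.io/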

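Research Areas

    My long-term research addresses the theory, methods, and techniques of quality assurance for foundational software. It centers on two intersecting areas: intelligent software engineering for foundational software (e.g., AI4SE, LLM4SE, and SE4AI), and intelligent foundational software (e.g., "AI+" and "Agent+" foundational software). Targeting key foundational software such as database systems, distributed systems, intelligent systems, and embodied intelligence systems, I have proposed a series of efficient quality assurance techniques. This work has produced more than 70 papers in leading journals and conferences, including TSE, TOSEM, ICSE, FSE, ASE, ISSTA, AAAI, VLDB, ICDE, and EuroSys (more than 40 of them in CCF Class A international journals and conferences), and has received multiple ACM SIGSOFT Distinguished Paper Awards and Journal of Software (软件学报) High-Impact Paper Awards. The results have been applied in practice at leading foundational software companies, including Huawei, Alibaba, Tencent, Dameng, Inspur, and Kingsoft, and provide key technical support for the domestic replacement of foundational software in critical sectors.

    I have been selected as an Excellent Member of the Youth Innovation Promotion Association of the Chinese Academy of Sciences (CAS), a CAS Distinguished Research Professor (特聘研究员), an Outstanding Young Scholar of the Institute of Software, CAS, and a scholar of the Microsoft Research Asia StarTrack Program (铸星计划). My honors include the CCF Outstanding Doctoral Dissertation Award, the NASAC Young Software Innovation Award, and the CVIC Software Talent Award (中创软件人才奖). I serve as a Young Editorial Board member of JCST and as a reviewer for top international journals including TOSEM, TOPLAS, TSE, and TPDS, and I have served multiple times on the program committees of top international conferences including ICSE, FSE, ASE, and ISSTA.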

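Student Recruitment

    The research group recruits PhD students, master's students, and interns with a strong interest in research on an ongoing basis. We work at the intersection of AI, software engineering, and foundational software, with a focus on three directions: intelligent software engineering (how AI and large language models can make software engineering more automated and intelligent), intelligent foundational software (how AI can improve the usability, performance, and reliability of foundational software such as databases and distributed systems), and trustworthiness assurance for intelligent systems (the security, reliability, and trustworthiness of AI-native foundational software itself). Before contacting us or applying, prospective students are encouraged to read the papers the group has published in the last three years to get an initial sense of our research directions and content.

    I believe that cultivating students' ability to do independent research is an advisor's foremost responsibility. We strive to build an open, collaborative academic environment, to explore frontier scientific problems together with our students, and to help each student grow into a researcher who can independently solve important scientific problems. Guided by this philosophy, students in the group have performed well: most complete and submit a paper to a top journal or conference within their first year, and several graduates have received honors such as the CCF Outstanding Doctoral Dissertation Award, the CAS Outstanding Doctoral Dissertation Award, the CAS President's Award, the Merit Student Pacesetter title (三好学生标兵), and the National Scholarship (see my homepage for details). This also means that, if you choose to join, you need a genuine love of research and a willingness to invest sustained, focused effort in it.

    Openings in the group are limited, so interested applicants are advised to get in touch by email as early as possible (ideally in the second semester of the sophomore year or the first semester of the junior year). We expect applicants to have the following basic qualities: (1) a love of programming and strong hands-on ability; (2) a sense of teamwork and a willingness to advance research together; (3) a proactive attitude, taking ownership of research work and driving it forward; (4) an exploratory spirit and the courage to take on and solve difficult research problems.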

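Work Experience

   
Employment History
2022-10~present, Institute of Software, Chinese Academy of Sciences, Research Professor
2017-10~2018-04, Microsoft Research Asia, StarTrack Program (铸星计划) Visiting Scholar
2016-08~2022-10, Institute of Software, Chinese Academy of Sciences, Associate Research Professor
2015-07~2016-07, Institute of Software, Chinese Academy of Sciences, Assistant Research Professor
2013-12~2014-08, The Ohio State University, USA, Visiting Scholar
2013-02~2013-07, The Hong Kong University of Science and Technology, Visiting Scholar
2009-07~2015-06, Institute of Software, Chinese Academy of Sciences, Assistant Research Professor
Professional Service
2021-03-30~present, IJSI Editorial Board, Editorial Board Member
2020-12-31~present, JCST Young Editorial Board, Editorial Board Member
2019-11-23~present, CCF Technical Committee on Software Engineering, Executive Committee Member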

Courses Taught

Advanced Software Engineering
Frontiers of Software Engineering
Network Distributed Computing

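Patents and Awards

   
Awards
(1) CVIC Software Talent Award (中创软件人才奖), national level, 2024
(2) Journal of Software (软件学报) 2022 High-Impact Paper, other, 2023
(3) CCF NASAC Young Software Innovation Award, special award, 2023
(4) Excellent Member, Youth Innovation Promotion Association of the Chinese Academy of Sciences, academy level, 2022
(5) Journal of Software (软件学报) 2021 High-Impact Paper, other, 2022
(6) Microsoft Research Asia Increasing Productivity Award, other, 2019
(7) Member, Youth Innovation Promotion Association of the Chinese Academy of Sciences, academy level, 2018
(8) ESEC/FSE 2018 Distinguished Paper Award, other, 2018
(9) Outstanding Young Scholar, Institute of Software, Chinese Academy of Sciences, institute (university) level, 2018
(10) Microsoft Research Asia StarTrack Program (铸星计划) Scholar, other, 2017
(11) CCF Outstanding Doctoral Dissertation Award, special award, 2016
(12) Outstanding Graduate, University of Chinese Academy of Sciences, institute (university) level, 2015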

Publications

(1) Systematically Cover SQL Syntactic Structures via k-Sequence, 35th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2026, 8th author, corresponding author
(2) Efficiently Testing Distributed Systems via Abstract State Space Prioritization, IEEE Transactions on Software Engineering (TSE), 2026, 3rd author, corresponding author
(3) RISE: Rule-Driven SQL Dialect Translation via Query Reduction, 48th IEEE/ACM International Conference on Software Engineering (ICSE), 2026, 3rd author, corresponding author
(4) PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing, ACM Transactions on Software Engineering and Methodology (TOSEM), 2026, 7th author
(5) CITYWALK: Enhancing LLM-Based C++ Unit Test Generation via Project-Dependency Awareness and Language-Specific Knowledge, ACM Transactions on Software Engineering and Methodology (TOSEM), 2025, 4th author
(6) Simple Testing Can Expose Most Critical Transaction Bugs: Understanding and Detecting Write-Specific Serializability Violations in Database Systems, 51st International Conference on Very Large Data Bases (VLDB), 2025, 2nd author, corresponding author
(7) Model Checking Guided Incremental Testing for Distributed Systems, 34th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2025, 3rd author, corresponding author
(8) BridgeGC: An Efficient Cross-Level Garbage Collector for Big Data Frameworks, ACM Transactions on Architecture and Code Optimization (TACO), 2025, 4th author
(9) Evaluating Garbage Collection Performance Across Managed Language Runtimes, 47th IEEE/ACM International Conference on Software Engineering (ICSE), 2025, 2nd author, corresponding author
(10) Training Deep Neural Networks with Virtual Smoothing Classes, 39th AAAI Conference on Artificial Intelligence (AAAI), 2025, 3rd author
(11) Detecting Schema-Related Logic Bugs in Relational DBMSs via Equivalent Database Construction, 51st International Conference on Very Large Data Bases (VLDB), 2025, 2nd author, corresponding author
(12) Proving Cypher Query Equivalence, 41st IEEE International Conference on Data Engineering (ICDE), 2025, 2nd author, corresponding author
(13) Detecting Isolation Anomalies in Relational DBMSs, 34th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2025, 3rd author, corresponding author
(14) FaultFuzz: A Coverage Guided Fault Injection Tool for Distributed Systems, 46th IEEE/ACM International Conference on Software Engineering (ICSE Demo), 2024, 5th author
(15) 基于TLA+形式化规约的Raft系统测试 [Testing Raft Systems Based on TLA+ Formal Specifications], Journal of Software (软件学报), 2024, 2nd author, corresponding author
(16) Understanding Transaction Bugs in Database Systems, 46th IEEE/ACM International Conference on Software Engineering (ICSE), 2024, 2nd author, corresponding author
(17) Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database Construction, 50th International Conference on Very Large Data Bases (VLDB), 2024, 2nd author, corresponding author
(18) Differential Optimization Testing of Gremlin-Based Graph Database Systems, 17th IEEE International Conference on Software Testing, Verification and Validation (ICST), 2024, 2nd author, corresponding author
(19) DiTing: Semi-supervised Adversarial Training Approach for Robust Out-of-distribution Detection (谛听:面向鲁棒分布外样本检测的半监督对抗训练方法), Journal of Software (软件学报), 2024, 2nd author
(20) Testing Gremlin-Based Graph Database Systems via Query Disassembling, 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) (CCF A), 2024, 2nd author, corresponding author
(21) Common Data Guided Crash Recovery Bug Detection for Distributed Systems (共用数据导向的分布式系统失效恢复缺陷检测), Journal of Software (软件学报), 2023, 4th author
(22) Model Checking Guided Testing for Distributed Systems, 18th European Conference on Computer Systems (EuroSys), 2023, 2nd author, corresponding author
(23) Coverage Guided Fault Injection for Cloud Systems, 45th IEEE/ACM International Conference on Software Engineering (ICSE), 2023, 2nd author, corresponding author
(24) Fixing robust out-of-distribution detection for deep neural networks, 34th IEEE International Symposium on Software Reliability Engineering (ISSRE), 2023, 3rd author
(25) Self-supervised log parsing using semantic contribution difference, Journal of Systems and Software, 2023, 4th author
(26) Detecting Flash Loan Based Price Manipulation Attacks in Ethereum, 43rd IEEE International Conference on Distributed Computing Systems (ICDCS), 2023, 3rd author, corresponding author
(27) Randomized Differential Testing of RDF Stores, 45th IEEE/ACM International Conference on Software Engineering (ICSE Demo), 2023, 4th author
(28) Detecting Isolation Bugs via Transaction Oracle Construction, 45th IEEE/ACM International Conference on Software Engineering (ICSE) (CCF A), 2023, 1st author
(29) Testing Database Systems via Differential Query Execution, 45th IEEE/ACM International Conference on Software Engineering (ICSE) (CCF A), 2023, 2nd author, corresponding author
(30) Common Data Guided Crash Injection for Cloud Systems, 44th ACM/IEEE International Conference on Software Engineering (ICSE Demo), 2022, 4th author, corresponding author
(31) Differentially Testing Database Transactions for Fun and Profit, 37th IEEE/ACM International Conference on Automated Software Engineering (ASE) (CCF A), 2022, 2nd author, corresponding author
(32) Understanding Device Integration Bugs in Smart Home System, 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2022, 4th author, corresponding author
(33) Finding Bugs in Gremlin-Based Graph Database Systems via Randomized Differential Testing, 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) (CCF A), 2022, 2nd author, corresponding author
(34) DisTA: Generic Dynamic Taint Tracking for Java-Based Distributed Systems, 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2022, 3rd author, corresponding author
(35) Characterizing and Detecting Bugs in WeChat Mini-Programs, 44th ACM/IEEE International Conference on Software Engineering (ICSE), 2022, 4th author, corresponding author
(36) Knowledge-Based Environment Dependency Inference for Python Programs, 44th ACM/IEEE International Conference on Software Engineering (ICSE), 2022, 3rd author
(37) The Performance of Selfish Mining in GHOST, 20th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2021, 2nd author, corresponding author
(38) The Impact Analysis of Multiple Miners and Propagation Delay on Selfish Mining, 45th IEEE Computers, Software, and Applications Conference (COMPSAC), 2021, 2nd author
(39) Semantic Table Structure Identification in Spreadsheets, 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2021, 4th author, corresponding author
(40) Systemizing Interprocedural Static Analysis of Large-scale Systems Code with Graspan, ACM Transactions on Computer Systems (TOCS), 2021, 7th author
(41) 区块链共识协议综述 [A Survey of Blockchain Consensus Protocols], Journal of Software (软件学报), 2021, 2nd author
(42) DeepCon: Contribution Coverage Testing for Deep Learning Systems, 28th International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2021, 2nd author, corresponding author
(43) Race Detection for Event-Driven Node.js Applications, 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2021, 2nd author, corresponding author
(44) DistStream: An Order-Aware Distributed Framework for Online-Offline Stream Clustering Algorithms, 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS) (CCF B), 2020, 5th author
(45) CoFI: Consistency-Guided Fault Injection for Cloud Systems, 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2020, 2nd author, corresponding author
(46) Learning to Detect Table Clones in Spreadsheets, 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2020, 2nd author, corresponding author
(47) Detecting Cache-Related Bugs in Spark Applications, 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) (CCF A), 2020, 5th author
(48) Understanding Exception-Related Bugs in Large-Scale Cloud Systems, 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019, 2nd author, corresponding author
(49) Recent Progress in Program Analysis (程序分析研究进展), Journal of Software (软件学报), 2019, 8th author
(50) An Experimental Evaluation of Garbage Collectors on Big Data Applications, Proceedings of the VLDB 2019 PhD Workshop, co-located with the 45th International Conference on Very Large Databases (VLDB 2019) (CCF A), 2019, 3rd author
(51) Detecting Atomicity Violations for Event-Driven Node.js Applications, 41st IEEE/ACM International Conference on Software Engineering (ICSE), 2019, 2nd author, corresponding author
(52) Expandable Group Identification in Spreadsheets, 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018, 1st author, corresponding author
(53) Detecting Faulty Empty Cells in Spreadsheets, 25th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2018, 3rd author, corresponding author
(54) An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems, 26th ACM Joint Meeting on European Software Engineering Conference (ESEC) / Symposium on the Foundations of Software Engineering (FSE), 2018, 2nd author, corresponding author
(55) Characterizing and Diagnosing out of Memory Errors in MapReduce Applications, Journal of Systems and Software (JSS) (CCF B), 2018, 2nd author, corresponding author
(56) Context-Based Event Trace Reduction in Client-Side JavaScript Applications, 11th IEEE International Conference on Software Testing, Verification and Validation (ICST), 2018, 2nd author, corresponding author
(57) Rewriting High-Level Spreadsheet Structures into Higher-Order Functional Programs, 20th International Symposium on Practical Aspects of Declarative Languages (PADL), 2018, 2nd author
(58) JSTrace: Fast reproducing web application errors, Journal of Systems and Software, 2018, 2nd author
(59) How Are Spreadsheet Templates Used in Practice: A Case Study on Enron, 26th ACM Joint Meeting on European Software Engineering Conference (ESEC) / Symposium on the Foundations of Software Engineering (FSE), 2018, 2nd author, corresponding author
(60) A Hierarchical Categorization Approach for Configuration Management Modules, 41st IEEE Annual Computer Software and Applications Conference (COMPSAC), 2017, 3rd author
(61) A Hierarchical Categorization Approach for System Operation Services, 24th IEEE International Conference on Web Services (ICWS), 2017, 4th author
(62) Mining API Type Specifications for JavaScript, 24th Asia-Pacific Software Engineering Conference (APSEC), 2017, 2nd author
(63) A Comprehensive Study on Real World Concurrency Bugs in Node.js, 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2017, 2nd author, corresponding author
(64) CACheck: Detecting and Repairing Cell Arrays in Spreadsheets, IEEE Transactions on Software Engineering (TSE), 2017, 1st author, corresponding author
(65) SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering, 14th IEEE/ACM International Conference on Mining Software Repositories (MSR), 2017, 2nd author, corresponding author
(66) VEnron: A Versioned Spreadsheet Corpus and Related Evolution Analysis, 38th International Conference on Software Engineering (ICSE SEIP), 2016, 1st author
(67) Detecting Table Clones and Smells in Spreadsheets, 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), 2016, 1st author, corresponding author
(68) Experience Report: A Characteristic Study on Out of Memory Errors in Distributed Data-Parallel Applications, 26th IEEE International Symposium on Software Reliability Engineering (ISSRE) (CCF B), 2015, 2nd author, corresponding author
(69) Towards Web Application Mobilization via Efficient Web Control Extraction, 7th Asia-Pacific Symposium on Internetware (Internetware), 2015, 2nd author
(70) Fast Reproducing Web Application Errors, 26th IEEE International Symposium on Software Reliability Engineering (ISSRE), 2015, 2nd author, corresponding author
(71) Discovering User-Defined Event Handlers in Presence of JavaScript Libraries, 22nd Asia-Pacific Software Engineering Conference (APSEC), 2015, 2nd author, corresponding author
(72) Is Spreadsheet Ambiguity Harmful? Detecting and Repairing Spreadsheet Smells due to Ambiguous Computation, 36th International Conference on Software Engineering (ICSE), 2014, 1st author, corresponding author
(73) A highly concurrent process virtual machine based on event-driven process execution model, 9th IEEE International Conference on E-Business Engineering (ICEBE), 2012, 4th author
(74) Collaborative Software Development Environment and Its Construction Method (面向协作的软件开发环境及其构造方法), Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2011, 1st author
(75) Dynamic Substitution of Web Services through Stateful Aspect Extension (基于状态方面的Web服务动态替换), Computer Science (计算机科学), 2009, 1st author