Subject Topic Analysis through Community Detection in Coauthorship Network: Taking International Statistical Journals as an Example

Zhang Yan (张妍); Pan Rui (潘蕊); Fang Kuangnan (方匡南)

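Abstract

Abstract: With the arrival of the era of big science, research collaboration has become increasingly common. To understand current mainstream research topics and the collaboration patterns among researchers, so that researchers can gain a better understanding of subject topics, build more efficient collaborative groups, raise research output, and advance the discipline, this paper takes international statistical journals as an example and studies subject topics in depth. We first construct the co-authorship network and analyze its basic properties, then extract its core network and analyze the structure of its connected components, and finally use the ECV method and the regularized spectral clustering algorithm to determine the number of communities in the first and second largest connected components and to partition them. The results show that research collaboration in statistics is increasingly common and that the co-authorship network has a clear community structure. Combining paper information with author attributes, we obtain 29 distinct subject topics and find that cross-collaboration exists between different communities and that different subject topics merge within the same community. In addition, regarding collaboration patterns, we find that scholars who share a subject topic or a research institution are more likely to collaborate, and that the journals in which scholars of the same community publish are clearly similar.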

Abstract: Scientific researchers are an important force in promoting the development of disciplines. In the era of big science, more and more scholars tend to cooperate in research. Through collaboration, scholars can complement each other’s strengths and avoid duplicated research. Collaboration among researchers can be represented as network data, and complex network analysis has led to significant advances in understanding complex systems across various fields. Therefore, to understand the current mainstream research topics and the collaboration modes among researchers, this study collects information on 66,460 papers published in 44 statistical journals from 2001 to 2018 and builds a co-authorship network. We find that in recent years statisticians have increasingly tended to collaborate in publishing papers. Nearly one-fifth of statisticians have only one collaborator in the network. Professor Balakrishnan, N., from McMaster University, has the largest number of collaborators and has published the most papers in the 44 statistical journals. Many large networks have a core-periphery structure, and so does the co-authorship network, so we extract its core network. The core network contains 1,158 nodes and 15,464 edges and has 34 connected components. The largest connected component has 356 authors, accounting for 30.7% of the authors in the core network. The second largest has 172 authors, accounting for 14.9%. Each of the remaining connected components has fewer than 100 authors. We then analyze the first and second largest connected components in detail.
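The construction described above can be illustrated with a minimal sketch. It assumes each paper is given as a list of author names and links two authors whenever they share a paper; the function names and the toy data are hypothetical, and the sketch does not reproduce the study's data pipeline or its core-network extraction.

```python
from itertools import combinations


def build_coauthor_graph(papers):
    """Map each author to the set of their co-authors (undirected graph)."""
    graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            graph.setdefault(a, set())  # solo authors become isolated nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return connected components, largest first, via depth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    components.sort(key=len, reverse=True)
    return components


# Toy input (hypothetical): four papers, six authors.
papers = [["A", "B"], ["B", "C"], ["D", "E"], ["F"]]
graph = build_coauthor_graph(papers)
components = connected_components(graph)

# Number of collaborators per author and share of authors per component.
collaborators = {author: len(neighbors) for author, neighbors in graph.items()}
shares = [len(c) / len(graph) for c in components]
print(components)
print(collaborators)
print(shares)
```

The share of authors in each component is computed the same way as the percentages quoted above: component size divided by the number of nodes in the network.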

Complex networks share three common characteristics: the small-world property, the scale-free property, and community structure. Community structure means that nodes in the network tend to aggregate into groups. Community detection is a particularly important research area, as it identifies groups of nodes that exhibit specific patterns of interaction within a network. We conduct community detection in the first and second largest connected components. Many community detection algorithms require the number of communities to be set in advance, yet this number is difficult to know in a real network. Using cross-validation to select the number of communities automatically is a breakthrough in the field of community detection. This paper adopts the edge cross-validation (ECV) method to determine the number of communities and the tuning parameters. We then use the regularized spectral clustering algorithm to detect the communities of the co-authorship network.
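To make the clustering step concrete, here is a minimal sketch of one common form of regularized spectral clustering. It is an illustration under stated assumptions, not the authors' implementation: the regularization adds `tau` times the average degree divided by `n` to every entry of the adjacency matrix, the number of communities `k` is taken as given (the ECV selection step is not shown), and the small k-means routine with farthest-point initialization is a stand-in chosen to keep the example self-contained.

```python
import numpy as np


def _kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def regularized_spectral_clustering(A, k, tau=0.25):
    """Cluster the nodes of a symmetric adjacency matrix A into k groups."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    avg_degree = A.sum() / n
    # Assumed regularization: add a small constant to every entry.
    A_tau = A + tau * avg_degree / n
    d_inv_sqrt = 1.0 / np.sqrt(A_tau.sum(axis=1))
    # Symmetrically normalized matrix D^{-1/2} A_tau D^{-1/2}.
    L = d_inv_sqrt[:, None] * A_tau * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)
    # Keep the k eigenvectors with the largest absolute eigenvalues.
    top = np.argsort(-np.abs(eigvals))[:k]
    X = eigvecs[:, top]
    # Normalize rows so that node degree does not dominate the embedding.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms > 0, norms, 1.0)
    return _kmeans(X, k)


# Toy network (hypothetical): two 4-node cliques joined by a single edge.
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

labels = regularized_spectral_clustering(A, k=2)
print(labels)
```

In practice `k` and `tau` would be chosen by a model selection procedure such as the ECV method named above rather than fixed by hand as in this toy example.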

The core co-authorship network is divided into 62 communities. Authors in the same community cooperate relatively closely, while authors in different communities cooperate relatively little, although some cross-collaboration between communities exists. We then analyze the characteristics of the 62 communities from three perspectives based on the authors’ attributes. The first is the research field. We find 29 different research fields, including biostatistics, variable selection, maximum likelihood, and many others. We show in detail the research fields of the communities in the largest connected component, whose authors cover a broad range of research topics. Some communities focus on more than one field. The second is the journal. The journals in which authors from the same community publish their papers are clearly similar. We take Community 6 and Community 18 as examples for detailed analysis. Authors in Community 6, who mainly study survival analysis in biostatistics, are more inclined to publish papers in Bioinformatics, Biostatistics, Biometrics, and other biostatistics journals. Authors in Community 18, who mainly focus on variable selection, are more inclined to publish papers in top statistical journals such as Journal of the American Statistical Association, Biometrics, and Annals of Statistics. The third is the author’s affiliation. The affiliations of authors in the same community are also clearly clustered. We again take Community 6 and Community 18 as examples. Many of the statisticians in Community 6 are from the University of North Carolina Chapel Hill and the Fred Hutchinson Cancer Center. Community 18 has a higher number of statisticians from the University of North Carolina and the University of North Carolina Chapel Hill.

Most existing studies of statisticians' co-authorship networks focus on papers in the four major statistical journals (Annals of Statistics, Biometrika, Journal of the American Statistical Association, and Journal of the Royal Statistical Society Series B-Statistical Methodology). By comparison, the data used in this paper cover a much broader range of journals, and the conclusions are richer. The methods used to determine the number of communities in previous studies are more subjective. The method adopted in this paper is more objective and more general, and it can be extended to collaboration networks in other fields.
