Abstract:
Scientific researchers are an important force in advancing their disciplines. In the era of big science, scholars increasingly collaborate on research, and scientific collaboration is becoming more and more common. Through collaboration, scholars can complement each other’s strengths and avoid duplicating research. Collaboration among researchers can be represented as network data, and complex network analysis has led to significant advances in understanding complex systems across many fields. Therefore, to understand the current mainstream research topics and the collaboration patterns among researchers, this study collects information on 66460 papers published in 44 statistical journals from 2001 to 2018 and builds a co-authorship network. We find that in recent years statisticians have increasingly tended to publish papers collaboratively. In addition, nearly one-fifth of the statisticians have only one collaborator in the network. Professor N. Balakrishnan of McMaster University has the largest number of collaborators and has published the most papers in the 44 statistical journals. Many large networks have a core-periphery structure, and so does the co-authorship network. We therefore extract its core network, which contains 1158 nodes and 15464 edges and has 34 connected components. The largest connected component has 356 authors, accounting for 30.7% of the authors in the core network. The second largest has 172 authors, accounting for 14.9%. Each of the remaining connected components has fewer than 100 authors. We then analyze the first and second largest connected components in detail.
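As an illustration of the network construction described above, the following minimal sketch builds a co-authorship graph from per-paper author lists, counts each author's collaborators, and extracts connected components ordered by size. The function names and the toy author labels are hypothetical and are not taken from the study's data or code.

```python
from itertools import combinations

def build_coauthor_graph(papers):
    """Map each author to the set of distinct co-authors (undirected graph)."""
    graph = {}
    for authors in papers:
        unique = set(authors)
        for a in unique:
            graph.setdefault(a, set())  # solo authors become isolated nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connected_components(graph):
    """Return the connected components as lists of authors, largest first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Toy input: each inner list holds the (hypothetical) authors of one paper.
papers = [["A", "B", "C"], ["C", "D"], ["E", "F"], ["G"]]
graph = build_coauthor_graph(papers)
components = connected_components(graph)
one_collaborator = [a for a, nbrs in graph.items() if len(nbrs) == 1]
print([len(c) for c in components], len(one_collaborator) / len(graph))
```

The share of authors with exactly one collaborator and the relative size of each component are computed here in the same way as the corresponding percentages reported for the real network.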
Complex networks have three common characteristics: the small-world property, the scale-free property, and community structure. Community structure means that the nodes of a network cluster into groups. Community detection is a particularly important research area because it identifies groups of nodes that exhibit specific patterns of interaction within a network. We conduct community detection on the first and second largest connected components. Many community detection algorithms require the number of communities to be set in advance, but this number is difficult to know in a real network. Using cross-validation to select the number of communities automatically is a breakthrough in the field of community detection. This paper adopts the edge cross-validation (ECV) method to determine the number of communities and the tuning parameters, and then uses the regularized spectral clustering algorithm to detect the communities of the co-authorship network.
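To make the idea of regularized spectral clustering concrete, the following is a simplified two-community sketch, not the procedure used in the paper. It assumes one common form of regularization (adding tau/n to every entry of the adjacency matrix before degree normalization) and splits the nodes by the sign of the second leading eigenvector, found by power iteration. The general method runs k-means on several leading eigenvectors, with the number of communities and tau chosen by a procedure such as ECV. The function name, the value of tau, and the toy graph are illustrative assumptions.

```python
import math

def regularized_spectral_bisection(adj, tau, iters=200):
    """Split nodes into two groups using a regularized normalized adjacency."""
    n = len(adj)
    # Assumed regularization: A_tau = A + (tau / n) * (all-ones matrix).
    a = [[adj[i][j] + tau / n for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    # L = D^{-1/2} A_tau D^{-1/2}; its eigenvalues lie in [-1, 1].
    lap = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
           for i in range(n)]
    # The leading eigenvector of L is proportional to sqrt(deg), eigenvalue 1.
    total = math.sqrt(sum(deg))
    top = [math.sqrt(d) / total for d in deg]

    def deflate(v):
        # Remove the component along the leading eigenvector.
        dot = sum(x * t for x, t in zip(v, top))
        return [x - dot * t for x, t in zip(v, top)]

    # Power iteration on (L + I) / 2, which has non-negative eigenvalues,
    # restricted to the subspace orthogonal to the leading eigenvector.
    v = deflate([float(i + 1) for i in range(n)])  # deterministic start
    for _ in range(iters):
        w = [0.5 * (v[i] + sum(lap[i][j] * v[j] for j in range(n)))
             for i in range(n)]
        w = deflate(w)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The sign of the second eigenvector defines the two communities.
    return [0 if x >= 0 else 1 for x in v]

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1.0
print(regularized_spectral_bisection(adj, tau=0.5))
```

On this toy graph the two triangles receive different labels. Which triangle is labeled 0 is arbitrary, since an eigenvector is defined only up to sign.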
The core co-authorship network is divided into 62 communities. Authors in the same community collaborate relatively closely, while authors in different communities collaborate relatively little, although some cross-community collaboration exists. We then analyze the characteristics of the 62 communities from three perspectives based on the authors’ attributes. The first is the research field. We find 29 different research fields, including biostatistics, variable selection, maximum likelihood, and many others. We specifically show the research fields of the communities in the largest connected component, whose authors cover a broad range of research topics. Some communities focus on more than one field. The second is the journal. Authors from the same community publish in clearly similar sets of journals. We take Community 6 and Community 18 as examples for detailed analysis. Authors in Community 6, who mainly study survival analysis in biostatistics, are more inclined to publish in Bioinformatics, Biostatistics, Biometrics, and other biostatistics journals. Authors in Community 18, who mainly focus on variable selection, are more inclined to publish in top statistical journals such as Journal of the American Statistical Association, Biometrics, and Annals of Statistics. The third is the author’s affiliation. The affiliations of authors in the same community are also clearly clustered. Taking Community 6 and Community 18 as examples again, we find that many of the statisticians in Community 6 are from the University of North Carolina Chapel Hill and the Fred Hutchinson Cancer Center, while Community 18 has a higher number of statisticians from the University of North Carolina and the University of North Carolina Chapel Hill.
Most existing studies of statisticians’ co-authorship networks focus on papers in the four major statistical journals (Annals of Statistics, Biometrika, Journal of the American Statistical Association, and Journal of the Royal Statistical Society Series B-Statistical Methodology). By comparison, the data used in this paper cover a much broader range of journals, and the conclusions are richer. The methods used in previous studies to determine the number of communities are relatively subjective. The method adopted in this paper is more objective and more general, and it can be extended to collaboration networks in other fields.