Early detection of research trends and forecasting of their future impact*

This post aims to act as a hub for all the relevant information about my doctoral work. It will be constantly updated with new sources and developments.

Abstract

The ability to promptly recognise new research trends is strategic for many stakeholders, including universities, institutional funding bodies, academic publishers and companies. While the literature describes several approaches which aim to identify the emergence of new research topics early in their lifecycle, these rely on the assumption that the topic in question is already associated with a number of publications and consistently referred to by a community of researchers. Hence, detecting the emergence of a new research area at an embryonic stage, i.e., before the topic has been consistently labelled by a community of researchers and associated with a number of publications, is still an open challenge. In this paper, we begin to address this challenge by performing a study of the dynamics preceding the creation of new topics. This study indicates that the emergence of a new topic is anticipated by a significant increase in the pace of collaboration between relevant research areas, which can be seen as the ‘parents’ of the new topic. These initial findings (i) confirm our hypothesis that it is possible in principle to detect the emergence of a new topic at the embryonic stage, (ii) provide new empirical evidence supporting relevant theories in Philosophy of Science, and also (iii) suggest that new topics tend to emerge in an environment in which weakly interconnected research areas begin to cross-fertilise. (Please note that this abstract comes from one of my latest papers.)

Relevant Papers (in chronological order)

[*] Please note that the title will change once I have decided the final title of my thesis.

3MT – Early detection of research trends

On 16th May 2017, the STEM Faculty of my university organised a Three Minute Thesis (3MT) event, in which each candidate had a time slot of three minutes to describe their thesis. The speech could be supported by one static slide showing important features of the work.

I wish I had shown the one above, which makes it pretty clear that in my work I aim to combine different data sources to obtain new information and knowledge: emerging topics.

In the end, I decided to show a more formal one, attached below.

Single slide I used for my 3MT talk

How are topics born? Understanding the research dynamics preceding the emergence of new areas

“How are topics born? Understanding the research dynamics preceding the emergence of new areas” is a peer-reviewed paper submitted to PeerJ Computer Science in July 2016 and accepted in May 2017. All the co-authors are thankful to the reviewers and the editor for providing insightful comments and thus improving the manuscript.

Authors:

Angelo Antonio Salatino, Francesco Osborne, Enrico Motta

Abstract:

The ability to recognise promptly new research trends is strategic for many stakeholders, including universities, institutional funding bodies, academic publishers and companies. While the literature describes several approaches which aim to identify the emergence of new research topics early in their lifecycle, these rely on the assumption that the topic in question is already associated with a number of publications and consistently referred to by a community of researchers. Hence, detecting the emergence of a new research area at an embryonic stage, i.e., before the topic has been consistently labelled by a community of researchers and associated with a number of publications, is still an open challenge. In this paper, we begin to address this challenge by performing a study of the dynamics preceding the creation of new topics. This study indicates that the emergence of a new topic is anticipated by a significant increase in the pace of collaboration between relevant research areas, which can be seen as the ‘parents’ of the new topic. These initial findings i) confirm our hypothesis that it is possible in principle to detect the emergence of a new topic at the embryonic stage, ii) provide new empirical evidence supporting relevant theories in Philosophy of Science, and also iii) suggest that new topics tend to emerge in an environment in which weakly interconnected research areas begin to cross-fertilise.

Download paper: https://peerj.com/articles/cs-119/

Export Graph in R via JSON

This post presents an easy solution for exporting and importing a graph object of the igraph library.
Previous versions of the library seem to have offered save and load functions, with which you could respectively export and import the graph object [1]. Although they no longer appear to be in the library, the documentation states:

“Attribute values can be set to any R object, but note that storing the graph in some file formats might result the loss of complex attribute values. All attribute values are preserved if you use save and load to store/retrieve your graphs.”

The library also provides write_graph and read_graph, which can use formats such as GraphML, for exporting graph objects and importing them back.

Here, however, I propose my own little solution with almost zero options. It saves the graph and allows you to reload it (in another session as well) simply by saving all the fields and values in a JSON file.
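The idea of writing every field and value to a JSON file can be sketched independently of igraph. What follows is a minimal, language-agnostic illustration in Python that uses plain dictionaries; it is not the R code from the full post. The function names (`graph_to_json`, `graph_from_json`, `save_graph`, `load_graph`), the layout of the JSON document, and the toy graph are all assumptions made for this example.

```python
import json
import os
import tempfile


def graph_to_json(graph):
    """Serialise a graph, held as a plain dict (hypothetical layout), to JSON text."""
    document = {
        "directed": graph["directed"],
        "attributes": graph.get("attributes", {}),  # graph-level attributes
        "vertices": graph["vertices"],              # one dict per vertex
        "edges": graph["edges"],                    # one dict per edge
    }
    return json.dumps(document, indent=2, sort_keys=True)


def graph_from_json(text):
    """Rebuild the graph dict and check that every edge points at a known vertex."""
    document = json.loads(text)
    known = {vertex["name"] for vertex in document["vertices"]}
    for edge in document["edges"]:
        if edge["from"] not in known or edge["to"] not in known:
            raise ValueError(f"edge refers to an unknown vertex: {edge}")
    return document


def save_graph(graph, path):
    """Write the graph to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(graph_to_json(graph))


def load_graph(path):
    """Read a graph back from a JSON file, possibly in another session."""
    with open(path, encoding="utf-8") as handle:
        return graph_from_json(handle.read())


# Toy graph: two topics linked by a weighted edge (made-up data).
toy = {
    "directed": False,
    "attributes": {"title": "toy topic network"},
    "vertices": [
        {"name": "semantic web", "size": 3},
        {"name": "machine learning", "size": 5},
    ],
    "edges": [{"from": "semantic web", "to": "machine learning", "weight": 2.5}],
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "graph.json")
    save_graph(toy, path)
    restored = load_graph(path)

print(restored == toy)
```

Only values that JSON can represent (strings, numbers, booleans, lists and nested dictionaries) survive such a round trip unchanged, which is the same caveat the documentation quote raises about complex attribute values.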


BigDat2017: certificate of attendance

I recently attended a winter school on Big Data in Bari (IT): BigDat2017. At the moment, Big Data is gaining great attention in research, since it makes it possible to provide data-driven solutions in several contexts.

As part of my postgraduate research I decided to attend it and follow the new developments in this field.

Here follows the proof of my attendance.

Certificate of Attendance

I also wrote a review of the winter school. If you are interested, please read it here: BigDat2017.

Here, instead, is the link to the news item I wrote for my department website.

BigDat2017: a review

This week I attended the 3rd edition of the Big Data winter school: BigDat2017. It was held on my former campus, at the University of Bari (IT). It was a really nice feeling to be back for a while, sitting on those benches and following courses once again.
Big Data has recently gained a lot of interest in research, and many believe that it will keep playing a leading role for many years. Nowadays, we live in a world in which all information seems to be available: we are surrounded by data-driven applications (Google, Facebook, Twitter, Spotify, just to name a few), which gather data and try to provide tailor-made solutions for their users. In this context, an event like BigDat2017, with its clear mission of introducing new researchers to this fast-advancing research area and keeping them up to date, is really important.

Department Research Seminar: Early Detection of Research Topics

On 8th February I delivered a seminar to my department (KMi @ OU), in which I described the work I have been doing over the last two years for my postgraduate research.

I started with a brief introduction about science and then discussed the technologies currently available for keeping track of the development of the different research areas. I showed that these technologies are not satisfactory if we want to perform the early detection of research topics. Presenting a bit of the state of the art (including The Structure of Scientific Revolutions by Kuhn) allowed me to state my main hypothesis: research areas go through an embryonic stage, and it is possible to detect their emergence during this stage [1].

Smart Topic Miner

Smart Topic Miner (STM) is a web application which uses Semantic Web technologies to classify scholarly publications on the basis of the Computer Science Ontology (CSO), a very large automatically generated ontology of research areas.

 

STM was developed to support the Springer Nature Computer Science editorial team in classifying proceedings in the LNCS family. It analyses in real time a set of publications provided by an editor and produces a structured set of research topics and a number of Springer Nature Classification tags that best characterise the proceedings book. Indeed, if you regularly publish in the main Computer Science conferences, your work has probably already been classified and indexed using STM. **

During the classification of proceedings, editors are involved in different tasks, one of which is determining the list of related terms and categories. They do this on the basis of their own experience, for instance by visually scanning titles and abstracts. However, this is time-consuming as well as complex to perform. In addition, newly emerging topics may be overlooked, while some established topics may still be treated as popular.


A Visual Introduction to Machine Learning: Italian Translation

The R2D3 team (http://www.r2d3.us/) developed a visual introduction to Machine Learning. This introduction uses data visualization technologies to show a workflow for creating a machine learning model able to make accurate predictions. Lately, many people have volunteered to translate this introduction into different languages. I took care of the Italian version: Una introduzione visuale al machine learning.
English Version: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
Italian Version: http://www.r2d3.us/una-introduzione-visuale-al-machine-learning-1/


Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction based on Innovation-Adoption Priors

Semantic Innovation Forecasting Model

“Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction based on Innovation-Adoption Priors” is a peer-reviewed paper presented on Tuesday 22nd November 2016 in the “Entity detection, matching and evolution” session of the 20th International Conference on Knowledge Engineering and Knowledge Management, Bologna, Italy.

Authors:

Amparo Elizabeth Cano-Basave, Francesco Osborne and Angelo Antonio Salatino

Abstract:

The ontology engineering research community has focused for many years on supporting the creation, development and evolution of ontologies. Ontology forecasting, which aims at predicting semantic changes in an ontology, represents instead a new challenge. In this paper, we want to give a contribution to this novel endeavour by focusing on the task of forecasting semantic concepts in the research domain. Indeed, ontologies representing scientific disciplines contain only research topics that are already popular enough to be selected by human experts or automatic algorithms. They are thus unfit to support tasks which require the ability of describing and exploring the forefront of research, such as trend detection and horizon scanning. We address this issue by introducing the Semantic Innovation Forecast (SIF) model, which predicts new concepts of an ontology at time t + 1, using only data available at time t. Our approach relies on lexical innovation and adoption information extracted from historical data. We evaluated the SIF model on a very large dataset consisting of over one million scientific papers belonging to the Computer Science domain: the outcomes show that the proposed approach offers a competitive boost in mean average precision-at-ten compared to the baselines when forecasting over 5 years.
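For readers unfamiliar with the metric mentioned in the abstract, mean average precision-at-ten can be sketched as follows. This is a generic textbook formulation in Python, not the evaluation code used in the paper. The function names, the toy rankings, and the choice of normalising by min(number of relevant items, k) are assumptions made for this example, and other variants of the metric exist.

```python
def average_precision_at_k(ranked, relevant, k=10):
    """Average precision over the top-k items of one ranked list.

    `ranked` is assumed to contain no duplicates; `relevant` is the set of
    items that count as correct predictions.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this cut-off
    return score / min(len(relevant), k)


def mean_average_precision_at_k(rankings, relevants, k=10):
    """Mean of the per-list average precisions, e.g. one list per forecast year."""
    scores = [
        average_precision_at_k(ranked, relevant, k)
        for ranked, relevant in zip(rankings, relevants)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with made-up concept labels: two forecasts and their ground truth.
rankings = [
    ["concept a", "concept b", "concept c", "concept d"],
    ["concept e", "concept f"],
]
relevants = [
    {"concept a", "concept c"},
    {"concept g"},
]
print(mean_average_precision_at_k(rankings, relevants, k=10))
```

In the first toy list the correct concepts sit at ranks 1 and 3, so that list contributes (1/1 + 2/3) / 2, while the second list contains no correct concept and contributes zero.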