Latest News

AI for the Research Ecosystem workshop #AI4RE – round up
30 March 2024

On March 22, 2024, the AI for the Research Ecosystem workshop (#AI4RE) took place in London, kindly hosted by UCL in the wonderful surroundings of Chandler House. The workshop was part of the Turing Institute’s AI UK Fringe series of events, which took place around the U.K. It focused on the intersection of recent developments in Artificial Intelligence, such as Large Language Models and Deep Learning, and how these developments will impact current research practices.

The packed programme opened with a keynote by Prof. David De Roure of the University of Oxford, exploring knowledge infrastructures, social machines, and how, and if, we can measure the rate of innovation – and whether it is increasing.

There were two sessions of invited talks featuring a range of impressive speakers from academia and industry. Dr Phil Gooch, from Scholarcy, looked at how AI can help make research accessible to the widest range of users. Professor Mike Thelwall, University of Sheffield, examined whether using AI in research assessment can be beneficial and impartial. Simon Porter, VP Futures at Digital Science, gave a service provider’s perspective on how they are adapting to and adopting these new technologies. Professor Andrew French, University of Nottingham, showed some amazing work his group has been doing for over 20 years, using machine learning to identify the phenotypes of plants. He also introduced AIBIO-UK, a community-building project that brings together AI and core bioscience researchers to unravel biological fundamentals and tackle impending societal challenges.
KMi were represented by Professor Petr Knoth, who discussed CORE and why AI needs open data and open infrastructures, and by Dr Angelo Salatino, who gave a detailed overview of his team’s work investigating the role of AI in literature reviews, hypothesis generation, expert diversity, and scientific question answering.

The workshop’s themes address highly relevant issues, attracting significant interest from stakeholders such as the European Commission and UK Research and Innovation (UKRI). Ben Steyn, head of metascience at the Department for Science, Innovation and Technology (DSIT), gave an impromptu lightning talk introducing the joint DSIT / UKRI Metascience Unit and presenting their efforts to develop a global community of practice for metascience.

There was also a poster session, with 6 accepted posters gathering a crowd during the networking session, and the day rounded off with a wide-ranging panel session in which the panellists had a lively discussion on AI’s applications in study design, literature reviews, and data analysis. The panel also covered ethical considerations and the democratisation of AI technologies.

The workshop aimed to foster a collaborative learning community among researchers, institutions, and policymakers, and by this measure it was a resounding success. We are extremely grateful to all of the speakers and participants for being a part of the AI4RE workshop. You can view all of the slides and posters from the day on the AI4RE website...

Integrating Conversational Agents and Knowledge Graphs Within the Scholarly Domain
18 March 2023

“Integrating Conversational Agents and Knowledge Graphs Within the Scholarly Domain” is a journal paper accepted at IEEE Access.
Antonello Meloni¹, Simone Angioni¹, Angelo Antonio Salatino², Francesco Osborne², Diego Reforgiato Recupero¹, Enrico Motta²
¹ Department of Mathematics and Computer Science, University of Cagliari (Italy)
² Knowledge Media Institute, The Open University, Milton Keynes (UK)

Abstract
In the last few years, chatbots have become mainstream solutions adopted in a variety of domains for automating communication at scale. In the same period, knowledge graphs have attracted significant attention from business and academia as robust and scalable representations of information. In the scientific and academic research domain, they are increasingly used to illustrate the relevant actors (e.g., researchers, institutions), documents (e.g., articles, patents), entities (e.g., concepts, innovations), and other related information. Following the same direction, this paper describes how to integrate conversational agents with knowledge graphs focused on the scholarly domain, a.k.a. Scientific Knowledge Graphs. On top of the proposed architecture, we developed AIDA-Bot, a simple chatbot that leverages a large-scale knowledge graph of scholarly data. AIDA-Bot can answer natural language questions about scientific articles, research concepts, researchers, institutions, and research venues. We have developed four prototypes of AIDA-Bot on Alexa products, web browsers, Telegram clients, and humanoid robots. We performed a user study with 15 domain experts, which showed a high level of interest and engagement with the proposed agent.

Download
Download from DOI (Open Access): https://doi.org/10.1109/ACCESS.2023.3253388
Download from Institutional Repository (ORO): https://oro.open.ac.uk/88056/ ...

R-Classify: Extracting Research Papers’ Relevant Concepts from a Controlled Vocabulary
12 November 2022

“R-Classify: Extracting Research Papers’ Relevant Concepts from a Controlled Vocabulary” is a software paper accepted at Software Impacts.
Tanay Aggarwal, Angelo Antonio Salatino, Francesco Osborne, Enrico Motta
Knowledge Media Institute, The Open University, Milton Keynes (UK)

Abstract
In the past few decades, we have seen a proliferation of scientific articles available online. This data-rich environment offers several opportunities but also challenges, since it is difficult to explore these resources and identify all the relevant content. Hence, it is crucial that these articles are appropriately annotated with their relevant concepts, so as to increase their chance of being properly indexed and retrieved. In this paper, we present R-Classify, a web tool that assists users in identifying the most relevant concepts according to a large-scale ontology of research areas in the field of Computer Science.

Web App
R-Classify is up and running. Feel free to give it a try at https://cso.kmi.open.ac.uk/classify/

Download
Download from DOI (Open Access): https://doi.org/10.1016/j.simpa.2022.100444
Download from institutional repository: https://oro.open.ac.uk/85958/ ...

Leveraging Knowledge Graph Technologies to Assess Journals and Conferences at Springer Nature
12 November 2022

“Leveraging Knowledge Graph Technologies to Assess Journals and Conferences at Springer Nature” is an In-Use paper presented at the 21st International Semantic Web Conference (ISWC 2022).

Simone Angioni¹, Angelo Antonio Salatino², Francesco Osborne²,³, Aliaksandr Birukou⁴, Diego Reforgiato Recupero¹, Enrico Motta²
¹ Department of Mathematics and Computer Science, University of Cagliari (Italy)
² Knowledge Media Institute, The Open University, Milton Keynes (UK)
³ Department of Business and Law, University of Milano Bicocca, Milan (Italy)
⁴ Springer-Verlag GmbH, Tiergartenstrasse 17, 69121 Heidelberg (DE)

Abstract
Research publishing companies need to constantly monitor and compare scientific journals and conferences in order to inform critical business and editorial decisions.
Semantic Web and Knowledge Graph technologies are natural solutions, since they allow these companies to integrate, represent, and analyse a large quantity of information from heterogeneous sources. In this paper, we present the AIDA Dashboard 2.0, an innovative system developed in collaboration with Springer Nature to analyse and compare scientific venues, now also available to the public. This tool builds on a knowledge graph which includes over 1.5B RDF triples and was produced by integrating information about 25M research articles from Microsoft Academic Graph, Dimensions, DBpedia, GRID, CSO, and INDUSO. It can produce sophisticated analytics and rankings that are not available in alternative systems. We discuss the advantages of this solution for the Springer Nature editorial process and present a user study involving 5 editors and 5 researchers, which yielded excellent results in terms of quality of the analytics and usability.

Download
Download from ORO: http://oro.open.ac.uk/84363/
Download from DOI: https://doi.org/10.1007/978-3-031-19433-7_42 ...

Best Paper Award at the In-Use Track ISWC 2022
12 November 2022

It is an honour to receive the Best Paper Award at the In-Use Track of ISWC, the International Semantic Web Conference (the premier conference on the Semantic Web). Great work in collaboration with Springer Nature and UniCa – Università degli Studi di Cagliari. The paper describes our recent efforts in putting semantic technologies (the AIDA Dashboard, https://aida.kmi.open.ac.uk/dashboard) into production for use in industry. Read our paper here: https://oro.open.ac.uk/84363/

Further Reading
From KMi Planet (English): https://kmi.open.ac.uk/news/article/19810
From University of Cagliari (Italian): https://unica.it/unica/page/it/aida_dashbord_applicazione_web_innovativa_per_lanalisi_di_riviste_autori_e_conferenze_scientifiche ...
Annotating D3 dataset with the CSO Classifier
20 September 2022

Abstract
The DBLP Discovery Dataset (D3) is a newly created dataset of research papers in the field of Computer Science which can support several tasks, such as identifying trends in research activity, productivity, focus, bias, accessibility, and impact. This dataset stems from DBLP and integrates additional information from the full texts. We argue that papers classified with their research topics can improve the identification of research trends. To this end, we used the CSO Classifier to annotate all the papers within D3, and we made this extension available for research purposes.

Introduction
The DBLP Discovery Dataset (D3) is a dataset in the field of Computer Science, which was recently released and can support several tasks, including identifying trends in research activity, productivity, focus, bias, accessibility, and impact. This dataset derives from DBLP and integrates additional information from the full texts. Each paper is associated with a set of attributes: corpusid, abstract, updated, externalids, url, title, authors, venue, year, referencecount, citationcount, influentialcitationcount, isopenaccess, s2fieldsofstudy, publicationtypes, publicationdate, and journal.

We argue that annotating research papers with their research topics can improve a number of tasks, including the exploration of research trends, the recommendation of similar research articles, and the extraction of knowledge (read more). To this end, we ran the CSO Classifier to annotate all the papers within the D3 dataset, and we made this extension available for research purposes on Zenodo (see D3 dataset annotated with CSO topics – https://zenodo.org/record/7097148).

CSO Classifier
The CSO Classifier is an application that takes as input the text from the abstract, title, and keywords of a research paper and outputs a list of relevant concepts from the Computer Science Ontology (CSO).
It consists of two main components: (i) the syntactic module and (ii) the semantic module. The syntactic module parses the input documents and identifies CSO concepts that are explicitly referred to in the document. The semantic module uses part-of-speech tagging to identify promising terms and then exploits word embeddings to infer semantically related topics. Finally, the CSO Classifier combines the results of these two modules, removes outliers, and enhances them by including relevant super-areas. The reader can refer to this article for additional details.

Dataset
In this section, we show how to process the newly created annotations. The D3 dataset is distributed in JSONL format, meaning that each line is a JSON dictionary. This format is quite convenient for large files, as it does not require the whole dataset to be parsed at once: it can be parsed row by row (i.e., paper by paper). For the sake of consistency, we kept the same format for our annotated dataset.

D3 dataset
In Listing 1, we present an example of a line (paper) found in the D3 dataset, with corpus id 26. In particular, we can observe the richness of the metadata contained in this dataset.

Listing 1: JSON associated with paper (corpusid 26) within the D3 dataset.

CSO annotations
In Listing 2, we can find the topics extracted from the same paper (corpus id 26) shown in Listing 1. It is a JSON dictionary that sits on a single line within the distributed dataset. In particular, it contains 5 keys. The first, corpusid, refers back to the original paper contained in the D3 dataset. Then, there are four keys that express the outcome of the CSO Classifier: syntactic, semantic, union, and enhanced. The keys syntactic and semantic respectively contain the topics returned by the syntactic and semantic modules. The union key contains the unique topics found by the two modules combined. The enhanced key contains the relevant super-areas.

Listing 2: JSON obtained by the CSO Classifier for the same paper (corpusid 26).
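Because every line of both files is a self-contained JSON dictionary and both carry a corpusid, the annotations can be attached to the D3 papers by joining on that key while streaming the files line by line. The following is a minimal sketch in Python, not code distributed with the dataset. It assumes that corpusid has the same type in both files and that each classifier key holds a list of topic strings (the listings are not reproduced here, so these value types are an assumption). The function names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL one line at a time, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def iter_jsonl(path: str) -> Iterator[dict]:
    """Stream a JSONL file row by row (paper by paper) instead of loading it all."""
    with open(path, encoding="utf-8") as handle:
        yield from parse_jsonl(handle)


def attach_topics(papers: Iterable[dict], annotations: Iterable[dict],
                  key: str = "union") -> Iterator[dict]:
    """Join D3 papers with CSO annotations on corpusid.

    `key` selects which classifier output to attach: "syntactic",
    "semantic", "union", or "enhanced". The annotations are indexed in
    memory, while the papers are streamed.
    """
    topics_by_id: Dict[object, List[str]] = {
        row["corpusid"]: row.get(key, []) for row in annotations
    }
    for paper in papers:
        corpus_id = paper["corpusid"]
        yield {
            "corpusid": corpus_id,
            "title": paper.get("title"),
            # Papers without an annotation line get an empty topic list.
            "topics": topics_by_id.get(corpus_id, []),
        }
```

For example, `attach_topics(iter_jsonl("d3.jsonl"), iter_jsonl("d3_cso.jsonl"), key="enhanced")` (file names hypothetical) would stream each paper together with its super-areas. Indexing the annotations in a dictionary keeps the sketch short but holds the whole annotation file in memory; for the full dump, sorting both files by corpusid and merging them, or loading them into a database, may be preferable.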
Downloads
Dataset: https://zenodo.org/record/7097148
This article in PDF: Annotating D3 dataset with the CSO Classifier...

Sci-K 2022 – International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment
17 June 2022

“Sci-K 2022 – International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment” is the introductory chapter of the proceedings of the “Sci-K 2022 – International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment”, co-located with The Web Conference 2022.

Paolo Manghi¹, Andrea Mannocci¹, Francesco Osborne², Dimitris Sacharidis³, Angelo Salatino², Thanasis Vergoulis⁴
¹ CNR-ISTI – National Research Council, Institute of Information Science and Technologies “Alessandro Faedo” (Italy)
² Knowledge Media Institute, The Open University, Milton Keynes (UK)
³ Université Libre de Bruxelles (Belgium)
⁴ “Athena” RC (Greece)

Abstract
In this paper, we present the 2nd edition of the Scientific Knowledge: Representation, Discovery, and Assessment (Sci-K 2022) workshop. Sci-K aims to explore innovative solutions and ideas for the generation of approaches, data models, and infrastructures (e.g., knowledge graphs) for supporting, directing, monitoring, and assessing scientific knowledge and progress. This edition is also a reflection point, as the community is seeking alternative solutions to the now-defunct Microsoft Academic Graph (MAG).

Download
Download from DOI: https://doi.org/10.1145/3487553.3524883 ...

Enriching Data Lakes with Knowledge Graphs
17 June 2022

“Enriching Data Lakes with Knowledge Graphs” is a workshop paper published at the “Knowledge Graph Generation from Text” workshop, co-located with ESWC 2022.
Alessandro Chessa¹,², Gianni Fenu³, Enrico Motta⁴, Francesco Osborne⁴,⁵, Diego Reforgiato Recupero³, Angelo Antonio Salatino⁴, Luca Secchi¹
¹ Linkalab s.r.l., Cagliari, Italy
² Luiss Data Lab, Rome, Italy
³ University of Cagliari, Cagliari, Italy
⁴ Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
⁵ University of Milano Bicocca, Milan, Italy

Abstract
Data lakes are repositories of data stored in natural/raw format. A data lake may include structured data from relational databases, semi-structured data (e.g., JSON, CSV), unstructured data (e.g., text data), or binary data (e.g., images, audio, video). It is usually built on top of cost-efficient infrastructures such as Hadoop, Amazon S3, MongoDB, ElasticSearch, etc. Several organisations rely on big data lakes for crucial tasks such as reporting, visualisation, advanced analytics, machine learning, and business intelligence. A major limitation of this solution is that, without descriptive metadata and a mechanism to maintain it, such data tend to be noisy, making their management and analysis complex and time-consuming. Therefore, there is a need for a semantic layer based on a formal ontology to describe the data, and for an efficient mechanism to represent them as a knowledge graph. In this paper, we present a methodology to add a semantic layer to a data lake and thus obtain a knowledge graph that can support structured queries and advanced data exploration. We describe a practical implementation of this methodology applied to a data lake consisting of text data describing an online marketplace for lodging and tourism activities. We report statistics about the data lake and the resulting knowledge graph.

Download
Link will be available soon...

The AIDA Dashboard: a Web Application for Assessing and Comparing Scientific Conferences
17 June 2022

“The AIDA Dashboard: a Web Application for Assessing and Comparing Scientific Conferences” is a research paper submitted to IEEE Access.
Simone Angioni¹, Angelo Antonio Salatino², Francesco Osborne², Diego Reforgiato Recupero¹, Enrico Motta²
¹ Department of Mathematics and Computer Science, University of Cagliari (Italy)
² Knowledge Media Institute, The Open University, Milton Keynes (UK)

Abstract
Scientific conferences are essential for developing active research communities, promoting the cross-pollination of ideas and technologies, bridging between academia and industry, and disseminating new findings. Analyzing and monitoring scientific conferences is thus crucial for all users who need to make informed decisions in this space. However, scholarly search engines and bibliometric applications only provide a limited set of analytics for assessing research conferences, preventing users from performing a comprehensive analysis of these events. In this paper, we introduce the AIDA Dashboard, a novel web application, developed in collaboration with Springer Nature, for analyzing and comparing scientific conferences. This tool introduces three major new features: 1) it enables users to easily compare conferences within specific fields (e.g., Digital Libraries) and time-frames (e.g., the last five years); 2) it characterises conferences according to 14K research topics from the Computer Science Ontology (CSO); and 3) it provides several functionalities for assessing the involvement of commercial organizations, including the ability to characterize industrial contributions according to 66 industrial sectors (e.g., automotive, financial, energy, electronics) from the Industrial Sectors Ontology (INDUSO). We evaluated the AIDA Dashboard by performing both a quantitative evaluation and a user study, obtaining excellent results in terms of quality of the analytics and usability.

Downloads
Download paper from IEEE Access (OA): https://ieeexplore.ieee.org/document/9754584
Download from ORO: http://oro.open.ac.uk/82668/ ...
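The first feature listed in this abstract, comparing conferences within a field and a time-frame, can be illustrated with a toy computation. This is a hypothetical sketch, not the AIDA Dashboard's implementation: it assumes each paper is a dictionary with venue, year, and topics fields, and it ranks venues by paper count alone, whereas the Dashboard offers a much richer set of analytics.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def rank_venues(papers: Iterable[Mapping], field: str,
                start_year: int, end_year: int) -> List[Tuple[str, int]]:
    """Count papers per venue tagged with `field` in [start_year, end_year].

    Venues are returned from most to fewest papers; among ties,
    Counter.most_common keeps the order in which venues were first seen.
    """
    counts = Counter(
        paper["venue"]
        for paper in papers
        if start_year <= paper["year"] <= end_year and field in paper["topics"]
    )
    return counts.most_common()
```

Called with a topic label such as "digital libraries" and a window of the last five years, the function returns (venue, count) pairs that could back a simple comparison table. Restricting by a topic label is where an ontology such as CSO comes in: annotating papers with ontology concepts gives every venue a comparable vocabulary to filter on.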
Characterising Research Areas in the field of AI
17 June 2022

“Characterising Research Areas in the field of AI” is a research paper submitted to the special track “Statistical Methods for Science Mapping” at the “51st Scientific Meeting of the Italian Statistical Society”.

Alessandra Belfiore¹, Angelo Salatino², Francesco Osborne²
¹ Università della Campania Luigi Vanvitelli, Caserta (Italy)
² Knowledge Media Institute, The Open University, Milton Keynes (UK)

Abstract
Interest in Artificial Intelligence (AI) continues to grow rapidly, hence it is crucial to support researchers and organisations in understanding where AI research is heading. In this study, we conducted a bibliometric analysis of 257K articles in AI, retrieved from OpenAlex. We identified the main conceptual themes by performing a clustering analysis on the co-occurrence network of topics. Finally, we observed how such themes evolved over time. The results highlight the growing academic interest in research themes like deep learning, machine learning, and the internet of things.

Downloads
Download paper from arXiv: https://arxiv.org/abs/2205.13471 ...
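A topic co-occurrence network links two topics whenever they are assigned to the same article, weighting each link by the number of such articles. The sketch below builds that network from per-article topic lists. The abstract does not say which clustering algorithm the study used, so the grouping step here is a deliberately simple stand-in (connected components after dropping links below a weight threshold), not the paper's method; community-detection algorithms such as Louvain are a common choice for this kind of analysis. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set


def cooccurrence_network(papers_topics: Iterable[List[str]]) -> Counter:
    """Count how many papers mention each unordered pair of topics."""
    edges: Counter = Counter()
    for topics in papers_topics:
        # Deduplicate and sort so that (a, b) and (b, a) map to the same edge.
        for pair in combinations(sorted(set(topics)), 2):
            edges[pair] += 1
    return edges


def threshold_components(edges: Counter, min_weight: int) -> List[Set[str]]:
    """Group topics linked by edges of weight >= min_weight (connected components)."""
    adjacency: Dict[str, Set[str]] = {}
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

Raising `min_weight` discards weak links and splits the network into tighter groups; tracking how these groups change when the articles are partitioned by publication year is one simple way to observe how themes evolve over time.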