For some forty years, pay gaps between men and women have generated an abundant literature. Over time and across countries, studies document both wage inequality, all else being equal, and the inability of women to reach the upper levels of the hierarchy (the glass ceiling). Gobillon et al. (2015) find a wage differential of 16% in the French State civil service (Fonction publique d'État). Pay gaps reflect a so-called "structural" effect (the individual characteristics of employees do not necessarily have the same structure for women and men, and these characteristics affect pay levels) and an effect that stems from the gendered treatment of women's and men's careers and pay. Drawing on data provided by the ministère de l'Environnement, de l'Energie et de la Mer (the French ministry for the environment, energy and the sea), this project examines differences in the salaries and careers of the ministry's employees. The originality and richness of the available data (4 years of payslips for 46,000 individuals plus 10 years of career summaries) will make it possible to adopt an interdisciplinary approach, with tools drawn from complex systems and techniques from data science and machine learning. The aim is to explain differences in treatment as well as differences in individual career choices. Because the project is carried out in collaboration with the ministry's department "Egalité des droits entre les femmes et les hommes et de la lutte contre les discriminations" (equal rights between women and men and the fight against discrimination), the study has not only a knowledge objective but also a public policy objective, i.e. remedying any discrimination that is found.
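One common way to formalise the two effects described above is an Oaxaca-Blinder decomposition. The abstract does not say which method the project uses, so the block below is only an illustrative assumption. The symbols are introduced here: $\bar{w}_g$ is the mean (log) wage of group $g \in \{M, F\}$, $\bar{X}_g$ the vector of mean characteristics (including a constant), and $\hat{\beta}_g$ the coefficients of a wage regression estimated separately for each group.

```latex
\bar{w}_M - \bar{w}_F
  = \underbrace{(\bar{X}_M - \bar{X}_F)^{\top} \hat{\beta}_M}_{\text{structural (composition) effect}}
  + \underbrace{\bar{X}_F^{\top} (\hat{\beta}_M - \hat{\beta}_F)}_{\text{difference in returns (treatment)}}
```

The first term is the part of the gap attributable to differences in characteristics. The second is the part that remains when identical characteristics are rewarded differently. Taking the men's coefficients as the reference is itself a choice, and other reference structures give different splits.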
In recent years, the database research community has been dedicating more and more attention to problems related to the efficient management of graph data. The interest in this kind of data is motivated by the widespread diffusion of applications that manipulate and process data that are naturally modelled as graphs. The most notable examples of these applications are social networks such as Facebook or LinkedIn, which process graphs with billions of edges daily. Managing and analysing data graphs is a challenging task, first because of the inherent flexibility and possible heterogeneity of graphs, and second because of the ever-growing size of the data.
In the context of this project, we examine the efficient saturation of big RDF datasets using big data platforms such as Spark and Hadoop, but also graph-specific platforms, e.g., GraphX. RDF (Resource Description Framework) is a W3C graph data model that has been widely adopted in the last few years to encode structured datasets and to query them (using the SPARQL language). We focus on the problem of massively parallelising the saturation of RDF datasets, since this operation is key to a range of data management functionalities, including data querying, data exchange and integration, to name a few. We intend to examine and implement a variety of RDF graph saturation techniques on top of different big data management platforms, with a view to drawing lessons that guide the saturation operation when dealing with massive datasets, and to identifying the advantages such platforms bring as well as the weaknesses that need to be addressed.
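Saturation here means materialising the triples entailed by the data together with its schema. As a minimal single-machine sketch (not the project's implementation, and with none of the Spark, Hadoop or GraphX parallelism discussed above), the fixpoint below applies six RDFS entailment rules to a set of triples. The function name `saturate`, the encoding of triples as tuples of prefixed strings, and the example resources are assumptions made for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
TYPE = "rdf:type"


def saturate(triples: Set[Triple]) -> Set[Triple]:
    """Apply a subset of RDFS rules until no new triple can be derived."""
    closed = set(triples)
    while True:
        new = set()
        # Naive pairwise matching: quadratic per round, fine for a toy graph.
        for (s, p, o) in closed:
            for (s2, p2, o2) in closed:
                # rdfs11: subClassOf is transitive
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # rdfs5: subPropertyOf is transitive
                if p == SUBPROP and p2 == SUBPROP and o == s2:
                    new.add((s, SUBPROP, o2))
                # rdfs9: instances of a class are instances of its superclasses
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # rdfs7: a statement also holds for every super-property
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))
                # rdfs2: the subject of a property belongs to its domain
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
                # rdfs3: the object of a property belongs to its range
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
        if new <= closed:  # fixpoint reached
            return closed
        closed |= new


graph = {
    (":alice", ":teaches", ":db101"),
    (":teaches", DOMAIN, ":Lecturer"),
    (":Lecturer", SUBCLASS, ":Staff"),
    (":Staff", SUBCLASS, ":Person"),
}
saturated = saturate(graph)
```

In a distributed setting each rule would typically be expressed as a join between the data triples and the (usually much smaller) schema triples, which is where the choice of platform starts to matter.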
Libor (London Interbank Offered Rate) manipulation by banks has been under trial since 2008. Judicial decisions came with the disclosure of a great number of documents containing conversations between the traders involved. The project "Construction of a semantic tool for the analysis of Libor manipulation" uses those documents to build a textual database that characterizes the interactions between financial actors. A semantic analysis will help to better understand the social environment of the agents involved in the manipulation (in terms of social norms, values, language...) and therefore to better model the economic rationality of financial actors.
This project aims at better understanding the contributions of various stakeholders to the making of the EU regulations that shape economic activities. The single market policy has been a major driver of the reshaping of regulations in all domains for the last 30 years. While tremendous progress has been made toward deeper economic integration, many industries remain regulated and organized on a national basis, and a continuous process of harmonization is under way. In addition, new challenges (e.g. decarbonization, digital transformation) and central EU policies (e.g. inclusion, regional development) continuously call for additional regulatory initiatives.
The decision process managed in Brussels is central, since it leads to the enactment of European Directives that are then transposed into national legislation. European regulations can also be passed and become immediately enforceable as law in all member states. The text of a draft directive (or regulation) is prepared by the Commission after consultation with its own and national experts. The draft is presented to the Parliament and the Council (composed of the relevant ministers of member governments), initially for evaluation and comment, then subsequently for approval or rejection. While being required to consult Parliament on legislative proposals, the Council is not bound by Parliament's position.
In addition, according to Article 11 of the Treaty on European Union, ‘the European Commission shall carry out broad consultations with parties concerned in order to ensure that the Union’s actions are coherent and transparent’. Over the years, this has led to the development of formal processes by which the Commission collects input and views from stakeholders about its policies, and these processes are now central to lawmaking in Brussels. Draft directives and regulations are now systematically preceded by such stakeholder consultations. In parallel, the European Union has been implementing a set of rules and tools to prevent corruption in the functioning of its own institutions. As a result, consultation with stakeholders is increasingly based on formal and transparent procedures.
Overall, citizens and researchers now benefit from a set of formal, consistent and comprehensive sources of bureaucratic and parliamentary data on the decision process carried out in Brussels to elaborate EU legislation. There are also complementary sources available on the transposition of this legislation into each national legal system. Altogether these data can be relied upon to study the effective process of decision in Brussels, and in particular:
Not only can the specificities of the European law-making process be documented, but we also benefit from a set of original data to study in concrete terms how the various interest groups influence the production of regulations. Until recently, only US data were available to document these processes, and data-intensive research on firms’ non-market strategies and the political economy of regulation was based on US data only.
The Equipex Data for Financial History (DFIH) develops an infrastructure to collect, align and share data on all the assets traded on the Paris stock market and on listed issuers over the 19th and 20th centuries. Twice per month, it records the spot, forward and options prices of all the securities (French and foreign; public and private; shares and bonds), currencies and precious metals, together with the securities events needed to harmonize prices over time (dividends, coupons and ex-dates; number of listed securities; (reverse) splits; etc.). It also records information on issuers such as boards, headquarters, balance sheets and income statements, changes in equity capital, governance rules... The two main historical sources are the official lists of the exchanges for market data and various kinds of yearbooks for data on issuers. DFIH aims at broadening its scope to include a larger variety of data. To achieve its goals, it contributes to the development of innovative technologies for data extraction and enrichment.
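To illustrate why securities events are needed to harmonize prices over time, here is a minimal sketch of a backward split adjustment. It is not DFIH's actual procedure: the function name `split_adjusted`, the representation of a split as a ratio of new shares per old share keyed by the index of the first post-split observation, and the sample figures are all assumptions. Dividends and coupons are ignored.

```python
from typing import Dict, List


def split_adjusted(prices: List[float], splits: Dict[int, float]) -> List[float]:
    """Express every price in units of the most recent share definition.

    prices: chronological series of quoted prices.
    splits: maps the index of the first post-split quote to the split ratio
            (2.0 for a 2-for-1 split, 0.5 for a 1-for-2 reverse split).
    """
    adjusted = []
    factor = 1.0
    # Walk backwards so each quote is divided by the ratios of all later splits.
    for t in range(len(prices) - 1, -1, -1):
        adjusted.append(prices[t] / factor)
        if t in splits:
            # Quotes before index t refer to the pre-split share.
            factor *= splits[t]
    adjusted.reverse()
    return adjusted


# Hypothetical series: a 2-for-1 split takes effect at the third observation.
quoted = [100.0, 102.0, 51.0, 52.0]
harmonized = split_adjusted(quoted, {2: 2.0})
```

Without the adjustment, the move from 102 to 51 would look like a 50% loss. After it, the series is flat across the split date.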
Academic disciplines differ along many dimensions, including the extent to which they are populated mostly by men or mostly by women. Building upon this well-established fact, this research examines the relationship between the "knowledge hierarchy" and the "gender hierarchy" across a range of social sciences. We first assess the past and present strength of this connection. We then explore how gender shapes the knowledge hierarchy on both ends: by structuring input (e.g., funneling men and women into different, and differentially valued, scientific trajectories, both across and within disciplines); and by structuring output (e.g., the relative devaluation of women's versus men's research topics and methods, and the lesser citation of women, both of which reproduce gender hierarchies). To this end, we rely primarily on massive but curated bibliometric data to identify patterns and causal relationships.
Understanding the behavior of fund managers (hedge funds, UCITS, mutual funds) is key to explaining the recent evolution of some markets, but also to detecting systemic risk factors (regulation implications) and potential conflicts of interest (financial markets law implications). Any analysis of that kind, however, requires collecting information from several sources and accessing big-data-type databases.
Mainstream research on hedge funds focuses on hedge funds managed by US companies (and regulated by the SEC). Heterogeneity and a lack of ‘good-quality’ data explain this apparent lack of interest in hedge funds managed by European asset managers (and regulated by ESMA and some local regulators). Moreover, the construction of such a database is complicated and expensive. Our project consists in constructing a database of European fund managers. We will first draw up an extensive list of all the existing sources of data: commercial databases, data from the European regulators, and data to be collected from specialized websites. In a second step, this large amount of heterogeneous data will be aggregated across sources using big data methods and techniques. We want to mine these vast amounts of data to discern new patterns in fund managers’ behaviour and strategies that affect price evolution and contagion between markets.
Querying interconnected data is one of the problems posed by the data deluge, for which conventional solutions are not satisfactory. The problem addressed by this project is the querying of a collection of interlinked documents by a user who is not a computer specialist. In the legal field, the application context of this project, the challenge is to find documents not only according to their contents but also on the basis of intertextual relations. A classic keyword search would only return a list of relevant documents, leaving the user to reconstruct the context (the graph of linked documents) by navigating hypertext links. The objective is to propose a framework allowing a user to ask a natural language query describing the types of documents, their intertextual relations and the semantic content they are interested in. Unlike the traditional document search method, the answer will be an aggregation of interconnected documents, or parts of documents, that meets the user's need.
The project "Economy of digital platforms" is linked to the ANR project CAPLA "Workers on tap. The social impact of platform capitalism". It aims at analyzing the actors taking part in the development of digital platforms (Deliveroo, Etsy, La Belle assiette, Uber...) in order to understand the transformations of contemporary capitalism beyond the current debate, which is polarized between praise of the “sharing economy” and the denunciation of “uberization” as a new form of exploitation. To examine the different profiles of the workers offering their services on digital platforms, to identify their employment status and the diversity of their work activities, and thus to understand how this new economy works, data from the digital platforms will be obtained through web scraping and analyzed with statistical methods. Etsy and La Belle assiette will be the first platforms investigated.
Coordinator: Julien Jourdan (M&O - DRM)
Organizational scandals—broadly defined as publicized organizational transgressions that run counter to established norms—are ubiquitous phenomena in modern society, with wide-reaching consequences. With only a narrow set of cases and outcomes studied so far, the literature provides critical yet limited insights into the topic of organizational scandals, highlighting the need for more data and empirical research. This project aims at developing critical skills and knowledge about using social media to accumulate data on past and unfolding corporate scandals, and gain a better understanding of their dynamics.
This research project aims at studying the effect of colonization from an angle that has so far been neglected in the literature, namely trade diversification and trade policy. It proposes to use new, original data on French colonial trade to shed new light on the short- and long-term impact of colonization on both geographical and sectoral trade specialization. We will make use of the sectoral data provided by the French archives to analyze the relationship between economic development and diversification.
The objective of the project is to develop an algorithm for analyzing the effect of different network configurations and locations of renewable generation on the spot electricity price and on network congestion in the Italian Power Market, taking into account the natural resource endowment of each region. The research is based on three different databases: resource endowment; hourly transactions in the Italian spot market; and a list of plants with their technology and location.
The research is structured along three tasks: 1) identifying a relationship between the renewable resource availability (wind, hydro, sun) in each of the 6 regions and the actual production submitted by renewable generators to the spot electricity market; 2) deriving through machine learning the day-ahead market algorithm used by the market operator to calculate the day-ahead market equilibrium; 3) simulating the effects of renewable deployment and different network configurations on the level and the volatility of the spot electricity price as well as on network congestion.
The final objective is the creation of a simulation tool that can be used by regulators to assess the consequences of different policies on market efficiency and network reliability.
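Task 2 above refers to the algorithm that computes the day-ahead market equilibrium. As a deliberately simplified illustration of what such an algorithm does, the sketch below clears a single zone with inelastic demand by merit order at a uniform price. It ignores zones and transmission limits, which are central to the congestion questions studied here, and it is not the market operator's algorithm. The function name `clear_market`, the offer format and all figures are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (generator name, offered quantity in MWh, offer price in EUR/MWh)
Offer = Tuple[str, float, float]


def clear_market(offers: List[Offer], demand: float) -> Tuple[Optional[float], Dict[str, float]]:
    """Accept the cheapest offers until demand is covered.

    Returns the uniform clearing price (price of the last accepted offer,
    or None if demand is zero) and the accepted quantity per generator.
    """
    dispatch: Dict[str, float] = {}
    remaining = demand
    price: Optional[float] = None
    for name, quantity, offer_price in sorted(offers, key=lambda o: o[2]):
        if remaining <= 0:
            break
        accepted = min(quantity, remaining)
        dispatch[name] = accepted
        remaining -= accepted
        price = offer_price  # the marginal accepted offer sets the price
    if remaining > 0:
        raise ValueError("offers cannot cover demand")
    return price, dispatch


# Hypothetical hour: adding wind capacity pushes gas out of the merit order.
base = [("wind", 50.0, 0.0), ("hydro", 30.0, 20.0), ("gas", 100.0, 60.0)]
price_low_wind, dispatch_low_wind = clear_market(base, demand=100.0)
more_wind = [("wind", 100.0, 0.0), ("hydro", 30.0, 20.0), ("gas", 100.0, 60.0)]
price_high_wind, dispatch_high_wind = clear_market(more_wind, demand=100.0)
```

Comparing the two runs shows the kind of effect task 3 sets out to simulate: additional low-cost renewable offers displace the marginal plant and lower the clearing price.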
Our aim is to extend current methods for preference elicitation and game theory by considering interactive aspects, and to apply these methodologies in the context of big data problems. This project concerns interactive algorithmic methods to support decision-making in “complex” domains, where by “complex” we mean that the decision needs to be selected among a large set of possibilities. Indeed, the set of possible decisions might be very large when choices belong to a combinatorial domain (as when computing a product configuration), or when the choices consist of sequences of atomic decisions to be made at successive steps (in these settings we need to make a policy or plan, rather than a single-shot decision). Moreover, the outcomes associated with these choices may have stochastic elements, and may depend on other agents’ decisions (as in games). This is an interdisciplinary project between decision theory, artificial intelligence and game theory.
The "sauvegarde procedure" for financially weak firms was introduced in France in late 2005. In many ways it resembles the US Chapter 11 bankruptcy procedure. Notably, as in the US, firm managers keep a significant role during the observation period. Before this "sauvegarde procedure" was introduced, financially weak firms had access to a stricter debt restructuring process (known as "redressement judiciaire"), in which firm managers have little room for manoeuvre during the observation period, and to a liquidation procedure similar to US Chapter 7. These two schemes still exist, and firms looking to restructure seem to be more likely to choose the stricter redressement judiciaire over the sauvegarde procedure. This is puzzling, since the sauvegarde seems to bring better results in terms of firm survival rate.
The aim of the empirical research is to understand the determinants of firms' choices between the two restructuring procedures, as well as the determinants of the difference in survival rates between the two procedures. To do so, we exploit a dataset that contains all publicly available information on the judicial evolution of bankruptcy procedures at the firm level from 2008 to 2015.