Commit 2d3a095

Merge pull request #4 from CompNet/dev
Version 1.0.2
2 parents a4e21ba + 1d1358b commit 2d3a095

2 files changed

Lines changed: 28 additions & 29 deletions

README.md

Lines changed: 9 additions & 10 deletions
@@ -1,24 +1,24 @@
-FoppaInit v1.0.1
+FoppaInit v1.0.2
 -------------------------------------------------------------------------
 *Initialization of the FOPPA database*
 
-* Copyright 2021-2022 Lucas Potin & Vincent Labatut
+* Copyright 2021-2023 Lucas Potin & Vincent Labatut
 
 FoppaInit is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For source availability and license information see `licence.txt`
 
 * **Lab site:** http://lia.univ-avignon.fr
 * **GitHub repo:** https://github.com/CompNet/FoppaInit
-* **Data:** https://doi.org/10.5281/zenodo.7443842
+* **Data:** https://doi.org/10.5281/zenodo.7808664
 * **Contact:** Lucas Potin <lucas.potin@univ-avignon.fr>
 
 -------------------------------------------------------------------------
 
 # Description
-These scripts create the FOPPA database v.1.1.1 from raw TED files. This database relies mainly on the award notices of public contracts related to French clients and suppliers from 2010 to 2020 in the Tenders Electronic Daily. It also proposes an enrichment of these data, thanks to the siretization of agents (i.e. the retrieval of their unique IDs, which is missing for most of them) as well as the cleaning and extraction of award criteria, and other processing.
+These scripts create the FOPPA database v.1.1.2 from raw TED files. This database relies mainly on the award notices of public contracts related to French clients and suppliers from 2010 to 2020 in the Tenders Electronic Daily. It also proposes an enrichment of these data, thanks to the siretization of agents (i.e. the retrieval of their unique IDs, which is missing for most of them) as well as the cleaning and extraction of award criteria, and other processing.
 
-The process conducted to build the FOPPA is quite long, though (around 1 week, depeding on the hardware), so the produced database is alternatively directly available on [Zenodo](https://doi.org/10.5281/zenodo.7443842). The detail of this processing are described in a technical report [P'22].
+The process conducted to build the FOPPA is quite long, though (around 1 week, depeding on the hardware), so the produced database is alternatively directly available on [Zenodo](https://doi.org/10.5281/zenodo.7808664). The detail of this processing are described in an article [[P'23]](#references), and in further detail in a technical report [[P'22]](#references).
 
-This work was conducted in the framework of the [DeCoMaP](https://anr.fr/Projet-ANR-19-CE38-0004) ANR project (*Detection of corruption in public procurement markets* -- `ANR-19-CE38-0004`). If you use this source code or the produced database, please cite bibliographical reference [P'22].
+This work was conducted in the framework of the [DeCoMaP](https://anr.fr/Projet-ANR-19-CE38-0004) ANR project (*Detection of corruption in public procurement markets* -- `ANR-19-CE38-0004`). If you use this source code or the produced database, please cite bibliographical reference [[P'23]](#references) (the article, not the report).
 
 # Organization
 This repository is composed of the following elements:
@@ -51,7 +51,7 @@ In order to build the FOPPA database:
 
 The script is going to perform several tasks:
 1. Download all the necessary data (see Section *Organization*).
-2. Apply the processing described in [P'22].
+2. Apply the processing described in [[P'22]](#references).
 3. Export the resulting database under different forms (SQL dump, CSV sheets).
 
 # Dependencies
# Dependencies
@@ -70,6 +70,5 @@ The produced database is directly available publicly online on [Zenodo](https://
 * CSV files (one by table).
 
 # References
-**[P'22]** L. Potin, V. Labatut, R. Figueiredo, C. Largeron & P. H. Morand. *FOPPA: A database of French Open Public Procurement Award notices*, Technical Report, Avignon Université, 2022. [⟨hal-03796734⟩](https://hal.archives-ouvertes.fr/hal-03796734)
-
-
+* **[P'23]** L. Potin, V. Labatut, P. H. Morand & C. Largeron. *FOPPA: An Open Database of French Public Procurement Award Notices From 2010–2020*, Scientific Data, 2023, 10:303. DOI: [10.1038/s41597-023-02213-z](https://dx.doi.org/10.1038/s41597-023-02213-z) [⟨hal-04101350⟩](https://hal.archives-ouvertes.fr/hal-04101350)
+* **[P'22]** L. Potin, V. Labatut, R. Figueiredo, C. Largeron & P. H. Morand. *FOPPA: A database of French Open Public Procurement Award notices*, Technical Report, Avignon Université, 2022. [⟨hal-03796734⟩](https://hal.archives-ouvertes.fr/hal-03796734)

foppaInit.py

Lines changed: 19 additions & 19 deletions
@@ -237,13 +237,13 @@ def databaseCreation(nameDatabase):
 sql = cursor.execute(request)
 request = "CREATE TABLE CriteriaTemp (lotId INTEGER,CRIT_PRICE_WEIGHT TEXT,CRIT_WEIGHTS TEXT, CRIT_CRITERIA TEXT)"
 sql = cursor.execute(request)
-request = "DROP TABLE IF EXISTS LotClients"
+request = "DROP TABLE IF EXISTS LotBuyers"
 sql = cursor.execute(request)
-request = "CREATE TABLE LotClients(lotId INTEGER,agentId INTEGER,FOREIGN KEY(agentId) REFERENCES Agents(agentId) ON UPDATE CASCADE,FOREIGN KEY(lotId) REFERENCES Lots(lotId) ON UPDATE CASCADE)"
+request = "CREATE TABLE LotBuyers(lotId INTEGER,agentId INTEGER,FOREIGN KEY(agentId) REFERENCES Agents(agentId) ON UPDATE CASCADE,FOREIGN KEY(lotId) REFERENCES Lots(lotId) ON UPDATE CASCADE)"
 sql = cursor.execute(request)
-request = "DROP TABLE IF EXISTS LotSuppliers"
+request = "DROP TABLE IF EXISTS LotWinners"
 sql = cursor.execute(request)
-request = "CREATE TABLE LotSuppliers(lotId INTEGER,agentId INTEGER,FOREIGN KEY(agentId) REFERENCES Agents(agentId) ON UPDATE CASCADE,FOREIGN KEY(lotId) REFERENCES Lots(lotId) ON UPDATE CASCADE)"
+request = "CREATE TABLE LotWinners(lotId INTEGER,agentId INTEGER,FOREIGN KEY(agentId) REFERENCES Agents(agentId) ON UPDATE CASCADE,FOREIGN KEY(lotId) REFERENCES Lots(lotId) ON UPDATE CASCADE)"
 sql = cursor.execute(request)
 request = "DROP TABLE IF EXISTS Names"
 sql = cursor.execute(request)
@@ -384,7 +384,7 @@ def firstCleaning(datas,database):
 VALUES (?,?,?,?,?,?,?,?,?)'''
 val = (compteurAgent,tempName,tempSiret,tempAddress,tempTown,tempPC,tempCountry,date,"CAE")
 cur.execute(sql,val)
-sql = ''' INSERT INTO LotClients(lotId,agentId)
+sql = ''' INSERT INTO LotBuyers(lotId,agentId)
 VALUES (?,?)'''
 val = (i,compteurAgent)
 cur.execute(sql,val)
@@ -443,7 +443,7 @@ def firstCleaning(datas,database):
 VALUES (?,?,?,?,?,?,?,?,?)'''
 val = (compteurAgent,tempName,tempSiret,tempAddress,tempTown,tempPC,tempCountry,date,"WIN")
 cur.execute(sql,val)
-sql = ''' INSERT INTO LotSuppliers(lotId,agentId)
+sql = ''' INSERT INTO LotWinners(lotId,agentId)
 VALUES (?,?)'''
 val = (i,compteurAgent)
 cur.execute(sql,val)
@@ -1258,22 +1258,22 @@ def dedupeAgent(datas,database):
 
 def finalTableAgent(database):
 cursor = database.cursor()
-clients = pd.read_sql_query("SELECT * FROM LotClients", database,dtype=str)
-suppliers = pd.read_sql_query("SELECT * FROM LotSuppliers", database,dtype=str)
+clients = pd.read_sql_query("SELECT * FROM LotBuyers", database,dtype=str)
+suppliers = pd.read_sql_query("SELECT * FROM LotWinners", database,dtype=str)
 names = pd.read_sql_query("SELECT * FROM Names", database,dtype=str)
 datas = pd.read_csv("ResDedupe.csv",dtype=str,sep=",")
 lenCluster = len(datas.groupby("Cluster ID").count())
 request = "DROP TABLE IF EXISTS Agents"
 sql = cursor.execute(request)
 request = "CREATE TABLE Agents(agentId INTEGER,name TEXT,siret TEXT,address TEXT,city TEXT,zipcode TEXT,country TEXT, department TEXT,longitude TEXT, latitude TEXT,PRIMARY KEY(agentId))"
 sql = cursor.execute(request)
-request = "DROP TABLE IF EXISTS LotClients"
+request = "DROP TABLE IF EXISTS LotBuyers"
 sql = cursor.execute(request)
-request = "CREATE TABLE LotClients(lotId INTEGER,agentId INTEGER)"
+request = "CREATE TABLE LotBuyers(lotId INTEGER,agentId INTEGER)"
 sql = cursor.execute(request)
-request = "DROP TABLE IF EXISTS LotSuppliers"
+request = "DROP TABLE IF EXISTS LotWinners"
 sql = cursor.execute(request)
-request = "CREATE TABLE LotSuppliers(lotId INTEGER,agentId INTEGER)"
+request = "CREATE TABLE LotWinners(lotId INTEGER,agentId INTEGER)"
 sql = cursor.execute(request)
 request = "DROP TABLE IF EXISTS Names"
 sql = cursor.execute(request)
@@ -1297,14 +1297,14 @@ def finalTableAgent(database):
 
 for i in range(len(clientsLot)):
 if (int(clientsAgent[i]) in dico):
-sql = ''' INSERT OR IGNORE INTO LotClients(lotId,agentId)
+sql = ''' INSERT OR IGNORE INTO LotBuyers(lotId,agentId)
 VALUES (?,?)'''
 val = (int(clientsLot[i]),int(dico[int(clientsAgent[i])]))
 cursor.execute(sql,val)
 
 for i in range(len(suppliersLot)):
 if (int(suppliersAgent[i]) in dico):
-sql = ''' INSERT OR IGNORE INTO LotSuppliers(lotId,agentId)
+sql = ''' INSERT OR IGNORE INTO LotWinners(lotId,agentId)
 VALUES (?,?)'''
 val = (int(suppliersLot[i]),int(dico[int(suppliersAgent[i])]))
 cursor.execute(sql,val)
@@ -1522,10 +1522,10 @@ def exportDatabase(database):
 agents.to_csv("FOPPA/csv/Agents.csv",index=False)
 criteria = pd.read_sql_query("SELECT * FROM Criteria", database)
 criteria.to_csv("FOPPA/csv/Criteria.csv",index=False)
-lotsClients = pd.read_sql_query("SELECT * FROM LotClients", database)
-lotsClients.to_csv("FOPPA/csv/LotClients.csv",index=False)
-lotsSuppliers = pd.read_sql_query("SELECT * FROM LotSuppliers", database)
-lotsSuppliers.to_csv("FOPPA/csv/LotSuppliers.csv",index=False)
+lotsClients = pd.read_sql_query("SELECT * FROM LotBuyers", database)
+lotsClients.to_csv("FOPPA/csv/LotBuyers.csv",index=False)
+lotsSuppliers = pd.read_sql_query("SELECT * FROM LotWinners", database)
+lotsSuppliers.to_csv("FOPPA/csv/LotWinners.csv",index=False)
 names = pd.read_sql_query("SELECT * FROM Names", database)
 names.to_csv("FOPPA/csv/Names.csv",index=False)
 
@@ -1599,4 +1599,4 @@ def exportDatabase(database):
 db.close()
 #os.remove("Foppa.db")
 del db
-
+
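Note on the rename: this commit changes the table names `LotClients` and `LotSuppliers` to `LotBuyers` and `LotWinners`, so queries written against a database built with the previous version will no longer match. The sketch below is not part of the commit. It shows one way an existing SQLite file could be brought in line without rerunning the whole build, assuming the file still uses the old names. The helper `rename_lot_tables` and the `RENAMES` mapping are hypothetical names introduced here for illustration.

```python
import sqlite3

# Old-to-new table names, as they appear in this commit's diff.
RENAMES = {"LotClients": "LotBuyers", "LotSuppliers": "LotWinners"}


def rename_lot_tables(conn):
    """Rename the old tables in place and return the old names that were renamed."""
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    }
    renamed = []
    for old, new in RENAMES.items():
        # Skip tables that are absent or already migrated, so the call is repeatable.
        if old in existing and new not in existing:
            conn.execute(f'ALTER TABLE "{old}" RENAME TO "{new}"')
            renamed.append(old)
    conn.commit()
    return renamed


if __name__ == "__main__":
    # Demonstration on a throwaway in-memory database, not on a real FOPPA file.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE LotClients(lotId INTEGER, agentId INTEGER)")
    demo.execute("INSERT INTO LotClients VALUES (1, 42)")
    print(rename_lot_tables(demo))
    print(demo.execute("SELECT * FROM LotBuyers").fetchall())
    demo.close()
```

The exported CSV files are renamed by the same commit (`LotBuyers.csv`, `LotWinners.csv`), so any downstream loader that opens them by file name would need the matching change.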