Commit 665abc6

Merge pull request #5 from CompNet/dev
Merge Foppa 1.1.3
2 parents 2d3a095 + 5485ad9 commit 665abc6

2 files changed

Lines changed: 28 additions & 7 deletions

README.md

Lines changed: 7 additions & 6 deletions
@@ -1,20 +1,20 @@
-FoppaInit v1.0.2
+FoppaInit v1.0.3
 -------------------------------------------------------------------------
 *Initialization of the FOPPA database*
 
-* Copyright 2021-2023 Lucas Potin & Vincent Labatut
+* Copyright 2021-2024 Lucas Potin & Vincent Labatut
 
 FoppaInit is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For source availability and license information see `licence.txt`
 
 * **Lab site:** http://lia.univ-avignon.fr
 * **GitHub repo:** https://github.com/CompNet/FoppaInit
-* **Data:** https://doi.org/10.5281/zenodo.7808664
+* **Data:** https://doi.org/10.5281/zenodo.10879932
 * **Contact:** Lucas Potin <lucas.potin@univ-avignon.fr>
 
 -------------------------------------------------------------------------
 
 # Description
-These scripts create the FOPPA database v.1.1.2 from raw TED files. This database relies mainly on the award notices of public contracts related to French clients and suppliers from 2010 to 2020 in the Tenders Electronic Daily. It also proposes an enrichment of these data, thanks to the siretization of agents (i.e. the retrieval of their unique IDs, which is missing for most of them) as well as the cleaning and extraction of award criteria, and other processing.
+These scripts create the FOPPA database v.1.1.3 from raw TED files. This database relies mainly on the award notices of public contracts related to French clients and suppliers from 2010 to 2020 in the Tenders Electronic Daily. It also proposes an enrichment of these data, thanks to the siretization of agents (i.e. the retrieval of their unique IDs, which is missing for most of them) as well as the cleaning and extraction of award criteria, and other processing.
 
 The process conducted to build the FOPPA is quite long, though (around 1 week, depeding on the hardware), so the produced database is alternatively directly available on [Zenodo](https://doi.org/10.5281/zenodo.7808664). The detail of this processing are described in an article [[P'23]](#references), and in further detail in a technical report [[P'22]](#references).
 
@@ -64,9 +64,10 @@ Tested with Python version 3.8.0, with the following packages:
 * [`dedupe`](https://pypi.org/project/dedupe/): version 2.0.19.
 
 # Data
-The produced database is directly available publicly online on [Zenodo](https://doi.org/10.5281/zenodo.7443842), under three different forms:
+The produced database is directly available publicly online on [Zenodo](https://doi.org/10.5281/zenodo.10879932), under four different forms:
 * SQLite file: https://www.sqlite.org/index.html
-* SQL dump.
+* SQLite dump.
+* MySQL dump.
 * CSV files (one by table).
 
 # References

foppaInit.py

Lines changed: 21 additions & 1 deletion
@@ -219,7 +219,7 @@ def databaseCreation(nameDatabase):
 cursor = database.cursor()
 request = "DROP TABLE IF EXISTS Lots"
 sql = cursor.execute(request)
-request = "CREATE TABLE Lots(lotId INTEGER,tedCanId INTEGER,correctionsNB INTEGER,cancelled INTEGER,awardDate TEXT,awardEstimatedPrice NUMERIC,awardPrice NUMERIC,cpv TEXT,tenderNumber INTEGER,onBehalf TINYINT,jointProcurement TINYINT,fraAgreement TINYINT,fraEstimated INTEGER,lotsNumber INTEGER,accelerated TINYINT,outOfDirectives TINYINT,contractorSme TINYINT,numberTendersSme INTEGER,subContracted TINYINT,gpa TINYINT,multipleCae TINYINT,typeOfContract TEXT,topType TEXT,renewal TINYINT, contractDuration INTEGER, publicityDuration INTEGER,PRIMARY KEY(lotId))"
+request = "CREATE TABLE Lots(lotId INTEGER,tedCanId INTEGER,correctionsNB INTEGER,cancelled INTEGER,awardDate TEXT,awardEstimatedPrice NUMERIC,awardPrice NUMERIC,cpv TEXT,tenderNumber INTEGER,onBehalf TINYINT,jointProcurement TINYINT,fraAgreement TINYINT,fraEstimated TEXT,lotsNumber INTEGER,accelerated TINYINT,outOfDirectives TINYINT,contractorSme TINYINT,numberTendersSme INTEGER,subContracted TINYINT,gpa TINYINT,multipleCae TINYINT,typeOfContract TEXT,topType TEXT,renewal TINYINT, contractDuration NUMERIC, publicityDuration NUMERIC,PRIMARY KEY(lotId))"
 sql = cursor.execute(request)
 request = "DROP TABLE IF EXISTS AgentsBase"
 sql = cursor.execute(request)
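
Note on the `Lots` hunk: the only change to the `CREATE TABLE` statement is the declared type of three columns (`fraEstimated` from INTEGER to TEXT, `contractDuration` and `publicityDuration` from INTEGER to NUMERIC). The commit does not state the motivation. The sketch below is not part of the commit; it assumes an in-memory SQLite database and a hypothetical two-column table `LotsDemo`, and only illustrates how SQLite's type affinity stores values under the new declarations.

```python
import sqlite3

# Hypothetical demo table (not from the commit): it keeps only two of the
# retyped columns, with the declarations used after this change.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LotsDemo(fraEstimated TEXT, contractDuration NUMERIC)")

# An integer bound to a TEXT column is stored as text; a well-formed decimal
# string bound to a NUMERIC column is stored as a real number.
conn.execute("INSERT INTO LotsDemo VALUES (?, ?)", (100000, "3.5"))

# typeof() reports the storage class SQLite actually used for each value.
storage = conn.execute(
    "SELECT typeof(fraEstimated), typeof(contractDuration) FROM LotsDemo"
).fetchone()
print(storage)
conn.close()
```

Under SQLite's documented affinity rules this reports `text` for `fraEstimated` and `real` for `contractDuration`. Other engines, such as the MySQL dump mentioned in the README, enforce declared types more strictly, so the declarations may matter more there.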
@@ -275,6 +275,26 @@ def firstCleaning(datas,database):
 
 # Parenthesis deletion
 datas["CAE_NAME"] = datas["CAE_NAME"].replace(regex=r'\([^)]*\)',value=r"")
+
+# Replace "Y" by 1 and "N" by 0 on boolean columns
+datas["B_ON_BEHALF"] = datas["B_ON_BEHALF"].replace(regex=r'Y',value=r"1")
+datas["B_ON_BEHALF"] = datas["B_ON_BEHALF"].replace(regex=r'N',value=r"0")
+datas["B_INVOLVES_JOINT_PROCUREMENT"] = datas["B_INVOLVES_JOINT_PROCUREMENT"].replace(regex=r'Y',value=r"1")
+datas["B_INVOLVES_JOINT_PROCUREMENT"] = datas["B_INVOLVES_JOINT_PROCUREMENT"].replace(regex=r'N',value=r"0")
+datas["B_FRA_AGREEMENT"] = datas["B_FRA_AGREEMENT"].replace(regex=r'Y',value=r"1")
+datas["B_FRA_AGREEMENT"] = datas["B_FRA_AGREEMENT"].replace(regex=r'N',value=r"0")
+datas["B_ACCELERATED"] = datas["B_ACCELERATED"].replace(regex=r'Y',value=r"1")
+datas["B_ACCELERATED"] = datas["B_ACCELERATED"].replace(regex=r'N',value=r"0")
+datas["B_OUT_OF_DIRECTIVES"] = datas["B_OUT_OF_DIRECTIVES"].replace(regex=r'Y',value=r"1")
+datas["B_OUT_OF_DIRECTIVES"] = datas["B_OUT_OF_DIRECTIVES"].replace(regex=r'N',value=r"0")
+datas["B_CONTRACTOR_SME"] = datas["B_CONTRACTOR_SME"].replace(regex=r'Y',value=r"1")
+datas["B_CONTRACTOR_SME"] = datas["B_CONTRACTOR_SME"].replace(regex=r'N',value=r"0")
+datas["B_SUBCONTRACTED"] = datas["B_SUBCONTRACTED"].replace(regex=r'Y',value=r"1")
+datas["B_SUBCONTRACTED"] = datas["B_SUBCONTRACTED"].replace(regex=r'N',value=r"0")
+datas["B_GPA"] = datas["B_GPA"].replace(regex=r'Y',value=r"1")
+datas["B_GPA"] = datas["B_GPA"].replace(regex=r'N',value=r"0")
+datas["B_MULTIPLE_CAE"] = datas["B_MULTIPLE_CAE"].replace(regex=r'Y',value=r"1")
+datas["B_MULTIPLE_CAE"] = datas["B_MULTIPLE_CAE"].replace(regex=r'N',value=r"0")
 
 nameCAE = np.array(datas["CAE_NAME"])
 siretCAE = np.array(datas["CAE_NATIONALID"])
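
Note on the `firstCleaning` hunk: the eighteen added `replace` calls apply one pattern to nine columns, and because they use `regex=r'Y'` and `regex=r'N'`, they substitute every occurrence of those letters inside a cell rather than only cells equal to `Y` or `N`. A more compact variant could look like the sketch below. It is not the commit's code: the helper name `yes_no_to_flags` and the sample data are hypothetical, it swaps the regex substitution for an exact-match mapping, and skipping absent columns is an added assumption.

```python
import pandas as pd

# Column names taken from the hunk above.
BOOLEAN_COLUMNS = [
    "B_ON_BEHALF",
    "B_INVOLVES_JOINT_PROCUREMENT",
    "B_FRA_AGREEMENT",
    "B_ACCELERATED",
    "B_OUT_OF_DIRECTIVES",
    "B_CONTRACTOR_SME",
    "B_SUBCONTRACTED",
    "B_GPA",
    "B_MULTIPLE_CAE",
]


def yes_no_to_flags(datas, columns=BOOLEAN_COLUMNS):
    """Hypothetical helper: turn cells equal to "Y"/"N" into "1"/"0".

    Unlike the regex calls in the commit, the dict form of replace() only
    matches whole cell values, so other strings are left unchanged.
    """
    for column in columns:
        # Assumption: columns missing from the frame are silently skipped.
        if column in datas.columns:
            datas[column] = datas[column].replace({"Y": "1", "N": "0"})
    return datas


# Small made-up sample, for illustration only.
sample = pd.DataFrame({"B_ON_BEHALF": ["Y", "N", None], "B_GPA": ["N", "Y", "Y"]})
sample = yes_no_to_flags(sample)
print(sample)
```

As in the commit, the flags remain the strings `"1"` and `"0"`, and missing values are left untouched.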
