Commit 9d1bddd

add gsoc report for vulntotal
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
1 parent fb2ecb0 commit 9d1bddd

2 files changed

Lines changed: 234 additions & 0 deletions

docs/source/archive/gsoc-toc.rst

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ GSoC 2022
 
    gsoc/reports/2022/scancodeio_akhil
    gsoc/reports/2022/scancode_workbench_omkar
+   gsoc/reports/2022/vulnerablecode_vulntotal_keshav
 
 GSoC 2021
 ---------
Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
VulnTotal: Tool for cross-validating vulnerability
==================================================

Organization - `AboutCode <https://www.aboutcode.org>`_
-----------------------------------------------------------
| **Keshav Priyadarshi**
| GitHub: `keshav-space <https://github.com/keshav-space>`_
| LinkedIn: `@keshav-space <https://www.linkedin.com/in/keshav-space>`_
| Project: `VulnTotal <https://github.com/nexB/vulnerablecode/tree/vulntotal/vulntotal>`_
| Proposal: `Link <https://docs.google.com/document/d/1it5eKwIiSsnuKuMAPhP1SoYiq412bdPmuAWNN25ZVAY/edit>`_

Overview
---------

VulnTotal ``cross-validates`` the vulnerability coverage of publicly available
vulnerability check tools and databases. It is inspired by the VirusTotal
multi-scanner virus scanning service. There are scenarios where a package
is reported as vulnerable by some tools or databases but not by others.
VulnTotal helps detect such anomalies. We can gradually work with
these tool providers to keep each other apprised of newly discovered
vulnerabilities and anomalies, making FOSS more secure.

Sneak Peek
-----------------

.. figure:: https://user-images.githubusercontent.com/44315208/188985807-b13e2c08-dd5c-40ec-8f8d-6b15b6d6f4db.gif

VulnTotal takes a PURL as an argument and returns vulnerability data from various data sources.
By default, vulnerability data is grouped by CVE.

.. note::
    A PURL is a URL string used to identify and locate a software package in a mostly universal and uniform
    way across programming languages, package managers, packaging conventions, tools, APIs and databases.
    `more on PURL <https://github.com/package-url>`_
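
To make the shape of a PURL concrete, here is a minimal sketch that splits a
string of the form ``pkg:type/namespace/name@version`` into its parts.
``SimplePurl`` and ``parse_purl`` are hypothetical names used only for this
illustration. The sketch ignores qualifiers, subpaths and percent-encoding,
so a complete parser such as the ``packageurl-python`` library is the better
choice in practice.

```python
from typing import NamedTuple, Optional


class SimplePurl(NamedTuple):
    """Hypothetical, simplified view of a PURL (not the real PackageURL class)."""

    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> SimplePurl:
    """Split ``pkg:type/namespace/name@version`` into its components."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or not rest:
        raise ValueError(f"not a PURL: {purl!r}")
    # Qualifiers (?key=value) and subpath (#path) are dropped in this sketch.
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    path, _, version = rest.partition("@")
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"PURL needs at least a type and a name: {purl!r}")
    purl_type, *namespace, name = parts
    return SimplePurl(
        type=purl_type.lower(),
        namespace="/".join(namespace) or None,
        name=name,
        version=version or None,
    )


if __name__ == "__main__":
    print(parse_purl("pkg:pypi/django@3.2.1"))
    print(parse_purl("pkg:maven/org.apache.commons/io@1.3.4"))
```

The ``type`` component is what the report later refers to as ``purl.type``,
the value each datasource maps to its own ecosystem name.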

VulnTotal Development - Walkthrough
------------------------------------

Initial Configuration
^^^^^^^^^^^^^^^^^^^^^^^^

The initial PR and commits outlined the core structure and implemented
``VendorData`` and ``DataSource`` inside ``validator.py``.

**VendorData** is a dataclass that encapsulates ``aliases``,
``affected_versions`` and ``fixed_versions`` for a vulnerability.

**DataSource** outlines core methods such as ``datasource_advisory`` and
``supported_ecosystem`` to be implemented by subclasses.
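
The two building blocks can be pictured roughly as follows. This is a minimal
sketch under stated assumptions, not the code from ``validator.py``: the
class, field and method names come from the description above, while the
field types, default values and the use of ``NotImplementedError`` are
assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class VendorData:
    """One vulnerability as reported by a single vendor (field types assumed)."""

    aliases: List[str] = field(default_factory=list)
    affected_versions: List[str] = field(default_factory=list)
    fixed_versions: List[str] = field(default_factory=list)


class DataSource:
    """Base class: concrete datasources override both methods."""

    def datasource_advisory(self, purl) -> Iterable[VendorData]:
        """Yield VendorData for the package identified by ``purl``."""
        raise NotImplementedError

    @classmethod
    def supported_ecosystem(cls) -> Dict[str, str]:
        """Map ``purl.type`` values to this datasource's ecosystem names."""
        raise NotImplementedError


if __name__ == "__main__":
    # Placeholder identifier, not a real advisory.
    print(VendorData(aliases=["CVE-0000-0001"], fixed_versions=["1.0.1"]))
```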


Below is the tree view of VulnTotal for better understanding ::

    vulntotal
    ├── validator.py
    ├── vulntotal_cli.py
    ├── vulntotal_utils.py
    ├── datasources
    │   ├── __init__.py
    │   ├── deps.py
    │   ├── github.py
    │   ├── gitlab.py
    │   ├── oss.py
    │   ├── osv.py
    │   ├── snyk.py
    │   └── vulnerablecode.py
    └── tests
        ├── test_deps.py
        ├── test_github.py
        ├── test_oss.py
        ├── test_osv.py
        ├── test_snyk.py
        ├── test_vulnerablecode.py
        └── test_data
            ├── deps/
            ├── github/
            ├── oss_index/
            ├── osv/
            ├── snyk/
            └── vulnerablecode/

PRs and commits related to the initial configuration:

* `nexB/vulnerablecode#777 <https://github.com/nexB/vulnerablecode/pull/777>`_
* `nexB/vulnerablecode#2176cb11 <https://github.com/nexB/vulnerablecode/commit/2176cb119614b0381ebd56551779266747f9a871>`_
* `nexB/vulnerablecode#922859f3 <https://github.com/nexB/vulnerablecode/commit/922859f3c198eb0e78b51f0f4600bbb872059bed>`_
* `nexB/vulnerablecode#78dd5ae7 <https://github.com/nexB/vulnerablecode/commit/78dd5ae7f736874b05764b935968e2e79559feb1>`_

Adding DataSource
^^^^^^^^^^^^^^^^^^

The initial configuration made adding a datasource fairly smooth. Any new datasource just needs to
inherit ``DataSource`` and implement ``datasource_advisory`` and ``supported_ecosystem``.

**datasource_advisory** is the core method that takes a PURL as an argument and yields ``VendorData``.

**supported_ecosystem** should return a dictionary that maps the PURL ecosystem
(aka ``purl.type``) to the equivalent ecosystem name used by the datasource.
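
As an illustration of that contract, the sketch below implements a toy
datasource backed by an in-memory dictionary instead of a remote service.
``InMemoryDataSource``, ``FAKE_ADVISORIES`` and the CVE identifier are
hypothetical placeholders, and ``VendorData`` and ``DataSource`` are
simplified stand-ins for the classes in ``validator.py``.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Dict, Iterator, List


@dataclass
class VendorData:  # simplified stand-in for validator.VendorData
    aliases: List[str] = field(default_factory=list)
    affected_versions: List[str] = field(default_factory=list)
    fixed_versions: List[str] = field(default_factory=list)


class DataSource:  # simplified stand-in for validator.DataSource
    pass


# Hypothetical advisory store keyed by (datasource ecosystem, package name).
FAKE_ADVISORIES = {
    ("PyPI", "example-package"): [
        {"aliases": ["CVE-0000-0001"], "affected": ["1.0.0"], "fixed": ["1.0.1"]},
    ],
}


class InMemoryDataSource(DataSource):
    def datasource_advisory(self, purl) -> Iterator[VendorData]:
        # Translate purl.type into the name this datasource uses.
        ecosystem = self.supported_ecosystem().get(purl.type)
        if ecosystem is None:
            return  # ecosystem not covered by this datasource
        for advisory in FAKE_ADVISORIES.get((ecosystem, purl.name), []):
            yield VendorData(
                aliases=advisory["aliases"],
                affected_versions=advisory["affected"],
                fixed_versions=advisory["fixed"],
            )

    @classmethod
    def supported_ecosystem(cls) -> Dict[str, str]:
        return {"pypi": "PyPI", "npm": "npm"}


if __name__ == "__main__":
    # Any object with ``type`` and ``name`` attributes works for this sketch.
    purl = SimpleNamespace(type="pypi", name="example-package", version="1.0.0")
    for vendor_data in InMemoryDataSource().datasource_advisory(purl):
        print(vendor_data)
```

Keeping the ecosystem mapping in one method lets the caller skip a datasource
early when the PURL type is not supported.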


Currently Supported DataSources
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Open Source Vulnerability <osv.dev>
+++++++++++++++++++++++++++++++++++++++++

OSV provides an API endpoint for querying package vulnerabilities. Unfortunately, NuGet package names aren't
case-normalized by OSV, so OSVDataSource employs the NuGet SearchQueryService to
discover the valid case-sensitive package name and then uses that to query OSV.
For more on this issue see `nexB/vulnerablecode/#800 <https://github.com/nexB/vulnerablecode/issues/800>`_

Related PR: `nexB/vulnerablecode#788 <https://github.com/nexB/vulnerablecode/pull/788>`_
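
For orientation, a query against the public OSV API can be sketched with the
standard library alone. This is not the OSVDataSource code:
``build_osv_query`` and ``query_osv`` are hypothetical helper names, and the
NuGet name lookup described above is left out. The endpoint and the request
body follow the OSV ``POST /v1/query`` format.

```python
import json
import urllib.request
from typing import List, Optional

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(ecosystem: str, name: str, version: Optional[str] = None) -> bytes:
    """Build the JSON body for an OSV query about one package."""
    payload = {"package": {"name": name, "ecosystem": ecosystem}}
    if version:
        payload["version"] = version
    return json.dumps(payload).encode("utf-8")


def query_osv(ecosystem: str, name: str, version: Optional[str] = None,
              timeout: float = 10.0) -> List[dict]:
    """Send the query and return the list of matching vulnerability records."""
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=build_osv_query(ecosystem, name, version),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # OSV omits the "vulns" key when nothing matches.
        return json.load(response).get("vulns", [])


if __name__ == "__main__":
    # Only builds the request body; call query_osv() to hit the network.
    print(build_osv_query("PyPI", "jinja2", "2.4.1").decode("utf-8"))
```

OSV ecosystem names are case-sensitive (for example ``PyPI`` and ``NuGet``),
which is one reason a mapping from ``purl.type`` is needed.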


2. Open Source Insights <deps.dev>
++++++++++++++++++++++++++++++++++++

Writing the datasource for deps.dev was fairly uneventful. deps.dev doesn't provide any documented API except
for GCP BigQuery, but it does have an obfuscated API, and DepsDataSource makes use of that.

Related PR: `nexB/vulnerablecode#789 <https://github.com/nexB/vulnerablecode/pull/789>`_


3. GitHub Advisory Database
++++++++++++++++++++++++++++

GitHub provides a GraphQL endpoint for querying package vulnerabilities, but it comes with the caveat
that one can't query a specific version of a particular package. It returns the vulnerabilities for
all versions of a package. For this, ``vulntotal_utils`` implements a specialized method
``github_constraints_satisfied`` to filter out vulnerabilities for a specific version.

Related PR: `nexB/vulnerablecode#804 <https://github.com/nexB/vulnerablecode/pull/804>`_
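
The kind of filtering described above can be sketched as a check of one
version against a range string such as ``>= 1.0.0, < 1.0.5``, the format
GitHub uses for vulnerable version ranges. This is a simplified illustration
and not the actual ``github_constraints_satisfied`` implementation:
``range_contains`` is a hypothetical name, and versions are compared as plain
dotted integers, which ignores pre-release tags and ecosystem-specific
ordering rules.

```python
import operator
from typing import Tuple

_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Naive parser: '1.2.3' -> (1, 2, 3). Assumes purely numeric parts."""
    return tuple(int(part) for part in version.split("."))


def range_contains(version_range: str, version: str) -> bool:
    """Return True if ``version`` satisfies every comma-separated constraint."""
    candidate = _as_tuple(version)
    for constraint in version_range.split(","):
        comparator, bound = constraint.split()
        if not _OPERATORS[comparator](candidate, _as_tuple(bound)):
            return False
    return True


if __name__ == "__main__":
    print(range_contains(">= 1.0.0, < 1.0.5", "1.0.3"))
    print(range_contains(">= 1.0.0, < 1.0.5", "1.0.5"))
```

A production implementation would need version comparison that understands
each ecosystem's versioning scheme.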


4. Sonatype OSS Index
+++++++++++++++++++++++++

OSSIndexDataSource makes use of the OSS Index API. OSS Index only provides the CVEs related to a
particular package version and makes no mention of either the affected package versions
or the fixed package versions.

Related PR: `nexB/vulnerablecode#829 <https://github.com/nexB/vulnerablecode/pull/829>`_


5. VulnerableCode Advisory Database
++++++++++++++++++++++++++++++++++++++

VulnerableCodeDataSource currently makes use of a local VulnerableCode instance, but will soon
be migrated to the global instance.

Related PR: `nexB/vulnerablecode#832 <https://github.com/nexB/vulnerablecode/pull/832>`_


6. Snyk Vulnerability Database
+++++++++++++++++++++++++++++++++++

Snyk comes with no API whatsoever, so we had to resort to web scraping using BeautifulSoup.
A specialized method ``snky_constraints_satisfied`` was implemented to filter out
vulnerabilities for a specific version.
Among all the datasources currently available, Snyk is the only one that keeps track
of malicious packages.


Related PR: `nexB/vulnerablecode#842 <https://github.com/nexB/vulnerablecode/pull/842>`_


7. GitLab Gemnasium Advisory Database
+++++++++++++++++++++++++++++++++++++++++

Again, GitLab comes with no API, so GitlabDataSource is designed to fetch
package vulnerability data directly from the GitLab gemnasium
repository. Because package names there are case-sensitive, the GitLab GraphQL endpoint is
used to get the exact case-sensitive package name.
A similar method ``gitlab_constraints_satisfied`` is implemented to filter out
vulnerabilities for a specific version.

Related PR: `nexB/vulnerablecode#883 <https://github.com/nexB/vulnerablecode/pull/883>`_


Automatic Datasource Registry
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Every new datasource must be added to ``DATASOURCE_REGISTERY`` to make it available for use.
Fortunately, ``__init__.py`` is configured to take care of this: as soon as a new and valid
datasource file is added to the datasources directory, it automatically gets registered,
and it is dropped from the registry when the file is removed.

Related PR: `nexB/vulnerablecode#901 <https://github.com/nexB/vulnerablecode/pull/901>`_
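
One way such automatic registration can work is to scan each module for
subclasses of ``DataSource``. The sketch below illustrates the idea and is
not the code from ``datasources/__init__.py``: ``build_registry`` is a
hypothetical helper, the modules are created in memory for the demo, and
keying the registry by module name is an assumption. In a real package the
modules would typically be found with ``pkgutil.iter_modules`` and loaded
with ``importlib.import_module``.

```python
import inspect
import types
from typing import Dict, Iterable, Type


class DataSource:
    """Simplified stand-in for validator.DataSource."""


def build_registry(modules: Iterable[types.ModuleType]) -> Dict[str, Type[DataSource]]:
    """Map module name -> DataSource subclass found in that module."""
    registry = {}
    for module in modules:
        for _, obj in inspect.getmembers(module, inspect.isclass):
            # Skip the base class itself, which modules usually import.
            if issubclass(obj, DataSource) and obj is not DataSource:
                registry[module.__name__] = obj
    return registry


if __name__ == "__main__":
    # Build two in-memory modules to stand in for files in datasources/.
    osv = types.ModuleType("osv")
    osv.DataSource = DataSource
    osv.OSVDataSource = type("OSVDataSource", (DataSource,), {})

    helpers = types.ModuleType("helpers")  # contains no datasource
    helpers.answer = 42

    print(build_registry([osv, helpers]))
```

Because the registry is rebuilt from whatever modules exist, deleting a
datasource file removes its entry without any manual bookkeeping.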

Command-line Interface
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The VulnTotal CLI takes a PURL as an argument and returns vulnerability data from various data sources.
By default, vulnerability data is grouped by CVE. It also supports JSON and YAML data dumps.
Since most datasources are network I/O intensive, the CLI uses ThreadPoolExecutor by default
for better efficiency.

Related PR: `nexB/vulnerablecode#801 <https://github.com/nexB/vulnerablecode/pull/801>`_

.. tip::
    | The CLI comes with lots of hidden features that are especially useful while debugging a datasource.
    | Look inside ``vulntotal_cli.py`` to discover them all.
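
The concurrent querying and CVE grouping described in this section can be
sketched as below. It is an illustration under stated assumptions, not the
code in ``vulntotal_cli.py``: ``query_all`` and ``group_by_cve`` are
hypothetical helpers, each datasource is modelled as a plain callable, and
advisories are plain dictionaries with an ``aliases`` list.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

Advisory = Dict[str, List[str]]
Fetcher = Callable[[str], Iterable[Advisory]]


def query_all(datasources: Dict[str, Fetcher], purl: str) -> Dict[str, List[Advisory]]:
    """Query every datasource in its own thread; return {source: advisories}."""
    with ThreadPoolExecutor(max_workers=len(datasources) or 1) as executor:
        futures = {
            name: executor.submit(lambda fetch=fetch: list(fetch(purl)))
            for name, fetch in datasources.items()
        }
        return {name: future.result() for name, future in futures.items()}


def group_by_cve(results: Dict[str, List[Advisory]]) -> Dict[str, List[str]]:
    """Return {CVE id: [sources reporting it]} so gaps between sources stand out."""
    grouped = defaultdict(list)
    for source, advisories in results.items():
        for advisory in advisories:
            for alias in advisory["aliases"]:
                if alias.startswith("CVE-") and source not in grouped[alias]:
                    grouped[alias].append(source)
    return dict(grouped)


if __name__ == "__main__":
    # Fake datasources with placeholder identifiers, standing in for network calls.
    sources = {
        "alpha": lambda purl: [{"aliases": ["CVE-0000-0001", "GHSA-xxxx"]}],
        "beta": lambda purl: [{"aliases": ["CVE-0000-0001"]}, {"aliases": ["CVE-0000-0002"]}],
    }
    print(group_by_cve(query_all(sources, "pkg:pypi/example@1.0")))
```

Threads suit this workload because each datasource spends most of its time
waiting on the network, so the slowest source bounds the total run time
rather than the sum of all sources.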

Pre GSoC
----------

* Test sorting of all the OpenSSL versions ever released. `nexB/univers#61 <https://github.com/nexB/univers/pull/61>`_
* Migrate OpenSSL importer to importer-improver model. `nexB/vulnerablecode#690 <https://github.com/nexB/vulnerablecode/pull/690>`_
* Correct notes for cvssv3.1_qr. `nexB/vulnerablecode#599 <https://github.com/nexB/vulnerablecode/pull/599>`_
* Add from_versions in VersionRange. `nexB/univers#55 <https://github.com/nexB/univers/pull/55>`_
* Add OpenSSL support in univers. `nexB/univers#42 <https://github.com/nexB/univers/pull/42>`_
* Fix for NpmVersionRange.from_native and README. `nexB/univers#34 <https://github.com/nexB/univers/pull/34>`_
* Add black code-style test for skeleton. `nexB/skeleton#56 <https://github.com/nexB/skeleton/pull/56>`_

Post GSoC - Future Plans & Suggestions
---------------------------------------

* Support query using aliases. `nexB/vulnerablecode/#824 <https://github.com/nexB/vulnerablecode/issues/824>`_
* Add more datasources such as mend.io. `nexB/vulnerablecode/#835 <https://github.com/nexB/vulnerablecode/issues/835>`_
* Support for API and Web UI.
* Cluster analysis of advisories fetched from different datasources. `nexB/vulnerablecode#822 <https://github.com/nexB/vulnerablecode/issues/822>`_
* Handle forever vulnerable packages in VulnerableCode. `nexB/vulnerablecode#855 <https://github.com/nexB/vulnerablecode/issues/855>`_


Closing Thoughts
-------------------

I thoroughly enjoyed working on this project. The weekly calls were very helpful, and thanks to
`Philippe <https://github.com/pombredanne>`_, `Hritik <https://github.com/hritik14>`_,
`Tushar <https://github.com/TG1999>`_ and `Shivam <https://github.com/sbs2001>`_ for the
thoughtful inputs. I learned a lot about various interesting projects and what it takes
to tame some real-world problems. The experience greatly enhanced my ability to conduct myself
in the open source world. All in all, it's been a remarkable journey.
