## Prerequisites

You'll need:

- Docker
- `docker-compose` >= 1.28
- An NVIDIA GPU with Compute Capability >= 7.0 and enough VRAM to run the model you want.
- [nvidia-docker](https://cloud.tencent.com/developer/article/2126651)
- `curl` and `zstd` for downloading and unpacking the models.
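
Before downloading anything, it can save time to confirm these are all in place. The following is a hypothetical pre-flight check, not part of FauxPilot; it assumes the `docker`, `docker-compose`, `curl`, `zstd`, and `nvidia-smi` binaries are on your PATH:

```python
# Hypothetical pre-flight check (not part of FauxPilot): verifies the tools
# listed above are on PATH and that the NVIDIA driver can see a GPU.
import shutil
import subprocess

for tool in ("docker", "docker-compose", "curl", "zstd"):
    status = "ok" if shutil.which(tool) else "MISSING"
    print(f"{tool}: {status}")

# nvidia-smi ships with the NVIDIA driver; if this fails, nvidia-docker
# won't be able to expose a GPU to the containers either.
try:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print("GPU(s):", out.stdout.strip())
except (FileNotFoundError, subprocess.CalledProcessError):
    print("No working NVIDIA driver found")
```

Compare the reported `memory.total` against the VRAM figures in the model list below to pick a model that fits.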

## Support and Warranty

lmao

## Setup

Run the setup script to choose a model to use. This will download the model from Huggingface and then convert it for use with FasterTransformer.

```
$ ./setup.sh
Models available:
[1] codegen-350M-mono (2GB total VRAM required; Python-only)
[2] codegen-350M-multi (2GB total VRAM required; multi-language)
[3] codegen-2B-mono (7GB total VRAM required; Python-only)
[4] codegen-2B-multi (7GB total VRAM required; multi-language)
[5] codegen-6B-mono (13GB total VRAM required; Python-only)
[6] codegen-6B-multi (13GB total VRAM required; multi-language)
[7] codegen-16B-mono (32GB total VRAM required; Python-only)
[8] codegen-16B-multi (32GB total VRAM required; multi-language)
Enter your choice [6]: 2
Enter number of GPUs [1]: 1
Where do you want to save the model [/home/moyix/git/fauxpilot/models]? /fastdata/mymodels
Downloading and converting the model, this will take a while...
Converting model codegen-350M-multi with 1 GPUs
Loading CodeGen model
Downloading config.json: 100%|██████████| 996/996 [00:00<00:00, 1.25MB/s]
Downloading pytorch_model.bin: 100%|██████████| 760M/760M [00:11<00:00, 68.3MB/s]
Creating empty GPTJ model
Converting...
Conversion complete.
Saving model to codegen-350M-multi-hf...

=============== Argument ===============
saved_dir: /models/codegen-350M-multi-1gpu/fastertransformer/1
in_file: codegen-350M-multi-hf
trained_gpu_num: 1
infer_gpu_num: 1
processes: 4
weight_data_type: fp32
========================================
transformer.wte.weight
transformer.h.0.ln_1.weight
[... more conversion output trimmed ...]
transformer.ln_f.weight
transformer.ln_f.bias
lm_head.weight
lm_head.bias
Done! Now run ./launch.sh to start the FauxPilot server.
```

Then you can just run `./launch.sh`:

```
$ ./launch.sh
[+] Running 2/0
[... startup output trimmed ...]
fauxpilot-triton-1 | I0803 01:51:04.740423 93 grpc_server.cc:4587] Started GRPCInferenceService at 0.0.0.0:8001
fauxpilot-triton-1 | I0803 01:51:04.740608 93 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
fauxpilot-triton-1 | I0803 01:51:04.781561 93 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
```
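
Once Triton logs those three services, you can confirm it is actually ready to serve. A minimal sketch using Triton's standard KServe-style health route on the HTTP port shown in the log above:

```python
# Minimal readiness probe against Triton's documented health endpoint
# (the HTTPService on port 8000 from the launch log above).
import requests

r = requests.get("http://localhost:8000/v2/health/ready")
print("Triton ready" if r.status_code == 200 else f"Not ready: HTTP {r.status_code}")
```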
## API

Once everything is up and running, you should have a server listening for requests on `http://localhost:5000`. You can now talk to it using the standard [OpenAI API](https://platform.openai.com/docs/api-reference/) (although the full API isn't implemented yet). For example, from Python, using the [OpenAI Python bindings](https://github.com/openai/openai-python):

```
$ ipython
Python 3.8.10 (default, Mar 15 2022, 12:22:08)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import openai

In [2]: openai.api_key = 'dummy'

In [3]: openai.api_base = 'http://127.0.0.1:5000/v1'

In [4]: result = openai.Completion.create(engine='codegen', prompt='def hello', max_tokens=16, temperature=0.1, stop=["\n\n"])

In [5]: result
Out[5]:
<OpenAIObject text_completion id=cmpl-6hqu8Rcaq25078IHNJNVooU4xLY6w at 0x7f602c3d2f40> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "() {\n return \"Hello, World!\";\n}"
    }
  ],
  "created": 1659492191,
  "id": "cmpl-6hqu8Rcaq25078IHNJNVooU4xLY6w",
  "model": "codegen",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 15,
    "prompt_tokens": 2,
    "total_tokens": 17
  }
}
```
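
You don't need the bindings at all, of course. As a sketch of the same request over plain HTTP, assuming the proxy exposes the legacy engine-style completions route that the old OpenAI bindings build from `api_base` plus the engine name:

```python
# The same completion request as raw HTTP, assuming the server implements
# the legacy OpenAI route /v1/engines/<engine>/completions.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/engines/codegen/completions",
    json={
        "prompt": "def hello",
        "max_tokens": 16,
        "temperature": 0.1,
        "stop": ["\n\n"],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```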
## Copilot Plugin
Perhaps more excitingly, you can configure the official [VSCode Copilot plugin](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot) to use your local server. Just edit your `settings.json` to add:
```json
"github.copilot.advanced": {
    "debug.overrideEngine": "codegen",
    "debug.testOverrideProxyUrl": "http://localhost:5000",
    "debug.overrideProxyUrl": "http://localhost:5000"
}
```
And you should be able to use Copilot with your own locally hosted suggestions! Of course, probably a lot of stuff is subtly broken. In particular, the probabilities returned by the server are partly fake. Fixing this would require changing FasterTransformer so that it can return log-probabilities for the top k tokens rather than just the chosen token.
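
If you want to see what "partly fake" means in practice, one hedged way to poke at it: the legacy OpenAI completions API accepts a `logprobs` parameter, and assuming FauxPilot passes it through, you can inspect whatever comes back:

```python
# Hypothetical probe of the returned probabilities, assuming the server
# accepts the legacy OpenAI `logprobs` parameter. Per the note above,
# the values it returns should not be trusted as real probabilities.
import openai

openai.api_key = "dummy"
openai.api_base = "http://127.0.0.1:5000/v1"

result = openai.Completion.create(
    engine="codegen",
    prompt="def hello",
    max_tokens=16,
    temperature=0.1,
    logprobs=1,  # ask for per-token log-probabilities
)
print(result.choices[0].logprobs)
```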
Have fun!
