function_leraf_2024-08-17

developer_uid: chai_backend_admin

submission_id: function_leraf_2024-08-17

model_name: gpt4-tl

model_group:

status: torndown

timestamp: 2024-08-17T05:49:29+00:00

num_battles: 8769

num_wins: 4197

celo_rating: 1216.68

family_friendly_score: 0.0

submission_type: function

display_name: gpt4-tl

is_internal_developer: True

ranking_group: single

us_pacific_date: 2024-08-16

win_ratio: 0.4786178583646938
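
The win_ratio value above appears to be num_wins divided by num_battles. The short check below reproduces it from the two counts in this record; that the leaderboard derives it this way is an assumption based only on the numbers matching.

```python
# Counts copied from the num_battles and num_wins fields of this record.
num_battles = 8769
num_wins = 4197

# Assumed definition: fraction of battles won.
win_ratio = num_wins / num_battles

print(f"win_ratio: {win_ratio}")
```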

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.1, 'top_k': 100, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n', 'You:'], 'max_input_tokens': 512, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': "{bot_name}'s Persona: {memory}\n####\n", 'prompt_template': '{prompt}\n<START>\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '{bot_name}:', 'truncate_by_message': False}
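
As a rough illustration of how the formatter templates above might be combined into a single model input, here is a minimal sketch. The template strings are copied from the formatter field; the function name `build_prompt`, the assembly order (memory, prompt, conversation turns, response prefix), and the sample names and messages are hypothetical. Truncation to max_input_tokens and the truncate_by_message flag are not modeled.

```python
# Template strings copied from the formatter field of this record.
FORMATTER = {
    "memory_template": "{bot_name}'s Persona: {memory}\n####\n",
    "prompt_template": "{prompt}\n<START>\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "{bot_name}:",
}


def build_prompt(bot_name, user_name, memory, prompt, turns):
    """Assemble one input string (hypothetical helper, assumed ordering).

    `turns` is a list of (speaker, message) pairs, where speaker is
    either "bot" or "user".
    """
    parts = [
        FORMATTER["memory_template"].format(bot_name=bot_name, memory=memory),
        FORMATTER["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                FORMATTER["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                FORMATTER["user_template"].format(user_name=user_name, message=message)
            )
    # The response template leaves the bot's line open for the model to complete.
    parts.append(FORMATTER["response_template"].format(bot_name=bot_name))
    return "".join(parts)


example = build_prompt(
    bot_name="Ava",
    user_name="You",
    memory="A friendly guide.",
    prompt="Ava greets the user.",
    turns=[("user", "Hi"), ("bot", "Hello!")],
)
print(example)
```

Because the assembled text ends with the open "{bot_name}:" prefix, the stopping words '\n' and 'You:' in generation_params would plausibly end generation after a single bot line.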

Running pipeline stage StressChecker
Received healthy response to inference request in 3.824162244796753s
Failed to get response for submission undi95-meta-llama-3-70b_6209_v19: ('http://undi95-meta-llama-3-70b-6209-v19-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', 'request timeout')
Received healthy response to inference request in 4.214353561401367s
Received healthy response to inference request in 2.1274280548095703s
Received healthy response to inference request in 11.27430772781372s
Received healthy response to inference request in 6.105754375457764s
5 requests
0 failed requests
5th percentile: 2.466774892807007
10th percentile: 2.8061217308044433
20th percentile: 3.4848154067993167
30th percentile: 3.9022005081176756
40th percentile: 4.058277034759522
50th percentile: 4.214353561401367
60th percentile: 4.970913887023926
70th percentile: 5.727474212646484
80th percentile: 7.139465045928956
90th percentile: 9.20688638687134
95th percentile: 10.240597057342528
99th percentile: 11.067565593719483
mean time: 5.509201192855835
%s, retrying in %s seconds...
{"detail":"('http://chaiml-llama-8b-pairwise-8189-v4-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', 'read tcp 127.0.0.1:54918->127.0.0.1:8080: read: connection reset by peer\\n')"}
Received unhealthy response to inference request!
Received healthy response to inference request in 2.076833724975586s
{"detail":"('http://chaiml-llama-8b-pairwise-8189-v4-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '{\"error\":\"ValueError : [TypeError(\\\\\"\\'numpy.int64\\' object is not iterable\\\\\"), TypeError(\\'vars() argument must have __dict__ attribute\\')]\"}')"}
Received unhealthy response to inference request!
Received healthy response to inference request in 14.103046894073486s
Received healthy response to inference request in 1.4131765365600586s
5 requests
2 failed requests
5th percentile: 1.4470425605773927
10th percentile: 1.4809085845947265
20th percentile: 1.5486406326293944
30th percentile: 1.6813720703125
40th percentile: 1.879102897644043
50th percentile: 2.076833724975586
60th percentile: 2.530542182922363
70th percentile: 2.9842506408691403
80th percentile: 5.389493274688723
90th percentile: 9.746270084381106
95th percentile: 11.924658489227292
99th percentile: 13.667369213104248
mean time: 4.477333736419678
%s, retrying in %s seconds...
Received healthy response to inference request in 3.6742584705352783s
Failed to get response for submission undi95-meta-llama-3-70b_6209_v19: ('http://undi95-meta-llama-3-70b-6209-v19-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', 'request timeout')
Received healthy response to inference request in 2.0120668411254883s
Received healthy response to inference request in 2.3039894104003906s
Received healthy response to inference request in 3.000267744064331s
Received healthy response to inference request in 3.6779496669769287s
5 requests
0 failed requests
5th percentile: 2.070451354980469
10th percentile: 2.1288358688354494
20th percentile: 2.24560489654541
30th percentile: 2.4432450771331786
40th percentile: 2.721756410598755
50th percentile: 3.000267744064331
60th percentile: 3.26986403465271
70th percentile: 3.539460325241089
80th percentile: 3.6749967098236085
90th percentile: 3.6764731884002684
95th percentile: 3.6772114276885985
99th percentile: 3.677802019119263
mean time: 2.9337064266204833
Pipeline stage StressChecker completed in 66.84s
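
The percentile and mean figures reported by StressChecker are consistent with linear interpolation between sorted samples, the same rule that `numpy.percentile` applies by default. The sketch below recomputes the first batch from its five healthy latencies using only the standard library. The helper name `percentile` is hypothetical, and whether the pipeline itself uses NumPy is an assumption.

```python
# Latencies (seconds) of the five healthy responses in the first batch.
times = [
    3.824162244796753,
    4.214353561401367,
    2.1274280548095703,
    11.27430772781372,
    6.105754375457764,
]


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(times, q)}")

mean_time = sum(times) / len(times)
print(f"mean time: {mean_time}")
```

With only five samples per batch, the high percentiles are dominated by the single slowest request, so the 95th and 99th values mostly reflect one outlier rather than a stable tail latency.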
function_leraf_2024-08-17 status is now deployed due to DeploymentManager action
function_leraf_2024-08-17 status is now inactive due to auto deactivation removed underperforming models
function_leraf_2024-08-17 status is now torndown due to DeploymentManager action