Good morning, I was getting some NaNs on the team names this morning. It appears that the short name coming from the API has been altered, so the CSV file with the long names needed to be edited, e.g. 'ARI Diamondbacks' to 'ARI Diamondbacks - ARI Diamondbacks'. Just wanted to give you the heads-up. Cheers
'ARI Diamondbacks' is the short name, not the long name, so I'm slightly confused. The CSV maps short names to long names like so:
short_name,long_name
ARI Diamondbacks,Arizona Diamondbacks
CHI Cubs,Chicago Cubs
MIA Marlins,Miami Marlins
The API has been inconsistent with the naming. To stop the code from returning NaN as a team name, I commented out the name-conversions section.
Name conversions section?
Sorry for the delay in replying, Jonathan. You can comment out these two lines:
# Merged_DataFrame["team_1"] = Merged_DataFrame["team_1"].apply(name_converter)
# Merged_DataFrame["team_2"] = Merged_DataFrame["team_2"].apply(name_converter)
Hi there,
New subscriber here. I've read this series of posts and find them very enjoyable. Thank you for sharing all of this and for the code.
A quick question: I signed up for the prop-odds free API key and exhausted my 2,000 monthly API calls without being able to complete 'mlb-runline-dataset-builder'. Are you using their 'algo better' subscription level? Will the 100,000 API calls at that level allow this algorithm to be run multiple times per month? I'm not sure how much of the builder I actually made it through (i.e., how many API calls it needs in total).
Hi there, glad to have you! 😄
While your first run of the dataset builder will use a bit over 2,000 queries, you won't have to make such heavy requests in the future.
Going forward, instead of the dataset-builder, you run the dataset-production file, which just queries for games from the last few days, around 30-40 requests.
So my cost-saving strategy was to get the subscription, make the initial queries, then downgrade back to free, which allows all the queries needed going forward.
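If it helps, here's roughly what that production-style loop boils down to. Just a sketch; the exact URL and parameters should be copied from your dataset-builder file, and the key is a placeholder:

import pandas as pd
import requests
from datetime import datetime, timedelta

API_KEY = "your-prop-odds-key"  # placeholder

# Only query the last few days instead of the full history
recent_dates = [(datetime.today() - timedelta(days=n)).strftime("%Y-%m-%d") for n in range(4)]

game_frames = []
for date in recent_dates:
    # endpoint shape is illustrative; match it to the one in the builder file
    games_url = f"https://api.prop-odds.com/beta/games/mlb?date={date}&tz=America/New_York&api_key={API_KEY}"
    response = requests.get(games_url).json()
    if "games" in response:
        game_frames.append(pd.json_normalize(response["games"]))

recent_games = pd.concat(game_frames, ignore_index=True)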
Hope that helped!
Thank you for your reply. That sounds like a great idea. Appreciate it.
gm! Trying to follow this process. Finally got the database built and went on to the training.
In the training flow, my precision results differ from the example above pretty heavily. I used the exact same notebooks with no changes, but I am getting close to 90% on the ensemble of models and 90% on the MLP model. I know this obviously can't be correct, and I am wondering where I could have possibly gone wrong?
My dataset in SQL is set up like this:
   game_datetime        team_1             team_1_spread_odds  team_2             team_2_spread_odds  venue_name     spread
0  2023-08-03 14:07:00  Baltimore Orioles  -195                Toronto Blue Jays  150                 Rogers Centre  1
Any ideas what I could be doing incorrectly?
I should add that I ran the database builder first, then the training file (as there are no games currently being played).
Morning Quant, could we apply this framework to the NFL spread? Do you have any plans for an NFL or NHL version?
Hey there!
This can definitely be applied; the same API source this pulls from will have NFL data, so it would just be a matter of changing some parameters.
Currently the pipeline is to build out the system for NFL player props, but it's highly likely I'll also create an NFL version for the main lines (e.g., spread, moneyline).
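As a rough sketch (the real URL format should be copied from the dataset-builder file, and every value here is a placeholder), the first of those parameter changes would look something like:

# Illustrative only; copy the real URL format from the dataset-builder file
API_KEY = "your-prop-odds-key"  # placeholder
date = "2023-09-10"             # placeholder
league = "nfl"                  # was "mlb"
games_url = f"https://api.prop-odds.com/beta/games/{league}?date={date}&tz=America/New_York&api_key={API_KEY}"
# ...the team-name CSV and market parameters would also need NFL equivalents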
Getting this error:
raise KeyError(key) from err
KeyError: 'short_name'
Do you know how to fix it?
Okay,
I've updated the CSV file which converts the team names, since the API convention for the short names changed, as another commenter mentioned. Essentially, instead of the API returning e.g. "ARI Diamondbacks", it now returns "ARI Diamondbacks - ARI Diamondbacks", which the file did not contain.
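For illustration, assuming the same two-column layout as the snippet earlier in the thread, the updated file needs rows keyed on the new format, along the lines of:

short_name,long_name
ARI Diamondbacks - ARI Diamondbacks,Arizona Diamondbacks
CHI Cubs - CHI Cubs,Chicago Cubs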
So, delete the CSV you have saved and replace it with the new one uploaded to the GitHub.
If you're still seeing that error, can you mention which file you're running when it comes up? If it works now, please let me know.
Best
It's working now, but I have a new problem: it says it can't connect to the MySQL server on 'database hostname:3306',
and that the background on the error is at https://sqlalche.me/e/20/rvf5
If this is your first time running it, you have to input your own SQL database credentials; the create_engine line is a base template for the format in which the credentials should be entered.
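For reference, here's a minimal sketch of that template, assuming MySQL through SQLAlchemy as used elsewhere in this thread (every value is a placeholder for your own credentials):

from sqlalchemy import create_engine

# mysql+mysqlconnector://<username>:<password>@<hostname>:<port>/<database_name>
engine = create_engine("mysql+mysqlconnector://root:your_password@localhost:3306/your_database_name")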
Setting up the database is relatively simple, but if you’re entirely new to the process I highly recommend just taking the course where we walk through setting up the entire workflow: https://www.quantgalore.com/courses/ml-sports-betting-mlb
If you’d prefer not to, this tutorial might help you get started: https://youtu.be/OllLAQvhAwA?si=UA0-2ykIhM16vYkC
So that's fine, but now I have a problem with this block in production:
#Finalized_Model_save_to_file_string = f"2023-07-22 Baseball Spread"
Finalized_Model_save_to_file_string = f"{datetime.today().strftime('%Y-%m-%d')} Baseball Spread"
Classification_Model = pycaret.classification.load_model(f"/content/drive/MyDrive/{Finalized_Model_save_to_file_string}")
print(Finalized_Model_save_to_file_string)
It is saying file not found, even though I have saved it to my Drive.
I would say make sure you saved it to MyDrive and not the Colab Notebooks folder. You can double-check where to save in the lines where you run sys.path.append and drive.mount.
If you're sure it's there, then try re-running the line to mount the drive again.
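For reference, the typical Colab setup looks like this (the paths are illustrative; match them to the ones in your notebook):

import sys
from google.colab import drive

# Mount Google Drive, then point Python at MyDrive (not the Colab Notebooks folder)
drive.mount("/content/drive")
sys.path.append("/content/drive/MyDrive")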
Hi Jonathan,
Apologies for the delay, I am looking into this today and will update you if I publish the fix to the GitHub.
Receiving this error, thoughts?
Traceback (most recent call last):
File ~/anaconda3/lib/python3.10/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
exec(code, globals, locals)
File ~/Downloads/Machine Learning for Sports Betting-MLB Editi/mlb-runline-main/mlb-runline-dataset-builder.py:59
games = pd.json_normalize(requests.get(games_url).json()["games"])
KeyError: 'games'
Hi there,
I recommend using the Spyder IDE so that you can see the actual contents of the variables and immediately spot what's wrong.
However, this error comes up when the URL response doesn't contain any games and instead contains a message like "Your max quota is reached.", which in this case means you've gone over the 2,000 queries allowed by the API's free plan.
As mentioned in the other comment below, the best option is to upgrade to get the 100,000 request quota limit, then downgrade to not be charged again.
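If you want the failure to surface clearly, here's a sketch of a defensive version of the line from your traceback (the games_url assignment is a placeholder for the one defined earlier in the builder file):

import pandas as pd
import requests

games_url = "..."  # as defined earlier in the builder file

# Inspect the response before indexing, so a quota message raises a clear
# error instead of a KeyError
response = requests.get(games_url).json()
if "games" not in response:
    # e.g., {"message": "Your max quota is reached."} on the free plan
    raise RuntimeError(f"Unexpected API response: {response}")
games = pd.json_normalize(response["games"])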
Hope that helped!
API is sorted. Sorry, next issue: my test connection works in MySQL Workbench, but when I run it in Spyder it keeps giving me this error:
(mysql.connector.errors.ProgrammingError) 1049 (42000): Unknown database 'psiq-mlb-data'
(Background on this error at: https://sqlalche.me/e/14/f405)
I have tried adding the DB twice, following the course tech setup.
Thoughts?
Is "psiq-mlb-data" the table name or the database name?
Can you upload, to https://imgur.com/upload, a screenshot of your Workbench tables (left column) and a screenshot of the code snippet that creates the SQL engine and sends the data, and post the link here?
Yes, when I go into Workbench I'm not seeing the DB listed in Schemas. https://imgur.com/a/wTLXoCg
Alright, I understand now.
I've just added two new lines of code to the dataset builder file that will solve this.
The new "initial_engine" variable connects to the SQL server you already have, without selecting a specific database. Then, on the next line, we run the "CREATE DATABASE" command followed by the desired name of your database.
That would then be the database used to store the data.
After that, you can append and delete data from it as you please.
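In sketch form, the addition looks something like this, assuming the same SQLAlchemy/MySQL setup as above (credentials are placeholders):

from sqlalchemy import create_engine, text

# Connect to the server itself, without selecting a database
initial_engine = create_engine("mysql+mysqlconnector://root:your_password@localhost:3306/")
with initial_engine.connect() as connection:
    # Backticks because the name contains hyphens; MySQL auto-commits DDL
    connection.execute(text("CREATE DATABASE IF NOT EXISTS `psiq-mlb-data`"))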
Let me know if there's still any trouble after this.
Thanks
Thank you! I will run it now.