Good morning, I was getting some NaNs on the team names this morning. It appears that the short name coming from the API has been altered, so the CSV file with the long names needed to be edited, e.g. 'ARI Diamondbacks' to 'ARI Diamondbacks - ARI Diamondbacks'. Just wanted to give you the heads-up. Cheers
'ARI Diamondbacks' is the short name, not the long name, so I'm slightly confused. The CSV maps short names to long names like so:
short_name,long_name
ARI Diamondbacks,Arizona Diamondbacks
CHI Cubs,Chicago Cubs
MIA Marlins,Miami Marlins
The API has been inconsistent with the naming. To stop the code from returning NaN as a team name, I commented out the name-conversions section.
Name conversions section?
Sorry for the delay in replying, Jonathan. You can comment out these two lines:
# Merged_DataFrame["team_1"] = Merged_DataFrame["team_1"].apply(name_converter)
# Merged_DataFrame["team_2"] = Merged_DataFrame["team_2"].apply(name_converter)
Hi there,
New subscriber here. I've read this series of posts and find them very enjoyable. Thank you for sharing all of this and for the code.
A quick question: I signed up for the prop-odds free API key and exhausted my 2,000 monthly API calls without being able to complete 'mlb-runline-dataset-builder'. Are you using their 'algo better' subscription level? Will the 100,000 API calls at that level allow this algorithm to be run multiple times per month? I'm not sure how much of the builder I actually made it through (i.e., how many API calls it needs in total).
Hi there, glad to have you! 😄
While your first run of the dataset builder will use a bit over 2,000 queries, you won't have to make such heavy requests in the future.
Going forward, instead of the dataset-builder, you run the dataset-production file, which just queries for games from the last few days, around 30-40 requests.
So my cost-saving strategy was to get the subscription, make the initial queries, then downgrade back to free, which allows all the queries needed going forward.
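If it helps, here's roughly what that production-style loop boils down to. Just a sketch; the exact URL and parameters should be copied from your dataset-builder file, and the key is a placeholder:

import pandas as pd
import requests
from datetime import datetime, timedelta

API_KEY = "your-prop-odds-key"  # placeholder

# Only query the last few days instead of the full history
recent_dates = [(datetime.today() - timedelta(days=n)).strftime("%Y-%m-%d") for n in range(4)]

game_frames = []
for date in recent_dates:
    # endpoint shape is illustrative; match it to the one in the builder file
    games_url = f"https://api.prop-odds.com/beta/games/mlb?date={date}&tz=America/New_York&api_key={API_KEY}"
    response = requests.get(games_url).json()
    if "games" in response:
        game_frames.append(pd.json_normalize(response["games"]))

recent_games = pd.concat(game_frames, ignore_index=True)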
Hope that helped!
Thank you for your reply. That sounds like a great idea. Appreciate it.
gm! Trying to follow this process. Finally got the database built and went on to the training.
In the training flow, my precision results differ from the example above pretty heavily. I used the exact same notebooks with no changes, but I am getting close to 90% on the ensemble of models and 90% on the MLP model. I know this obviously can't be correct, and I am wondering where I could have possibly gone wrong?
My dataset in SQL is set up like this:
   game_datetime        team_1             team_1_spread_odds  team_2             team_2_spread_odds  venue_name     spread
0  2023-08-03 14:07:00  Baltimore Orioles  -195                Toronto Blue Jays  150                 Rogers Centre  1
Any ideas what I could be doing incorrectly?
I should add that I ran the database builder first, then the training file (as there are no games currently being played).
Morning Quant, could we apply this framework to the NFL spread? Do you have any plans for an NFL or NHL version?
Hey there!
This can definitely be applied; the same API source this pulls from will have NFL data, so it would just be a matter of changing some parameters.
Currently the pipeline is to build out the system for NFL player props, but it's highly likely I'll also create an NFL version for the main lines (e.g., spread, moneyline).
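As a rough sketch (the real URL format should be copied from the dataset-builder file, and every value here is a placeholder), the first of those parameter changes would look something like:

# Illustrative only; copy the real URL format from the dataset-builder file
API_KEY = "your-prop-odds-key"  # placeholder
date = "2023-09-10"             # placeholder
league = "nfl"                  # was "mlb"
games_url = f"https://api.prop-odds.com/beta/games/{league}?date={date}&tz=America/New_York&api_key={API_KEY}"
# ...the team-name CSV and market parameters would also need NFL equivalents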
Getting this error:
raise KeyError(key) from err
KeyError: 'short_name'
Do you know how to fix it?
Okay,
I've updated the CSV file which converts the team names, since the API convention for the short names changed, as another commenter mentioned. Essentially, instead of the API returning e.g. "ARI Diamondbacks", it now returns "ARI Diamondbacks - ARI Diamondbacks", which the file did not contain.
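For illustration, assuming the same two-column layout as the snippet earlier in the thread, the updated file needs rows keyed on the new format, along the lines of:

short_name,long_name
ARI Diamondbacks - ARI Diamondbacks,Arizona Diamondbacks
CHI Cubs - CHI Cubs,Chicago Cubs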
So, delete the CSV you have saved and replace it with the new one uploaded to the GitHub.
If you're still seeing that error, can you mention which file you're running when it comes up? If it works now, please let me know.
Best
It's working now, but I have a new problem: it says it can't connect to the MySQL server on 'database hostname:3306',
and that the background on the error is at https://sqlalche.me/e/20/rvf5
If this is your first time running it, you have to input your own SQL database credentials; the create_engine line is a base template for the format in which the credentials should be entered.
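For reference, here's a minimal sketch of that template, assuming MySQL through SQLAlchemy as used elsewhere in this thread (every value is a placeholder for your own credentials):

from sqlalchemy import create_engine

# mysql+mysqlconnector://<username>:<password>@<hostname>:<port>/<database_name>
engine = create_engine("mysql+mysqlconnector://root:your_password@localhost:3306/your_database_name")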
Setting up the database is relatively simple, but if you’re entirely new to the process I highly recommend just taking the course where we walk through setting up the entire workflow: https://www.quantgalore.com/courses/ml-sports-betting-mlb
If you’d prefer not to, this tutorial might help you get started: https://youtu.be/OllLAQvhAwA?si=UA0-2ykIhM16vYkC
So that's fine, but now I have a problem with this block in production:
#Finalized_Model_save_to_file_string = f"2023-07-22 Baseball Spread"
Finalized_Model_save_to_file_string = f"{datetime.today().strftime('%Y-%m-%d')} Baseball Spread"
Classification_Model = pycaret.classification.load_model(f"/content/drive/MyDrive/{Finalized_Model_save_to_file_string}")
print(Finalized_Model_save_to_file_string)
It is saying file not found, even though I have saved it to my Drive.
I would say make sure you saved it to MyDrive and not the Colab Notebooks folder. You can double-check where to save in the lines where you run sys.path.append and drive.mount.
If you're sure it's there, then try re-running the line to mount the drive again.
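For reference, the typical Colab setup looks like this (the paths are illustrative; match them to the ones in your notebook):

import sys
from google.colab import drive

# Mount Google Drive, then point Python at MyDrive (not the Colab Notebooks folder)
drive.mount("/content/drive")
sys.path.append("/content/drive/MyDrive")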
Hi Jonathan,
Apologies for the delay, I am looking into this today and will update you if I publish the fix to the GitHub.
Receiving this error, thoughts?
Traceback (most recent call last):
File ~/anaconda3/lib/python3.10/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
exec(code, globals, locals)
File ~/Downloads/Machine Learning for Sports Betting-MLB Editi/mlb-runline-main/mlb-runline-dataset-builder.py:59
games = pd.json_normalize(requests.get(games_url).json()["games"])
KeyError: 'games'
Hi there,
I recommend using the Spyder IDE so that you can see the actual contents of the variables and immediately spot what's wrong.
However, this error comes up when the URL response doesn't contain any games and instead contains a message like "Your max quota is reached.", which in this case means you've gone over the 2,000 queries allowed by the API's free plan.
As mentioned in the other comment below, the best option is to upgrade to get the 100,000 request quota limit, then downgrade to not be charged again.
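If you want the failure to surface clearly, here's a sketch of a defensive version of the line from your traceback (the games_url assignment is a placeholder for the one defined earlier in the builder file):

import pandas as pd
import requests

games_url = "..."  # as defined earlier in the builder file

# Inspect the response before indexing, so a quota message raises a clear
# error instead of a KeyError
response = requests.get(games_url).json()
if "games" not in response:
    # e.g., {"message": "Your max quota is reached."} on the free plan
    raise RuntimeError(f"Unexpected API response: {response}")
games = pd.json_normalize(response["games"])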
Hope that helped!
API is sorted. Sorry, next issue: my test connection works in MySQL Workbench, but when I run it in Spyder it keeps giving me this error:
(mysql.connector.errors.ProgrammingError) 1049 (42000): Unknown database 'psiq-mlb-data'
(Background on this error at: https://sqlalche.me/e/14/f405)
I have tried adding the DB twice, following the course tech setup.
Thoughts?
Is "psiq-mlb-data" the table name or the database name?
Can you upload, to https://imgur.com/upload, a screenshot of your Workbench tables (left column) and a screenshot of the code snippet that creates the SQL engine and sends the data, and post the link here?
Yes, when I go into Workbench I'm not seeing the DB listed in Schemas. https://imgur.com/a/wTLXoCg
Alright, I understand now.
I've just added two new lines of code to the dataset builder file that will solve this.
The new "initial_engine" variable connects to the SQL server you already have, without selecting a specific database. Then, on the next line, we run the "CREATE DATABASE" command followed by the desired name of your database.
That would then be the database used to store the data.
After that, you can append and delete data from it as you please.
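In sketch form, the addition looks something like this, assuming the same SQLAlchemy/MySQL setup as above (credentials are placeholders):

from sqlalchemy import create_engine, text

# Connect to the server itself, without selecting a database
initial_engine = create_engine("mysql+mysqlconnector://root:your_password@localhost:3306/")
with initial_engine.connect() as connection:
    # Backticks because the name contains hyphens; MySQL auto-commits DDL
    connection.execute(text("CREATE DATABASE IF NOT EXISTS `psiq-mlb-data`"))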
Let me know if there's still any trouble after this.
Thanks
Thank you! I will run it now.