The objective of this project was to accurately predict stress values from the material interface properties and the strain input dataset. Because the stress-strain behavior is non-linear, I built a methodology around selecting and averaging the outputs of models that can capture this kind of relationship.
I selected ensemble-learning models (bagging and boosting) for this project (inspired by Lectures 24/25) to achieve better predictive performance than any single learner could achieve alone. I chose a Random Forest regressor (RF) as the bagging model and XGBoost (XGB) as the boosting model. I went with RF because it is relatively easy to train and resistant to overfitting. I chose XGBoost over AdaBoost because it typically achieves better accuracy on structured numerical data: it optimizes the loss with a second-order, Newton-Raphson-style approximation rather than AdaBoost's error-driven sample reweighting, which allows faster convergence, better handling of the loss curvature, and improved accuracy. By using both models, I believe I can achieve robust and accurate results.
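As an illustration of this model choice, here is a minimal sketch of how the two learners might be set up, assuming scikit-learn and the xgboost package; the hyperparameter values are placeholders, not the ones tuned for this project:

    from sklearn.ensemble import RandomForestRegressor
    from xgboost import XGBRegressor

    # Bagging learner: averages many decorrelated trees, which keeps
    # training simple and makes the model resistant to overfitting.
    rf = RandomForestRegressor(n_estimators=200, random_state=0)

    # Boosting learner: fits trees sequentially using both the gradient
    # and the Hessian of the loss (second-order, Newton-style updates).
    xgb = XGBRegressor(n_estimators=200, learning_rate=0.1, random_state=0)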
For model validation, I split the data into 5 folds and used k-fold cross-validation for both models. I used k=5 to strike a good balance between computational runtime and statistical reliability: with lower k values each model trains on a smaller share of the data, which biases the error estimate, while higher k values increase the variance of the estimate and the runtime. Since I wasn't sure how the .json dataset compared in size to other training situations, I went with the happy medium of k=5. The combined model averages the predictions made by the RF and XGB models, which helps balance out errors made by either model and leverages their individual strengths.
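A minimal sketch of the 5-fold validation and prediction averaging described above, assuming the feature matrix X (interface properties plus strain) and the stress targets y have already been parsed from the .json dataset into NumPy arrays; the variable names and the MSE metric are illustrative:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import KFold
    from xgboost import XGBRegressor

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    fold_mse = []

    for train_idx, test_idx in kf.split(X):
        # Fresh estimators for each fold so no state leaks across folds.
        rf = RandomForestRegressor(n_estimators=200, random_state=0)
        xgb = XGBRegressor(n_estimators=200, learning_rate=0.1, random_state=0)
        rf.fit(X[train_idx], y[train_idx])
        xgb.fit(X[train_idx], y[train_idx])

        # Combined model: a simple average of the two predictions, so an
        # error made by one learner can be offset by the other.
        y_pred = 0.5 * (rf.predict(X[test_idx]) + xgb.predict(X[test_idx]))
        fold_mse.append(mean_squared_error(y[test_idx], y_pred))

    print(f"Mean 5-fold MSE of the averaged ensemble: {np.mean(fold_mse):.4f}")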