Concrete example of using the Faith Cluster, thanks to Simon Ruffieux
This file provides a basic, concrete example of how to use the Faith Cluster.
Open your terminal and log in using SSH:
ssh your_username@diufrd200.unifr.ch
Create a workspace (folder) and the venv for your project (while connected via SSH):
- Create a folder for your project in your home directory: your_username@diufrd200:~$ mkdir faith_demo
- Move into the directory: your_username@diufrd200:~$ cd faith_demo
- Create a venv for your project: your_username@diufrd200:~/faith_demo$ python3 -m venv .venv
Move files and manage your workspace (two options):
- Use Git or another repository to manage your files and pull them from there using your favorite terminal.
- Connect Visual Studio Code through SSH to your remote workspace and manage files from there (provides a visual interface and a terminal). See https://code.visualstudio.com/docs/remote/ssh for instructions.
Install the necessary packages in your venv:
- Note that using a requirements.txt file helps keep the installation reproducible.
- Activate the venv: your_username@diufrd200:~/faith_demo$ source .venv/bin/activate
- Install packages manually using pip, or run pip install -r requirements.txt
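After installing, a quick sanity check (a hypothetical snippet, not one of the three project files) confirms that the packages from requirements.txt import correctly from inside the activated venv:

```python
# Run inside the activated venv: python3 check_env.py
# Verifies that the two packages from requirements.txt are importable.
import sklearn
import joblib

print("scikit-learn:", sklearn.__version__)
print("joblib:", joblib.__version__)
```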
Create a Slurm file
Here is a working example. Note that the logs directory must exist before the job runs (create it with mkdir logs), since Slurm does not create the directories named in --output and --error:
#!/bin/bash
#SBATCH --job-name=faith_demo
#SBATCH --output=logs/output.txt
#SBATCH --error=logs/error.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_mail@unifr.ch
source .venv/bin/activate
python3 faith_demo.py
Request processing time for your task by submitting your Slurm script with sbatch. Run it from inside the project directory, since the script uses relative paths:
your_username@diufrd200:~/faith_demo$ sbatch slurm_script.sh
Sources
Three files are required to make this example work. They should all be located within the directory of your project (<faith_demo>):
1) Python script (faith_demo.py)
2) Requirements file (requirements.txt)
3) Slurm script (slurm_script.sh)
Python script
"""
Train and evaluate different models, either on the Iris dataset (very easy classification task) or Forest Covert dataset (slightly more complex classification task).
Several models are proposed. A GridSearchCV is also available for the RandomForest dataset, which requires more computation time.
"""
from sklearn.datasets import load_iris
from sklearn.datasets import fetch_covtype
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import neighbors
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
import joblib
def train_dt(X, y):
    print("Training the decision tree model")
    my_model = DecisionTreeClassifier(max_depth=2)
    my_model.fit(X, y)
    return my_model

def train_lr(X, y):
    print("Training the logistic regression model")
    my_model = LogisticRegression()
    my_model.fit(X, y)
    return my_model

def train_knn(X, y):
    print("Training the knn model")
    my_model = neighbors.KNeighborsClassifier()
    my_model.fit(X, y)
    return my_model

def train_rf(X, y):
    print("Training the rf model")
    my_model = RandomForestClassifier()
    my_model.fit(X, y)
    return my_model

def gridSearchCV_rf(X, y):
    print("Training the rf model with GridSearchCV (might take some time ...)")
    # Create the parameter grid based on the results of random search
    param_grid = {
        'bootstrap': [True],
        'max_depth': [80, 90, 100, 110],
        'max_features': [2, 3],
        'min_samples_leaf': [3, 4, 5],
        'min_samples_split': [8, 10, 12],
        'n_estimators': [100, 200, 300, 1000]
    }
    # Create a base model
    rf = RandomForestClassifier()
    # Instantiate the grid search model
    grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=3, n_jobs=-1, verbose=0)
    # Fit the grid search to the data
    grid_search.fit(X, y)
    print("Best params found:\n")
    print(str(grid_search.best_params_), "\n")
    my_model_gs = grid_search.best_estimator_
    return my_model_gs

def save_model(model, filename):
    print("Saving the model")
    joblib.dump(model, filename)

def load_iris_dataset():
    print("Loading iris dataset\n")
    iris = load_iris()
    return iris

def load_covtype_dataset():
    print("Loading Forest Covertype dataset\n")
    covtype = fetch_covtype()
    return covtype

def main():
    # dset = load_iris_dataset()
    dset = load_covtype_dataset()
    X = dset['data']
    y = dset['target']
    x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.75)
    model_dt = train_dt(x_train, y_train)
    print("Decision tree accuracy is", accuracy_score(y_test, model_dt.predict(x_test)), "\n")
    model_lr = train_lr(x_train, y_train)
    print("Logistic regression accuracy is", accuracy_score(y_test, model_lr.predict(x_test)), "\n")
    model_knn = train_knn(x_train, y_train)
    print("KNN accuracy is", accuracy_score(y_test, model_knn.predict(x_test)), "\n")
    model_rf = train_rf(x_train, y_train)
    print("Random Forest accuracy is", accuracy_score(y_test, model_rf.predict(x_test)), "\n")
    # model_rfgs = gridSearchCV_rf(x_train, y_train)
    # print("Random Forest from GridSearch accuracy is", accuracy_score(y_test, model_rfgs.predict(x_test)), "\n")

if __name__ == "__main__":
    main()
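The script defines save_model but never calls it. A minimal sketch of persisting and then reloading a trained model with joblib (the filename model.joblib is an assumption, not taken from the original script) could look like:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import joblib

# Train a small model, persist it, and reload it to verify the round trip
# produces identical predictions.
iris = load_iris()
model = DecisionTreeClassifier(max_depth=2).fit(iris["data"], iris["target"])
joblib.dump(model, "model.joblib")   # filename is an example choice
restored = joblib.load("model.joblib")
assert (model.predict(iris["data"]) == restored.predict(iris["data"])).all()
```

Saving the model this way lets a Slurm job write its result to the project directory, so it survives after the job finishes.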
Requirements text file
scikit-learn
joblib
Slurm Script
#!/bin/bash
#SBATCH --job-name=faith_demo
#SBATCH --output=logs/output.txt
#SBATCH --error=logs/error.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_mail@unifr.ch
source .venv/bin/activate
python3 faith_demo.py
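The commented-out GridSearchCV step in faith_demo.py can take a long time on the Covertype data. As a sketch, a reduced search on the Iris dataset (with a deliberately small, illustrative parameter grid, not tuned values) finishes in seconds and exercises the same API:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# A deliberately small grid so the search runs quickly; the values are
# illustrative, not tuned.
param_grid = {"max_depth": [2, 4], "n_estimators": [10, 50]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, n_jobs=-1)
iris = load_iris()
search.fit(iris["data"], iris["target"])
print("Best params:", search.best_params_)
print("Best CV accuracy: %.3f" % search.best_score_)
```

This is useful for testing the full pipeline locally before submitting the long-running Covertype search to the cluster.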