This problem used to be solved with analytic methods, as you can find here, but these methods are now largely outdated. However, for simpler cases they still work very well and are much faster to implement.
Deep learning is much more powerful for this particular case, and you don't actually need that much data, although you will need to spend some time preparing it.
The concept
We will train the model on a set of small image patches extracted from the original images, where the target labels are foreground and background.
Preparing the data
First, you will need to create a label mask for each of your images. You can do this in any image editor, such as Paint: take a copy of the original image and color the foreground white and the background black. These will be labels 1 and 0, respectively.
In Python, load the original images and their respective label images. Then split the images and labels into patches of size $k \times k$. You can pick whatever patch size you think best suits your kind of data; it is a hyper-parameter you will need to tune using cross-validation. Each patch gets the label of its center pixel, so an odd $k$ is convenient because the patch then has a single center pixel.
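Once the masks are drawn, they need to be turned into numeric labels. The sketch below assumes the mask has already been read into a NumPy array (for example with Pillow via `np.array(Image.open(path).convert("L"))`, where `path` is your own file). The function name `mask_to_labels` and the threshold of 127 are illustrative choices; the threshold absorbs the anti-aliased gray edges many drawing tools leave behind.

```python
import numpy as np

def mask_to_labels(mask: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Convert a hand-drawn mask (white foreground, black background) to 0/1 labels.

    Accepts a grayscale (H, W) or RGB (H, W, 3) uint8 array. Pixels brighter
    than `threshold` become 1 (foreground); all others become 0 (background).
    """
    if mask.ndim == 3:
        # Average the color channels to get one gray value per pixel.
        mask = mask.mean(axis=2)
    return (mask > threshold).astype(np.uint8)

# Example: a 4x4 mask whose right half was painted white.
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:, 2:] = 255
labels = mask_to_labels(mask)
print(labels.sum())  # -> 8 foreground pixels
```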
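A minimal sketch of the patch extraction, assuming an odd patch size and one patch centered on each pixel. The image borders are reflect-padded so that edge pixels also get full patches; that padding strategy and the `stride` parameter are my own assumptions, not part of the method described above. With `stride=1` you get $H \times W$ patches per image, which can be a lot, so a larger stride or random subsampling is often used in practice.

```python
import numpy as np

def extract_patches(image, labels, k, stride=1):
    """Extract k x k patches centered on pixels, each labeled by its center pixel.

    image:  (H, W) or (H, W, C) array
    labels: (H, W) array of 0/1 labels
    k:      odd patch size, so every patch has a single center pixel
    stride: step between center pixels (1 = one patch per pixel)
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the patch has a center pixel")
    r = k // 2
    # Reflect-pad the two spatial axes only (not a channel axis, if present).
    pad = [(r, r), (r, r)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="reflect")
    patches, targets = [], []
    H, W = labels.shape
    for i in range(0, H, stride):
        for j in range(0, W, stride):
            # Pixel (i, j) sits at (i + r, j + r) in `padded`,
            # so padded[i:i+k, j:j+k] is centered on it.
            patches.append(padded[i:i + k, j:j + k])
            targets.append(labels[i, j])
    return np.stack(patches), np.array(targets)

image = np.arange(25, dtype=float).reshape(5, 5)
labels = (image > 12).astype(np.uint8)
X, y = extract_patches(image, labels, k=3)
print(X.shape, y.sum())  # -> (25, 3, 3) 12
```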
Build the model
Then build a standard convolutional neural network (CNN) whose input is an image patch and whose output is the label of that patch.
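The answer does not name a framework; the sketch below uses PyTorch as an assumption. The layer sizes (16 and 32 filters, two pooling stages) and the default patch size `k=15` are illustrative, not tuned. The network outputs one logit per patch, and `BCEWithLogitsLoss` applies the sigmoid internally, so a logit above 0 (probability above 0.5) means label 1.

```python
import torch
from torch import nn

class PatchClassifier(nn.Module):
    """Small CNN mapping a (C, k, k) patch to one foreground/background logit."""

    def __init__(self, in_channels: int = 1, k: int = 15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Two 2x2 poolings shrink each spatial side from k to k // 4.
        side = k // 4
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * side * side, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # single logit per patch
        )

    def forward(self, x):
        return self.classifier(self.features(x)).squeeze(1)

model = PatchClassifier(in_channels=1, k=15)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random batch standing in for real patches: (batch, channels, k, k) and 0/1 targets.
x = torch.randn(8, 1, 15, 15)
y = torch.randint(0, 2, (8,)).float()

# One training step.
logits = model(x)
loss = loss_fn(logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Grayscale patches stored as an `(N, k, k)` NumPy array need a channel axis before they go into the model, e.g. `torch.from_numpy(X).float().unsqueeze(1)`.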
Segmenting new images
To segment a new image, split it into patches (one centered on each pixel you want to label) and predict the label of each patch. The center pixels of all patches predicted as $label=1$ make up the foreground.
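A sketch of that sliding-window inference, kept independent of any framework: `predict_fn` is a hypothetical callable you would write around your trained model, taking a batch of patches and returning one foreground probability per patch. Patches are built and classified in batches so the whole set never has to sit in memory at once; the batch size and the reflect padding at the borders are assumptions.

```python
import numpy as np

def segment(image, predict_fn, k, batch_size=4096):
    """Label every pixel by classifying the k x k patch centered on it.

    predict_fn: hypothetical callable mapping an (N, k, k[, C]) array of
                patches to N foreground probabilities.
    Returns an (H, W) uint8 mask with 1 = foreground.
    """
    r = k // 2
    pad = [(r, r), (r, r)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="reflect")
    H, W = image.shape[:2]
    coords = [(i, j) for i in range(H) for j in range(W)]
    mask = np.zeros((H, W), dtype=np.uint8)
    for start in range(0, len(coords), batch_size):
        batch = coords[start:start + batch_size]
        patches = np.stack([padded[i:i + k, j:j + k] for i, j in batch])
        probs = predict_fn(patches)
        for (i, j), p in zip(batch, probs):
            # Threshold the probability at 0.5 to get the center pixel's label.
            mask[i, j] = 1 if p > 0.5 else 0
    return mask

# Toy check with a fake "model" that just reads each patch's center pixel.
image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0
mask = segment(image, lambda p: p[:, 1, 1], k=3)
print(mask.sum())  # -> 4
```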
Alternative method
Alternatively, you can predict the labels for the entire patch at once, which means the output of your network has the same spatial size as its input. You will then have a label for each pixel in the patch.
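A minimal sketch of this per-pixel variant, again assuming PyTorch. It is a plain fully convolutional stack: 3x3 convolutions with `padding=1` keep the height and width unchanged, and a final 1x1 convolution emits one logit per pixel. The depth and `width=32` are arbitrary; for real data an encoder-decoder architecture such as U-Net is the usual choice, but that is my suggestion rather than part of this answer.

```python
import torch
from torch import nn

class PatchSegmenter(nn.Module):
    """Fully convolutional net: (B, C, H, W) patch -> (B, 1, H, W) per-pixel logits."""

    def __init__(self, in_channels: int = 1, width: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            # 3x3 kernels with padding=1 preserve the spatial size.
            nn.Conv2d(in_channels, width, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1),
            nn.ReLU(),
            # 1x1 convolution: one logit per pixel.
            nn.Conv2d(width, 1, kernel_size=1),
        )

    def forward(self, x):
        return self.net(x)

model = PatchSegmenter()
x = torch.randn(4, 1, 32, 32)                          # batch of image patches
target = torch.randint(0, 2, (4, 1, 32, 32)).float()   # matching label patches
logits = model(x)
loss = nn.BCEWithLogitsLoss()(logits, target)
pred_mask = (torch.sigmoid(logits) > 0.5).long()       # 1 = foreground pixel
```

Here the target is the whole label patch rather than a single center value, so each training patch contributes $k \times k$ labeled pixels instead of one.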