Introduction
In this post, I will attempt to explain the TabularModel class in fast.ai as part of the “Further Research” assignment in chapter 9 of the fastbook.
Disclaimer: This post is inspired by another wonderful blog post explaining the same.
Here’s the code of TabularModel class pasted as is:
class TabularModel(Module):
    "Basic model for tabular data."
    def __init__(self,
        emb_szs:list, # Sequence of (num_embeddings, embedding_dim) for each categorical variable
        n_cont:int, # Number of continuous variables
        out_sz:int, # Number of outputs for final `LinBnDrop` layer
        layers:list, # Sequence of ints used to specify the input and output size of each `LinBnDrop` layer
        ps:float|MutableSequence=None, # Sequence of dropout probabilities for `LinBnDrop`
        embed_p:float=0., # Dropout probability for `Embedding` layer
        y_range=None, # Low and high for `SigmoidRange` activation
        use_bn:bool=True, # Use `BatchNorm1d` in `LinBnDrop` layers
        bn_final:bool=False, # Use `BatchNorm1d` on final layer
        bn_cont:bool=True, # Use `BatchNorm1d` on continuous variables
        act_cls=nn.ReLU(inplace=True), # Activation type for `LinBnDrop` layers
        lin_first:bool=True # Linear layer is first or last in `LinBnDrop` layers
    ):
        ps = ifnone(ps, [0]*len(layers))
        if not is_listy(ps): ps = [ps]*len(layers)
        self.embeds = nn.ModuleList([Embedding(ni, nf) for ni,nf in emb_szs])
        self.emb_drop = nn.Dropout(embed_p)
        self.bn_cont = nn.BatchNorm1d(n_cont) if bn_cont else None
        n_emb = sum(e.embedding_dim for e in self.embeds)
        self.n_emb,self.n_cont = n_emb,n_cont
        sizes = [n_emb + n_cont] + layers + [out_sz]
        actns = [act_cls for _ in range(len(sizes)-2)] + [None]
        _layers = [LinBnDrop(sizes[i], sizes[i+1], bn=use_bn and (i!=len(actns)-1 or bn_final), p=p, act=a, lin_first=lin_first)
                       for i,(p,a) in enumerate(zip(ps+[0.],actns))]
        if y_range is not None: _layers.append(SigmoidRange(*y_range))
        self.layers = nn.Sequential(*_layers)

    def forward(self, x_cat, x_cont=None):
        if self.n_emb != 0:
            x = [e(x_cat[:,i]) for i,e in enumerate(self.embeds)]
            x = torch.cat(x, 1)
            x = self.emb_drop(x)
        if self.n_cont != 0:
            if self.bn_cont is not None: x_cont = self.bn_cont(x_cont)
            x = torch.cat([x, x_cont], 1) if self.n_emb != 0 else x_cont
        return self.layers(x)
I will go through every (necessary) line of code but will refer you to this post if you want to see the outputs each line of code gives.
__init__
ps
This line of code assigns an array of zeroes, one per entry in layers, to ps if ps=None.

ps = ifnone(ps, [0]*len(layers))
The next line handles the case where ps is a single float rather than a list: it repeats that value len(layers) times so that each LinBnDrop layer gets its own dropout probability.

if not is_listy(ps): ps = [ps]*len(layers)
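To make the effect concrete, here’s a small sketch that mimics ifnone and is_listy (both come from fastcore) with plain Python; the values are made up for illustration:

layers = [100, 50]

ps = None
ps = [0]*len(layers) if ps is None else ps   # same as ifnone(ps, [0]*len(layers))
print(ps)                                    # [0, 0]

ps = 0.1
if not isinstance(ps, (list, tuple)):        # stand-in for is_listy(ps)
    ps = [ps]*len(layers)
print(ps)                                    # [0.1, 0.1]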
embeds
self.embeds = nn.ModuleList([Embedding(ni, nf) for ni,nf in emb_szs])
self.emb_drop = nn.Dropout(embed_p)
self.bn_cont = nn.BatchNorm1d(n_cont) if bn_cont else None
n_emb = sum(e.embedding_dim for e in self.embeds)
self.n_emb,self.n_cont = n_emb,n_cont
Notice that emb_szs doesn’t have a default value assigned. This is because (AFAIK) TabularModel is not used directly, but through tabular_learner, and emb_szs is calculated inside tabular_learner from the categorical data:

emb_szs = get_emb_sz(dls.train_ds, {} if emb_szs is None else emb_szs)
In self.embeds = nn.ModuleList(...), embeddings (nn.Embedding) are created based on the sizes in emb_szs and are “packaged” together.

self.emb_drop = nn.Dropout(...) (according to the documentation) randomly zeroes out elements of its input with probability embed_p.
As for nn.BatchNorm1d 😅: it normalizes each continuous variable across the batch (roughly zero mean and unit variance per feature) and learns a per-feature scale and shift.
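Here’s a minimal sketch of what it does to a batch (the numbers are made up):

import torch
import torch.nn as nn

x_cont = torch.tensor([[1., 10.],
                       [2., 20.],
                       [3., 30.],
                       [4., 40.]])   # made-up batch with 2 continuous features
bn = nn.BatchNorm1d(2)               # one learnable scale/shift per feature
print(bn(x_cont))                    # each column now has roughly zero mean and unit variance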
n_emb gets assigned the total number of embedding dimensions.
emb_szs = [(4, 2), (5, 3)]
embeds = nn.ModuleList([Embedding(ni, nf) for ni,nf in emb_szs])
n_emb = sum(e.embedding_dim for e in embeds)
print(embeds)
print(n_emb)

ModuleList(
  (0): Embedding(4, 2)
  (1): Embedding(5, 3)
)
5
layers
sizes = [n_emb + n_cont] + layers + [out_sz]
actns = [act_cls for _ in range(len(sizes)-2)] + [None]
_layers = [LinBnDrop(sizes[i], sizes[i+1], bn=use_bn and (i!=len(actns)-1 or bn_final), p=p, act=a, lin_first=lin_first)
               for i,(p,a) in enumerate(zip(ps+[0.],actns))]
if y_range is not None: _layers.append(SigmoidRange(*y_range))
self.layers = nn.Sequential(*_layers)
Assuming n_emb=5, n_cont=1, layers=[100, 50], out_sz=1, then sizes = [6, 100, 50, 1].

actns holds the activation function to be used between each pair of consecutive layers. In this case, it holds:

[ReLU(inplace=True), ReLU(inplace=True), None]
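You can verify both of these with the same toy numbers:

import torch.nn as nn

n_emb, n_cont, layers, out_sz = 5, 1, [100, 50], 1
act_cls = nn.ReLU(inplace=True)

sizes = [n_emb + n_cont] + layers + [out_sz]
actns = [act_cls for _ in range(len(sizes)-2)] + [None]
print(sizes)   # [6, 100, 50, 1]
print(actns)   # [ReLU(inplace=True), ReLU(inplace=True), None]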
In the 3rd line, _layers gets assigned a “grouping” of BatchNorm1d, Dropout and Linear layers. This is where we understand why sizes is assigned such values: the first two arguments passed to LinBnDrop specify the number of input and output neurons respectively.
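To get a feel for one such grouping, here’s a sketch that builds a single LinBnDrop block for the first pair of sizes (6 and 100) from the example above; the keyword values mirror TabularModel’s defaults:

from fastai.layers import LinBnDrop
import torch.nn as nn

block = LinBnDrop(6, 100, bn=True, p=0., act=nn.ReLU(inplace=True), lin_first=True)
print(block)   # shows the Linear, ReLU and BatchNorm1d layers grouped in one nn.Sequential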
The activations are passed through act=a, where a is an object enumerated from the zip of ps and actns. The first ReLU activation is applied between the layers of size 6 and 100, and the second between the layers of size 100 and 50. actns has its third element as None because there isn’t supposed to be any activation function between the last two layers (of size 50 and 1, in this case).
In the same line, there’s a bn parameter that says whether to use batchnorm in the current LinBnDrop block (i). If use_bn is False, then it isn’t used. If it is True, then batchnorm is used whenever i!=len(actns)-1 (that is, when i, the current block, isn’t the last one) or bn_final is True.
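Here’s a small sketch of that condition using the toy values from above (3 LinBnDrop blocks, use_bn=True, bn_final=False):

use_bn, bn_final = True, False
n_blocks = 3   # len(actns) in the example

for i in range(n_blocks):
    bn = use_bn and (i != n_blocks - 1 or bn_final)
    print(i, bn)
# 0 True
# 1 True
# 2 False  -> no batchnorm in the final block unless bn_final=True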
I refer you to Vishal’s post linked in the disclaimer earlier if you aren’t satisfied with this explanation.
The next line appends a SigmoidRange activation that limits the output values to within the specified y_range. And finally, the last line wraps all the layers in nn.Sequential.
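For example, SigmoidRange squashes any output into the given range (here a made-up y_range of (0, 5)):

import torch
from fastai.layers import SigmoidRange

sr = SigmoidRange(0, 5)
print(sr(torch.tensor([-10., 0., 10.])))   # roughly 0.0, exactly 2.5, roughly 5.0 -> all inside (0, 5)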
forward
Prepping categorical variables
x = [e(x_cat[:,i]) for i,e in enumerate(self.embeds)]
x = torch.cat(x, 1)
x = self.emb_drop(x)
Firstly, embeddings for x_cat are created and stored in x. Then they are concatenated along the columns into a single tensor. Dropout is then applied to the result based on embed_p.
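Here’s a sketch of those three lines using the toy embeddings from earlier and a made-up batch of two rows:

import torch
import torch.nn as nn

embeds = nn.ModuleList([nn.Embedding(4, 2), nn.Embedding(5, 3)])
emb_drop = nn.Dropout(0.)

x_cat = torch.tensor([[0, 1],
                      [3, 4]])                       # 2 rows, 2 categorical columns
x = [e(x_cat[:, i]) for i, e in enumerate(embeds)]   # one embedded tensor per column
x = torch.cat(x, 1)                                  # concatenate along the columns
x = emb_drop(x)
print(x.shape)                                       # torch.Size([2, 5])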
Prepping continuous variables
if self.bn_cont is not None: x_cont = self.bn_cont(x_cont)
x = torch.cat([x, x_cont], 1) if self.n_emb != 0 else x_cont
In the first line, x_cont = self.bn_cont(x_cont) is run if self.bn_cont is not None (i.e., if the bn_cont parameter to TabularModel was True). Next, x_cont is concatenated to x, which holds the categorical embeddings, if self.n_emb is not 0. Otherwise, x_cont is assigned to x.
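Continuing the sketch from the previous section with one made-up continuous column:

import torch

x = torch.randn(2, 5)         # stand-in for the concatenated embeddings
x_cont = torch.randn(2, 1)    # 1 continuous column
x = torch.cat([x, x_cont], 1)
print(x.shape)                # torch.Size([2, 6])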
Finally, x is passed to self.layers, which is essentially the model with its BatchNorm1d, Dropout and Linear layers, and the output is returned.
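Putting it all together, here’s a minimal sketch of using TabularModel directly with the toy sizes from this post (normally tabular_learner builds it for you); the input values are made up:

import torch
from fastai.tabular.all import TabularModel

model = TabularModel(emb_szs=[(4, 2), (5, 3)], n_cont=1, out_sz=1, layers=[100, 50])

x_cat = torch.tensor([[0, 1], [3, 4]])   # 2 rows, 2 categorical columns
x_cont = torch.tensor([[0.5], [-1.2]])   # 2 rows, 1 continuous column
print(model(x_cat, x_cont).shape)        # torch.Size([2, 1])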
Conclusion
This was a challenging exercise that filled gaps in my understanding of Python and also taught me how neural networks are used on tabular datasets. Again, this post helped me a lot with my understanding.
Thank you for reading my blog. You can reach out to me through my socials here:
- Discord - “lostsquid.”
- LinkedIn - /in/suchitg04/
I hope to see you soon. Until then 👋