Transformer Implementation

The user will implement a Transformer, built from the following components:

1. Positional Encoding
2. Scaled Dot-Product Attention
3. Multi-Head Attention
4. Feed-Forward Network (FFN)
5. Layer Normalization
6. Residual Connections
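
As a starting point for discussion, the six components above could fit together roughly as in the sketch below. It is a minimal NumPy illustration of a single encoder layer, not a definitive design. Several things are assumptions that the list does not specify: sinusoidal positional encoding, the post-norm arrangement LayerNorm(x + Sublayer(x)), a ReLU feed-forward network, an even d_model that is divisible by the number of heads, and all function and parameter names (init_params, "wq", "g1", and so on), which are hypothetical.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Assumed sinusoidal variant: sin on even dims, cos on odd dims (d_model even).
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v, mask=None):
    # softmax(Q K^T / sqrt(d_k)) V; mask is boolean, True = may attend.
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores)
    return weights @ v, weights

def multi_head_attention(x, params, num_heads, mask=None):
    # Self-attention over x of shape (seq_len, d_model).
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = (split(x @ params[name]) for name in ("wq", "wk", "wv"))
    out, _ = scaled_dot_product_attention(q, k, v, mask)
    # Concatenate heads back to (seq_len, d_model), then project.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ params["wo"]

def feed_forward(x, params):
    # Position-wise FFN: two linear maps with a ReLU in between.
    hidden = np.maximum(0.0, x @ params["w1"] + params["b1"])
    return hidden @ params["w2"] + params["b2"]

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each position over the feature dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def encoder_layer(x, params, num_heads, mask=None):
    # Residual connection around each sublayer, followed by layer norm.
    attn = multi_head_attention(x, params, num_heads, mask)
    x = layer_norm(x + attn, params["g1"], params["be1"])
    ffn = feed_forward(x, params)
    return layer_norm(x + ffn, params["g2"], params["be2"])

def init_params(d_model, d_ff, seed=0):
    # Hypothetical helper: small random weights, identity layer-norm scale.
    rng = np.random.default_rng(seed)
    w = lambda *shape: rng.normal(scale=0.1, size=shape)
    return {
        "wq": w(d_model, d_model), "wk": w(d_model, d_model),
        "wv": w(d_model, d_model), "wo": w(d_model, d_model),
        "w1": w(d_model, d_ff), "b1": np.zeros(d_ff),
        "w2": w(d_ff, d_model), "b2": np.zeros(d_model),
        "g1": np.ones(d_model), "be1": np.zeros(d_model),
        "g2": np.ones(d_model), "be2": np.zeros(d_model),
    }

# Usage: 5 token embeddings of width 16, 4 heads.
tokens = np.random.default_rng(1).normal(size=(5, 16))
x = tokens + positional_encoding(5, 16)
y = encoder_layer(x, init_params(d_model=16, d_ff=32), num_heads=4)
print(y.shape)  # (5, 16)
```

Points this sketch leaves open, and that may be worth settling first: pre-norm versus post-norm, whether a decoder with causal masking and cross-attention is in scope, and whether dropout and batching are needed.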

Let's discuss any open questions that need to be settled before implementation begins.