Hi Amir,
Please take a look at the second quote in my post above, which contains the statement: "the egress port for unknown unicast packets [...] based on Layer 3 source and destination IP address."
Because you are using the custom algorithm for the LAG on the switch, unknown unicast packets are always load-shared across the LAG members based on source and destination IP address, regardless of what you configured for the LAG. Thus some packets, namely the unknown unicast frames, can be sent on a different port than the known unicast frames of the same flow. This explains how a Layer 3 switch can break some packets out of a flow.
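To illustrate how one flow can end up on two links, here is a small Python sketch. It assumes a two-link LAG, uses CRC32 as a stand-in for the switch's real hash (which is vendor-specific and not shown here), and invents a hypothetical "custom" hash that includes L4 ports. Known unicast uses that custom hash; unknown unicast uses only the L3 source and destination IP, as in the quoted statement. Because the L3-only hash ignores ports, it picks the same link for every port combination, while the custom hash spreads them, so some flows get split.

```python
import zlib
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def _bucket(key: str, n_links: int) -> int:
    # CRC32 is only a stand-in for whatever hash the switch ASIC really uses (assumption).
    return zlib.crc32(key.encode()) % n_links


def custom_hash(p: Packet, n_links: int) -> int:
    # Hypothetical "custom" LAG algorithm configured by the user: includes L4 ports.
    return _bucket(f"{p.src_ip}|{p.dst_ip}|{p.src_port}|{p.dst_port}", n_links)


def l3_hash(p: Packet, n_links: int) -> int:
    # Unknown unicast: only Layer 3 source and destination IP are hashed.
    return _bucket(f"{p.src_ip}|{p.dst_ip}", n_links)


def egress_port(p: Packet, n_links: int, dst_known: bool) -> int:
    # Known unicast follows the configured algorithm; unknown unicast does not.
    return custom_hash(p, n_links) if dst_known else l3_hash(p, n_links)


def find_split_flow(src_ip: str, dst_ip: str, dst_port: int, n_links: int,
                    ports=range(1024, 65536)) -> Optional[int]:
    """Return the first source port whose known and unknown unicast frames
    would leave on different LAG members, or None if no such port exists."""
    for sport in ports:
        p = Packet(src_ip, dst_ip, sport, dst_port)
        if egress_port(p, n_links, True) != egress_port(p, n_links, False):
            return sport
    return None


if __name__ == "__main__":
    sport = find_split_flow("10.0.0.1", "10.0.0.2", 443, n_links=2)
    print("flow split for source port:", sport)
```

For such a flow, frames sent while the destination MAC is still unknown (flooded) leave on one member, and frames sent after the MAC has been learned leave on the other.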
Anyway, a firewall between the two switches should be configured to treat both physical links as one logical link, so that any flow seen on either physical link is associated with the logical link. Otherwise your setup might work for a specific hardware/software combination of switches, but that can change with a software upgrade, when replacing the hardware with a newer (or just different) model, or when changing vendors.
If you were to change to an MLAG setup (replacing each switch with an MLAG pair), packets would always egress a local member port. If you were to use a stack or virtual switch bond (or virtual chassis or virtual switching system, as other vendors call it), you may or may not be able to configure whether a local member port is preferred, so you may not be able to reproduce the MLAG behavior.
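The point about the firewall treating both physical links as one logical link can be sketched with a toy stateful firewall in Python. Everything here is hypothetical: the interface names `eth1`, `eth2` and `lag1`, and the simplified rule that a new flow opens a session while mid-flow packets need an existing session. It only shows why keying state per physical port breaks when the switch moves part of a flow to the other link.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Hypothetical mapping of the firewall's physical ports to one logical link.
LOGICAL_LINK = {"eth1": "lag1", "eth2": "lag1"}

FlowKey = Tuple[str, str, int, int]  # src_ip, dst_ip, src_port, dst_port


@dataclass
class StatefulFirewall:
    logical: bool  # True: state keyed by logical link; False: by physical port
    sessions: Set[Tuple[str, FlowKey]] = field(default_factory=set)

    def _iface(self, phys: str) -> str:
        # Map a physical port to its logical link only if configured to do so.
        return LOGICAL_LINK.get(phys, phys) if self.logical else phys

    def allow(self, phys: str, key: FlowKey, new_flow: bool) -> bool:
        state = (self._iface(phys), key)
        if new_flow:
            self.sessions.add(state)  # e.g. a TCP SYN opens a session
            return True
        return state in self.sessions  # mid-flow packet needs existing state


if __name__ == "__main__":
    key = ("10.0.0.1", "10.0.0.2", 40000, 443)
    for logical in (False, True):
        fw = StatefulFirewall(logical=logical)
        fw.allow("eth1", key, new_flow=True)  # flow starts on eth1
        print("logical" if logical else "physical",
              fw.allow("eth2", key, new_flow=False))  # later frame on eth2
```

With per-physical-port state, the frame that the switch sends over `eth2` is dropped; with per-logical-link state, it is accepted regardless of which member link carried it.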
Long story short, placing a device that does not understand LAG into the middle of a LAG (actually, breaking the LAG up into two parts, each consisting of one LAG-aware device and one LAG-unaware device) is a bad idea.
Thanks,
Erik