My apologies, I really should have read your post more carefully before replying!
But still, a program written for the four-level stack is always liable to misbehave with the big stack. It sounds like in this particular case, the misbehavior consists of leaving a steadily growing amount of garbage on the stack... and with enough iterations, that could eventually cause even Free42 on a Mac to run out of memory. It's just easier to run out of memory when there's less of it available to begin with.
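To make that concrete, here's a rough Python sketch of the effect. It's a toy model, not how Free42 actually implements its stack, and `run`, `push`, and `add` are just names I made up for illustration. Each pass through the loop pushes two numbers and adds them, leaving one result behind that nothing ever consumes. With four levels the leftovers fall off the top; with the big stack they pile up:

```python
def run(iterations, big_stack):
    """Return the stack depth after a loop that leaves one value per pass.

    Hypothetical model: the last list element is X, the first is the top
    of the stack (T in four-level mode).
    """
    # Four-level mode always holds exactly four values; big stack starts empty.
    stack = [] if big_stack else [0.0] * 4

    def push(value):
        stack.append(value)
        if not big_stack:
            del stack[0]  # stack lift: the old T falls off the top

    def add():
        x = stack.pop()
        y = stack.pop()
        stack.append(y + x)
        if not big_stack:
            stack.insert(0, stack[0])  # stack drop: T is duplicated

    for i in range(iterations):
        push(float(i))
        push(1.0)
        add()  # the sum stays on the stack and is never used again

    return len(stack)


print(run(1000, big_stack=False))  # depth with the four-level stack
print(run(1000, big_stack=True))   # depth with the big stack
```

In this model the four-level depth stays at 4 no matter how long the loop runs, while the big-stack depth grows by one per iteration.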
Regarding the stack labels vs. the empty stack: ah, cool, so I did do it the way I intended in Free42. That's a relief! So I can blame SM for this particular quirk, then.