z, ?	toggle help (this)
space, →	next slide
shift-space, ←	previous slide
d	toggle debug mode
## <ret>	go to slide #
r	reload slides
n	toggle notes

loading presentation...

Hello everyone. In this presentation, I will be discussing backtraces. I will discuss why backtraces are important, the implementation of backtraces in Ruby, why production of partial backtraces was slow in older versions of Ruby, and how it was fixed in Ruby 3. I will also discuss bugs that occurred as a result of the optimization, and how the bugs were fixed.

Optimizing

Partial

Backtraces

in Ruby 3

RubyKaigi 2021

My name is Jeremy Evans. I am a Ruby committer who mostly works on fixing bugs in Ruby.

GitHub: jeremyevans

Twitter: @jeremyevans0

I am also the author of Polished Ruby Programming, which was published a couple months ago. This book is aimed at intermediate Ruby programmers and focuses on teaching principles of Ruby programming, as well as trade-offs to consider when making implementation decisions.

So, backtraces. Why do we care about backtraces? Well, consider programming without backtraces.

Backtraces

Let’s say you are working on a refactoring in your library, and after working on it, you run the tests, and it turns out one test fails with an ArgumentError, with output similar to this. This output tells you have an error, and what type of error it is, but without a backtrace, you have no idea where the error is occurring, which makes debugging much more difficult.


Run options: --seed 64029

# Running:

...E..............

Finished in 0.003209s, 5608.6308 runs/s, 6543.4026 assertions/s.

  1) Error:
MyClass#test_0001_#foo should not raise error:
ArgumentError: wrong number of arguments (given 2, expected 0..1)


18 runs, 21 assertions, 0 failures, 1 errors, 0 skips

_ OK, so backtraces themselves are important. However, how important is backtrace performance, specifically partial backtrace performance? I do not actually know the answer to that question.

Backtraces

As I mentioned, I focus most of my time on Ruby going through the bug tracker and fixing bugs, not implementing performance improvements. However, it happened that another Ruby committer filed issue 17031 on the issue tracker, regarding partial backtraces not being optimized. The other committer mentioned they found the issue when checking memory usage of RuboCop. In any case, this performance issue was filed as a bug, and since I try to fix a lot of Ruby bugs, I did some research into it.

https://bugs.ruby-lang.org/issues/17031

This was the first time I looked into how backtraces are created in Ruby, so it was a learning experience for me. I think how Ruby generates a backtrace is interesting, and I am guessing most of you are unfamiliar with the process, as I was before working on this bug, so I will go over the process now.

Backtrace

Generation

Before understanding the backtrace generation process, it helps to understand better what backtraces are.

What Are

Backtraces?

As in most programming languages, backtraces in Ruby are a representation of the call stack.

Call Stack

Representation

Let’s start off with an example of a call stack. We start with a method foo that calls bar with a block that calls caller.


def foo
  bar{caller(0)}
end

We explicitly pass 0 as the argument to caller. This makes the backtrace include the call to this method. The default value for caller’s first argument is 1, which excludes the call to the method that called caller.


def foo
  bar{caller(0)}
end

Then we have the bar method just yield.


def foo
  bar{caller(0)}
end

def bar
  yield
end

To see the results of the backtrace, we will use puts foo.


def foo
  bar{caller(0)}
end

def bar
  yield
end

puts foo

And these are the results.


def foo
  bar{caller(0)}
end

def bar
  yield
end

puts foo

# -:2:in `block in foo'
# -:6:in `bar'
# -:2:in `foo'
# -:9:in `<main>'

Note that the dashes here represent the filename. In this case, I was providing the Ruby code to execute on the standard input, in which case Ruby uses the dash to represent the filename.


def foo
  bar{caller(0)}
end

def bar
  yield
end

puts foo

# -:2:in `block in foo'
# -:6:in `bar'
# -:2:in `foo'
# -:9:in `<main>'

The first entry in the backtrace is the call to caller inside the block you passed to bar. Each call to a block has a separate entry in the backtrace.


def foo
  bar{caller(0)}
end

def bar
  yield
end

puts foo

# -:2:in `block in foo'
# -:6:in `bar'
# -:2:in `foo'
# -:9:in `<main>'

The second entry in the backtrace is the the yield inside bar, which is the same as calling the block that was passed to bar.


def foo
  bar{caller(0)}
end

def bar
  yield
end

puts foo

# -:2:in `block in foo'
# -:6:in `bar'
# -:2:in `foo'
# -:9:in `<main>'

The next entry in the backtrace is the call to bar inside foo.


def foo
  bar{caller(0)}
end

def bar
  yield
end

puts foo

# -:2:in `block in foo'
# -:6:in `bar'
# -:2:in `foo'
# -:9:in `<main>'

The last entry in the backtrace is the call to foo in the main script.


def foo
  bar{caller(0)}
end

def bar
  yield
end

puts foo

# -:2:in `block in foo'
# -:6:in `bar'
# -:2:in `foo'
# -:9:in `<main>'

I mentioned earlier that a backtrace is a representation of the call stack. If you read the backtrace from bottom to top, you can see how each call was added to the stack. In the main script,


def foo
  bar{caller(0)}
end

def bar
  yield
end

puts foo

# -:2:in `block in foo'
# -:6:in `bar'
# -:2:in `foo'
# -:9:in `<main>'

you called foo


def foo
  bar{caller(0)}
end

def bar
  yield
end

puts foo

# -:2:in `block in foo'
# -:6:in `bar'
# -:2:in `foo'
# -:9:in `<main>'

which called bar


def foo
  bar{caller(0)}
end

def bar
  yield
end

puts foo

# -:2:in `block in foo'
# -:6:in `bar'
# -:2:in `foo'
# -:9:in `<main>'

which called the block in foo.


def foo
  bar{caller(0)}
end

def bar
  yield
end

puts foo

# -:2:in `block in foo'
# -:6:in `bar'
# -:2:in `foo'
# -:9:in `<main>'

So let’s dive in to how backtraces are generated.

Backtrace

Generation

I mentioned that backtraces represent the call stack, but Ruby does not have a call stack as a separate data structure.

Call Stack

Representation

Ruby uses a virtual machine to execute code. In order to execute your Ruby code, Ruby compiles your Ruby source code into bytecode that can be executed on the virtual machine. The virtual machine then processes the bytecode to do the execution.

Virtual

Machine

The virtual machine keeps keeps track of execution of the current method or block using a virtual machine control frame, often abbreviated to VM frame, control frame, or just frame. The virtual machine tracks these control frames using a stack. Each time a method or block is called, the virtual machine pushes a frame onto the stack. When the method or block returns, the virtual machine pops the frame from the stack.

Virtual

Machine

Control Frames

I just described something that sounds a lot like a call stack, but earlier I said that Ruby does not have a call stack as a separate data structure, which appears to be a contradiction.

Call Stack

Representation

The reason for the difference is the virtual machine control frame stack is not exactly a call stack. There are control frames that are not considered part of the call stack and are ignored during backtrace generation. We will see later that these control frames that are not part of the call stack ended up causing bugs in the optimization of partial backtraces.

Virtual

Machine

Control Frame Stack

Like most parts of Ruby, the implementation of backtraces is written in C. Backtrace handling is implemented in Ruby’s vm_backtrace.c file. To make the backtrace generation process more understandable to Ruby programmers, this presentation is going to translate the algorithm used into Ruby pseudocode. I will also simplify it slightly to avoid parts that are not important to this discussion. | We will start with a description of backtrace generation used in Ruby 2.7, which is mostly the same for older versions of Ruby, to describe the process before the optimization.

vm_backtrace.c

The backtrace generation process starts with the virtual machine’s end control frame. While it says end control frame, this is the starting control frame for backtrace generation, since the Ruby 2.7 backtrace generation process runs from the earliest control frame to the current control frame in the order that each control frame was added to the stack.

cf = VM.end_control_frame

The first two control frames after the end control frame should be skipped, as they are considered dummy frames from the perspective of backtrace generation. Let’s assume the control frame’s next method returns the control frame pushed after the receiver.

cf = VM.end_control_frame.next.next

Since the backtrace is returned as an array, we will initialize an empty array for it.

cf = VM.end_control_frame.next.next
bt = []

The backtrace process will iterate over the remaining control frames. It needs to know the total number of backtrace frames, and have a counter that is incremented for each control frame to ensure it does not process off the end of the control frame stack.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

The majority of backtrace generation process happens while iterating over the control frames, which we will do here using a while loop. At the end of each loop iteration, we will move to the next frame and increment the counter.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  
  cf = cf.next
  i += 1
end

Here is the main part of the backtrace generation algorithm. Let’s go over each step.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  if cf.iseq?
    if cf.instruction_pointer? 
      last_iseq_loc = cf.loc
      bt  cf.iseq_backtrace_entry
    end
  elsif cf.cfunc?
    bt  cf.cfunc_backtrace_entry(last_iseq_loc)
  end
  
  cf = cf.next
  i += 1
end

As I mentioned earlier, there are different types of virtual machine control frames, and not all of them are considered in backtrace generation. There are only two control frame types considered during backtrace generation. These are iseq frames and cfunc frames. Iseq stands for instruction sequence, and iseq frames are usually, but not always, frames for normal methods defined in Ruby. Cfunc stands for C function, and cfunc frames are frames for calling methods defined in C. | Iseq and cfunc frames must be treated differently during backtrace generation, because methods defined in C do not have file or line numbers associated with them, so the location of cfunc frames is the same as the preceding iseq frame. This is the reason that the backtrace is processed starting with the virtual machine’s end control frame.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  if cf.iseq?
    if cf.instruction_pointer? 
      last_iseq_loc = cf.loc
      bt  cf.iseq_backtrace_entry
    end
  elsif cf.cfunc?
    bt  cf.cfunc_backtrace_entry(last_iseq_loc)
  end
  
  cf = cf.next
  i += 1
end

Backtrace generation does not use all iseq frames, it only uses iseq frames that have an instruction pointer. Iseq frames without an instruction pointer can occur for internal functions that call blocks, such as Enumerable’s group_by method. | These frames should be ignored, as otherwise a single method call would result in multiple backtrace entries, one of which wouldn’t have location information, since the location information is determined using the instruction pointer.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  if cf.iseq?
    if cf.instruction_pointer?
      last_iseq_loc = cf.loc
      bt  cf.iseq_backtrace_entry
    end
  elsif cf.cfunc?
    bt  cf.cfunc_backtrace_entry(last_iseq_loc)
  end
  
  cf = cf.next
  i += 1
end

For the iseq frames that have an instruction pointer, these represent methods defined in Ruby. In that case, we need to use the location info of the current iseq frame for any future cfunc frames until the next iseq frame. So we record the location of the call in the current control frame, so that we can use it if needed when creating the backtrace entry for a future cfunc frame.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  if cf.iseq?
    if cf.instruction_pointer? 
      last_iseq_loc = cf.loc
      bt  cf.iseq_backtrace_entry
    end
  elsif cf.cfunc?
    bt  cf.cfunc_backtrace_entry(last_iseq_loc)
  end
  
  cf = cf.next
  i += 1
end

After recording the location, we add the backtrace entry to the array storing the backtrace.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  if cf.iseq?
    if cf.instruction_pointer? 
      last_iseq_loc = cf.loc
      bt  cf.iseq_backtrace_entry
    end
  elsif cf.cfunc?
    bt  cf.cfunc_backtrace_entry(last_iseq_loc)
  end
  
  cf = cf.next
  i += 1
end

For cfunc frames, we also add the backtrace entry to the array storing the backtrace. However, creating the backtrace entry for cfunc frames requires knowledge of the last iseq location, so we pass that in when creating the entry.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  if cf.iseq?
    if cf.instruction_pointer? 
      last_iseq_loc = cf.loc
      bt  cf.iseq_backtrace_entry
    end
  elsif cf.cfunc?
    bt  cf.cfunc_backtrace_entry(last_iseq_loc)
  end
  
  cf = cf.next
  i += 1
end

After this whole process completes, we reverse the backtrace array, since we want to return the backtrace ordered from the most recent call to the oldest call.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  if cf.iseq?
    if cf.instruction_pointer? 
      last_iseq_loc = cf.loc
      bt  cf.iseq_backtrace_entry
    end
  elsif cf.cfunc?
    bt  cf.cfunc_backtrace_entry(last_iseq_loc)
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

Finally, at the very end of this process, we use the arguments provided to caller to extract the entries in the backtrace we are interested in. caller’s default start value is 1 and default length value is nil, meaning all backtrace entries after the first entry.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  if cf.iseq?
    if cf.instruction_pointer? 
      last_iseq_loc = cf.loc
      bt  cf.iseq_backtrace_entry
    end
  elsif cf.cfunc?
    bt  cf.cfunc_backtrace_entry(last_iseq_loc)
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!
bt[start, length||size]

Hopefully this shows why partial backtraces were slow. They basically required a bunch of unnecessary work.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  if cf.iseq?
    if cf.instruction_pointer? 
      last_iseq_loc = cf.loc
      bt  cf.iseq_backtrace_entry
    end
  elsif cf.cfunc?
    bt  cf.cfunc_backtrace_entry(last_iseq_loc)
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!
bt[start, length||size]

For example, if you want only the most recent caller entry before the current method, you would call caller with 1 as the start entry and 1 as the length. This will return an array of at most one backtrace entry.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  if cf.iseq?
    if cf.instruction_pointer? 
      last_iseq_loc = cf.loc
      bt  cf.iseq_backtrace_entry
    end
  elsif cf.cfunc?
    bt  cf.cfunc_backtrace_entry(last_iseq_loc)
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!
bt[1, 1]

However, to get that array of at most one entry, Ruby would still create backtrace entries for most or all of the control frames, before filtering it to only the frames requested. | So the problem was fairly clear, Ruby is doing unnecessary work that will be thrown away. The solution was also fairly clear, in that we needed to avoid that unnecessary work, and only add backtrace frames that we cared about.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

while i  size
  if cf.iseq?
    if cf.instruction_pointer? 
      last_iseq_loc = cf.loc
      bt  cf.iseq_backtrace_entry
    end
  elsif cf.cfunc?
    bt  cf.cfunc_backtrace_entry(last_iseq_loc)
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!
bt[1, 1]

So, as a first start of implementing the solution, we need consider the start and length arguments passed to caller. In this initial example, we’ll assume that the start and length were both provided and that the start added to the length is still less than the number of control frames. Let’s walk through the changes in this example.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
start = last - length 

while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer? 
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

Instead of iterating over the total number of control frames, we use the start argument passed to caller to reduce the number of iterations, which we store in the variable named last.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
start = last - length 

while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer? 
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

We subtract the desired length of the backtrace from the number of iterations to determine the starting point where we actually want to add backtrace entries.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
start = last - length

while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer? 
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

We still iterate starting from the same control frame as we did in the previous implementation.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
start = last - length 

while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer? 
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

However, until we reach the starting point, we only record the last iseq location for every iseq control frame with an instruction pointer, we do not actually add backtrace entries.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
start = last - length 

while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer? 
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

We need to do this so that if the first frame in the returned backtrace is a cfunc frame, it will have the correct location, even though it gets that location from an iseq frame that is not in the returned backtrace.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
start = last - length 

while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer? 
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

I mentioned that in this initial example, we will assume that the start and length were both provided and that the start added to the length is still less than the number of control frames. However, the caller can provide arguments where the length is nil or where the start frame plus the length is greater than the number of control frames. In which case this logic is not sufficient. You would need to replace it.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
start = last - length

while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer? 
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

To handle cases where the start plus the length is more than the number of frames, you would have to replace it with the following code.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer? 
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

if start  size
  return nil
elsif length && start + length  size
  start = last - length 
else
  start = 0
end

If the start argument passed to caller is greater than the number of frames, caller needs to return nil instead of an array.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer? 
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

if start  size
  return nil
elsif length && start + length  size
  start = last - length 
else
  start = 0
end

If the start plus the length is greater than the number of frames, or no length argument is provided, the starting point for backtrace generation needs to be set to zero, so that backtrace generation starts at the outermost frame, the main script.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer? 
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

if start  size
  return nil
elsif length && start + length  size
  start = last - length 
else
  start = 0
end

This is basically the algorithm that Ruby 3.0.0 shipped with. It passed all tests and specs and resulted in much faster partial backtraces in situations where there were many control frames.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer? 
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

if start  size
  return nil
elsif length && start + length  size
  start = last - length 
else
  start = 0
end

I was quite pleased with myself, declaring victory over this longstanding performance issue.

勝利

Unfortunately, the optimized algorithm is subtly wrong. It took a whole month after the release of Ruby 3.0.0 before someone reported problems with it.

敗北

However, as reported in issue 17581, the optimized algorithm could result in missing backtrace entries.

https://bugs.ruby-lang.org/issues/17581

The way the algorithm is implemented, it is implicitly expected that each loop iteration will result in a backtrace entry in order to arrive at the expected number of backtrace entries.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer?
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

if start  size
  return nil
elsif length && start + length  size
  start = last - length 
else
  start = 0
end

The problem occurs when there is a iseq control frame without an instruction pointer. In this case, an iteration occurs without adding a backtrace entry, so if you are past the start iteration, any iterations where there is an iseq control frame without an instruction pointer result in that many missing backtrace entries.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer?
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

if start  size
  return nil
elsif length && start + length  size
  start = last - length 
else
  start = 0
end

The solution here is to decrement the counter if this happens. This ensures you generate the desired number of backtrace frames. However, then in certain circumstances, you can process off the end of the control frame stack.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
i = 0

last = size - start
while i  last
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer?
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      else
        i -= 1
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
end

bt.reverse!

if start  size
  return nil
elsif length && start + length  size
  start = last - length 
else
  start = 0
end

To make sure you do not process off the end of the control frame stack, you need a separate counter that is compared against the number of control frames. This ensures that even if you are skipping control frames, you do not go beyond the bounds of the control frame stack.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
j = i = 0

last = size - start
while i  last && j  size
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer?
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      else
        i -= 1
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
  j += 1
end

bt.reverse!

if start  size
  return nil
elsif length && start + length  size
  start = last - length 
else
  start = 0
end

Again, I was quite pleased with myself, I was able to fix this regression quickly and easy.

勝利

Unfortunately, while that change fixed the case where backtrace was missing frames, after the fix was committed, it was found to have broken a test case in Rails, which was not broken with the original optimization. Originally, neither the Rails developers nor I were able to construct a self contained example with the failure, which complicated debugging. However, Rails developers were able to provide a way to reproduce the issue using the Rails test framework, which enabled me to debug the issue.

敗北

Later, after the fix for this issue was developed but before it was committed, another Ruby committer was able to construct a self contained example showing the problem, which they posted as a note on the bug tracker. This example was included as a test case when the fix was committed. | It turns out the cause of this issue was similar to the cause of the previous issue, as it occured when dealing with iseq control frames without an instruction pointer. Instead of missing a backtrace entry as in the previous issue, it would result in an incorrect backtrace entry. For example, it could result in caller without an argument being treated as caller with a start argument of 0.

https://bugs.ruby-lang.org/issues/17581#note-5

There were two changes that were needed to fix this bug, one simple and one complex.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2
j = i = 0

last = size - start
while i  last && j  size
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer?
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      else
        i -= 1
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
  j += 1
end

bt.reverse!

if start  size
  return nil
elsif length && start + length  size
  start = last - length 
else
  start = 0
end

The simple fix is that we need to ensure that we do not process frames after the requested start frame. This is the issue that could make caller without an argument act like caller with a 0 start argument. We make this change by subtracting the caller’s start argument from the initial size of the backtrace, to ensure we do not process frames that the caller has asked us to ignore. We then adjust the related code that uses caller’s start argument to assume the caller’s start argument is 0, since it has already been used to adjust the size.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2 - start
j = i = 0

last = size

while i  last && j  size
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer?
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      else
        i -= 1
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
  j += 1
end

bt.reverse!

if size  0
  return nil
elsif length && length  size
  start = last - length 
else
  start = 0
end

Here is the whole complex fix. The purpose of this fix is to adjust the starting frame for backtrace collection so that we are sure the optimized process will result in the correct number of frames. As the text is quite small, I will zoom in for the explanation.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2 - start
j = i = 0

last = size
while i  last && j  size
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer?
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      else
        i -= 1
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
  j += 1
end

bt.reverse!

if size  0
  return nil
elsif length && length  size
  start = last - length 
else
  start = 0
end

if start > 0 && length > 0 && length  size
  start_cf = cf
  while i  last && j  size
    if cf.iseq? && !cf.instruction_pointer?
      if j  start
        ignored_frames_before_start = true
      else
        i -= 1
      end
    end
    i += 1; j += 1; cf = cf.next
  end
  ignored_frames = j - i
  if ignored_frames > 0
    if ignored_frames_before_start
      j = i = 0; cf = start_cf
      while i  last && j  size && j  start
        i += 1; j += 1; cf = cf.next
      end
      while start > 0 && ignored_frames > 0 && j > 0
        if cf.iseq? && !cf.instruction_pointer?
          ignored_frames += 1
        end
        j -= 1; ignored_frames -= 1; start -= 1; cf = cf.prev
      end
    else
      start -= ignored_frames
    end
  end
  j = i = 0; cf = start_cf
end

We only need to use this fix if we have a backtrace limited in both directions, where start argument to caller is not 0 and it is given a length shorter than the actual number of frames.


if start > 0 && length > 0 && length  size
  start_cf = cf
  while i  last && j  size
    if cf.iseq? && !cf.instruction_pointer?
      if j  start
        ignored_frames_before_start = true
      else
        i -= 1
      end
    end
    i += 1; j += 1; cf = cf.next
  end
  ignored_frames = j - i
  # ...
end

We scan the entire backtrace from the most outward control frame to the most recent control frame, looking for ignored frames.


if start > 0 && length > 0 && length  size
  start_cf = cf
  while i  last && j  size
    if cf.iseq? && !cf.instruction_pointer?
      if j  start
        ignored_frames_before_start = true
      else
        i -= 1
      end
    end
    i += 1; j += 1; cf = cf.next
  end
  ignored_frames = j - i
  # ...
end

The check for ignored frames is the same as before, iseq frames without an instruction pointer.


if start > 0 && length > 0 && length  size
  start_cf = cf
  while i  last && j  size
    if cf.iseq? && !cf.instruction_pointer?
      if j  start
        ignored_frames_before_start = true
      else
        i -= 1
      end
    end
    i += 1; j += 1; cf = cf.next
  end
  ignored_frames = j - i
  # ...
end

If the ignored frame is before the starting frame in the backtrace, we flag that, as that case requires additional work.


if start > 0 && length > 0 && length  size
  start_cf = cf
  while i  last && j  size
    if cf.iseq? && !cf.instruction_pointer?
      if j  start
        ignored_frames_before_start = true
      else
        i -= 1
      end
    end
    i += 1; j += 1; cf = cf.next
  end
  ignored_frames = j - i
  # ...
end

Otherwise, we decrement the i counter.


if start > 0 && length > 0 && length  size
  start_cf = cf
  while i  last && j  size
    if cf.iseq? && !cf.instruction_pointer?
      if j  start
        ignored_frames_before_start = true
      else
        i -= 1
      end
    end
    i += 1; j += 1; cf = cf.next
  end
  ignored_frames = j - i
  # ...
end

The difference between the i and j counters is now the number of ignored frames.


if start > 0 && length > 0 && length  size
  start_cf = cf
  while i  last && j  size
    if cf.iseq? && !cf.instruction_pointer?
      if j  start
        ignored_frames_before_start = true
      else
        i -= 1
      end
    end
    i += 1; j += 1; cf = cf.next
  end
  ignored_frames = j - i
  # ...
end

If there are no ignored frames, then we do not have to do anything. Otherwise, additional work is required.


if start > 0 && length > 0 && length  size
  # ...
  if ignored_frames > 0
    if ignored_frames_before_start
      j = i = 0; cf = start_cf
      while i  last && j  size && j  start
        i += 1; j += 1; cf = cf.next
      end
      while start > 0 && ignored_frames > 0 && j > 0
        if cf.iseq? && !cf.instruction_pointer?
          ignored_frames += 1
        end
        j -= 1; ignored_frames -= 1; start -= 1; cf = cf.prev
      end
    else
      start -= ignored_frames
    end
  end
  j = i = 0; cf = start_cf
end

If there are not any ignored frames before when we expect to start backtrace generation, then we can just subtract the number of ignored frames from the start. Otherwise, the process is significantly more involved.


if start > 0 && length > 0 && length  size
  # ...
  if ignored_frames > 0
    if ignored_frames_before_start
      j = i = 0; cf = start_cf
      while i  last && j  size && j  start
        i += 1; j += 1; cf = cf.next
      end
      while start > 0 && ignored_frames > 0 && j > 0
        if cf.iseq? && !cf.instruction_pointer?
          ignored_frames += 1
        end
        j -= 1; ignored_frames -= 1; start -= 1; cf = cf.prev
      end
    else
      start -= ignored_frames
    end
  end
  j = i = 0; cf = start_cf
end

We first need to scan to the expected start control frame.


if start > 0 && length > 0 && length  size
  # ...
  if ignored_frames > 0
    if ignored_frames_before_start
      j = i = 0; cf = start_cf
      while i  last && j  size && j  start
        i += 1; j += 1; cf = cf.next
      end
      while start > 0 && ignored_frames > 0 && j > 0
        if cf.iseq? && !cf.instruction_pointer?
          ignored_frames += 1
        end
        j -= 1; ignored_frames -= 1; start -= 1; cf = cf.prev
      end
    else
      start -= ignored_frames
    end
  end
  j = i = 0; cf = start_cf
end

Then we need to scan backwards from that control frame, decrementing the start and the number of ignored frames until there are no ignored frames left.


if start > 0 && length > 0 && length  size
  # ...
  if ignored_frames > 0
    if ignored_frames_before_start
      j = i = 0; cf = start_cf
      while i  last && j  size && j  start
        i += 1; j += 1; cf = cf.next
      end
      while start > 0 && ignored_frames > 0 && j > 0
        if cf.iseq? && !cf.instruction_pointer?
          ignored_frames += 1
        end
        j -= 1; ignored_frames -= 1; start -= 1; cf = cf.prev
      end
    else
      start -= ignored_frames
    end
  end
  j = i = 0; cf = start_cf
end

If during the backwards scan we come across an ignored frame, we have to increment the number of ignored frames.


if start > 0 && length > 0 && length  size
  # ...
  if ignored_frames > 0
    if ignored_frames_before_start
      j = i = 0; cf = start_cf
      while i  last && j  size && j  start
        i += 1; j += 1; cf = cf.next
      end
      while start > 0 && ignored_frames > 0 && j > 0
        if cf.iseq? && !cf.instruction_pointer?
          ignored_frames += 1
        end
        j -= 1; ignored_frames -= 1; start -= 1; cf = cf.prev
      end
    else
      start -= ignored_frames
    end
  end
  j = i = 0; cf = start_cf
end

This did fix this issue, and both fixes shipped in Ruby 3.0.1. Once again I could claim victory over this issue. However, the feeling of victory is quite hollow in this case.

勝利

When you look at the code required for the optimization, it is complex, and does not bring joy. The workarounds necessary to make the existing backtrace algorithm work with the optimization is far more code than the algorithm itself.

cf = VM.end_control_frame.next.next
bt = []
size = VM.num_control_frames - 2 - start
j = i = 0

last = size
while i  last && j  size
  if i  start
    if cf.iseq? && cf.instruction_pointer?
      last_iseq_loc = cf.loc
    end
  else
    if cf.iseq?
      if cf.instruction_pointer?
        last_iseq_loc = cf.loc
        bt  cf.iseq_backtrace_entry
      else
        i -= 1
      end
    elsif cf.cfunc?
      bt  cf.cfunc_backtrace_entry(last_iseq_loc)
    end
  end
  
  cf = cf.next
  i += 1
  j += 1
end

bt.reverse!

if size  0
  return nil
elsif length && length  size
  start = last - length 
else
  start = 0
end

if start > 0 && length > 0 && length  size
  start_cf = cf
  while i  last && j  size
    if cf.iseq? && !cf.instruction_pointer?
      if j  start
        ignored_frames_before_start = true
      else
        i -= 1
      end
    end
    i += 1; j += 1; cf = cf.next
  end
  ignored_frames = j - i
  if ignored_frames > 0
    if ignored_frames_before_start
      j = i = 0; cf = start_cf
      while i  last && j  size && j  start
        i += 1; j += 1; cf = cf.next
      end
      while start > 0 && ignored_frames > 0 && j > 0
        if cf.iseq? && !cf.instruction_pointer?
          ignored_frames += 1
        end
        j -= 1; ignored_frames -= 1; start -= 1; cf = cf.prev
      end
    else
      start -= ignored_frames
    end
  end
  j = i = 0; cf = start_cf
end

This optimized approach gives us performance,

Performance

but at what cost?

Performance

at what cost?

As I was developing this presentation, I came to the conclusion that what we need is not an optimization to the previous algorithm, but an entirely new algorithm. Instead of working inward from the end control frame, we need to work outward from the current control frame. Theoretically, this should result in a significant improvement in partial backtrace performance, by only scanning the control frames that we need to scan.

New

Backtrace

Algorithm

Here is the new algorithm, again translated into Ruby code. I will go over each part.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

We will start with the top section.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

Instead of having the starting VM frame for backtrace generation be the virtual machine’s end control frame, we will use the virtual machine’s start control frame. We will still need the end control frame so that we do not process off the end of the control frame stack.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

Remember that the reason that the old algorithm originally processed from the outermost frame and worked inward is to set the location information for cfunc backtrace locations when they were created. This new algorithm cannot do that, since at the time the cfunc backtrace locations are created, we do not know the location of the calling iseq backtrace location. So we need to keep a counter of how many cfunc backtrace locations have been added.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

If the length of the requested backtrace is not set or is greater than the number of frames, we set the maximum backtrace length to be the number of frames.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

Here is the main part of the algorithm, the loop over the control frames.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

Unlike the previous algorithm, this new algorithm loops from the innermost frame, going to the previous frame in each iteration. It terminates when we have reached the end control frame, or when the size of the backtrace has reached the requested length.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

Just like the previous algorithm, we only care about iseq control frames with an instruction pointer and cfunc control frames.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

Assuming the control frame for the current iteration has one of those two frame types, if we are skipping initial backtrace frames, we decrement the number of initial backtrace frames to skip. caller defaults to skipping a single frame, but you can change this by passing the first argument to caller.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

For cfunc frames, if we are not skipping the frame, we add the backtrace location to the array, and increment the cfunc counter. This cfunc backtrace location will not currently have location information.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

For iseq frames with instruction pointers, if we are not skipping the frame, we first add the iseq backtrace location. Then if the cfunc counter is positive, we set the same location to the directly preceding cfunc backtrace locations. After that, we reset the cfunc counter to 0.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

Then we can move on to the final section. This section is only needed if the last entry in the backtrace is a cfunc backtrace location, because in that case we need to continue scanning the control frames to find the location to use.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

So we first check that the cfunc counter is positive. If it is not positive, the last entry in the backtrace is an iseq backtrace location, and the location will already be set for it.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

This scan uses pretty much the same critera as the main loop, except that we do not need to check against the backtrace size, as we will not be adding to the backtrace.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

To find the location to use for the final cfunc backtrace locations, we need to find the next iseq frame with an instruction pointer.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

If we find such an iseq frame, we update the location of previous cfunc frames, then we stop the scan.

end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt  cf.iseq_backtrace_entry
        cfunc_counter.times do |i|
          bt[-1 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt  cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  
  cf = cf.prev
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

if cfunc_counter > 0
  while cf != end_cf
    if (cf.iseq? && cf.instruction_pointer?)
      cfunc_counter.times do |i|
        bt[-i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end

Switching from the old algorithm to this new algorithm took only a few hours. This new algorithm is much simpler than the previous one. It is about twice as fast as the previous optimized algorithm if you only need the most recent backtrace location, and slightly faster if you need the entire backtrace. Additionally, during testing of the new algorithm, I found an as-yet unreported bug in the previous optimization, which occured when you requested a backtrace with the starting frame after an iseq frame without an instruction pointer. Shortly thereafter, two additional bugs were reported in the previous optimization, one which resulted in a segfault, and one which resulted in an infinite loop.

New

Backtrace

Algorithm

So while I though the attempt to optimize the previous algorithm was a least a small victory after the two previous bug fixes,

勝利

It turns out it was still a defeat.

敗北

I submitted a pull request for the new algorithm, which was merged and will be part of Ruby 3.1.

https://github.com/ruby/ruby/pull/4671

Then I can once again claim victory over this performance issue, at least until the next bug report comes in.

勝利

However, I still needed to fix Ruby 3.0, and there were a few approaches I could use for that. One was reverting to the Ruby 2.7 behavior. However, reverting the optimization proved problematic because of unrelated changed to backtrace generation that were made after the optimization was committed. Another approach was backporting the entire new algorithm from the master branch to Ruby 3.0. While the new algorithm is simple, the actual change committed was large and invasive due to changing related data structures. I chose the third approach, which was trying to fix the bugs in the previous optimization while still keeping the same basic algorithm.

Ruby 3.0

It took me much longer to fix the previous optimization than to implement the new algorithm, but I was eventually able to do so. I ended up using a similar approach that the new algorithm uses, by scanning from the current control frame towards older control frames, and stopping when we found the last frame that should be included in the returned backtrace. That frame then becomes the starting frame for the backtrace generation. Unfortunately, due to limited time, I cannot walk through how that algorithm works in this presentation.

https://github.com/ruby/ruby/pull/4714

Here are the lessons I learned related to this optimization.

Lessons

Learned

First, do not be afraid of expanding your comfort zone by working on unfamiliar code. I had never looked at the vm_backtrace.c file before working on this issue. Often, we as programmers have the idea that low level code such as programming languages, databases, and operating systems must be impenetrable to newcomers. However, in general it is just code, and if you take the time to read it, you are likely to be able to understand it.

Expand Your

Comfort Zone

Second, do not get discouraged when your first attempt at making improvements results in problems. In complex systems, it is impossible to test for all possible use cases, and you should expect that any significant change could cause regressions in an untested case. It is important to understand that regressions in untested cases are the cost of improvements.

Accept & Fix

Regressions

Finally, while you often do not need a new algorithm, sometimes you do. Try to recognize cases where what you are doing is difficult because of how you are doing it, and see if taking a completely different approach will make what you are doing easier.

Consider

Switching

Algorithms

I hope you had fun learning about backtraces and how we optimized partial backtraces in Ruby 3.

Optimizing

Partial

Backtraces

in Ruby 3

If you enjoyed this presentation, and want to read more of my thoughts on Ruby programming, consider picking up a copy of Polished Ruby Programming.

That concludes my presentation. I would like to thank all of you for listening to me.

Optimizing

Partial

Backtraces

in Ruby 3

RubyKaigi 2021

GitHub: jeremyevans

Twitter: @jeremyevans0

Backtraces

Backtraces

https://bugs.ruby-lang.org/issues/17031

Backtrace

Generation

What Are

Backtraces?

Call Stack

Representation

Backtrace

Generation

Call Stack

Representation

Virtual

Machine

Virtual

Machine

Control Frames

Call Stack

Representation

Virtual

Machine

Control Frame Stack

vm_backtrace.c

勝利

敗北

https://bugs.ruby-lang.org/issues/17581

勝利

敗北

https://bugs.ruby-lang.org/issues/17581#note-5

勝利

Performance

Performance

at what cost?

New

Backtrace

Algorithm

New

Backtrace

Algorithm

勝利

敗北

https://github.com/ruby/ruby/pull/4671

勝利

Ruby 3.0

https://github.com/ruby/ruby/pull/4714

Lessons

Learned

Expand Your

Comfort Zone

Accept & Fix

Regressions

Consider

Switching

Algorithms

Optimizing

Partial

Backtraces

in Ruby 3

ありがとうございました