Be Quick

In this stage we will make the application quicker. We apply two changes:

  • issue fewer image recomputations
  • use parallel computation

There is nothing gtk-related in this chapter, but perhaps it is nice to see two techniques for making things faster. The code of this stage can be found on branch 8-parallel

Fewer recomputations

When we select a preset or when we click on the picture, several widgets are changed and each of those changes invoke a recompute. Now, there is no better optimization than avoiding unnecessary work, so let's do that.

In module state we introduce the function postpone_redraw which disables recompute_image requests. The function returns an object. When that object is dropped, recompute_image is reenabled and called.

#![allow(unused)]
fn main() {
fn on_clicked(state: &Rc<RefCell<State>>, ...) {
    gesture.set_state(gtk::EventSequenceState::Claimed);
    let _late_redraw = postpone_redraw(state);
    let (new_cx, new_cy) = state.borrow().win_to_mandel(wx, wy);
    cx_value.set_text(&new_cx.to_string());
    cy_value.set_text(&new_cy.to_string());
}
}

Here is how we use this in the handler for the click signal. After the last line of the function _late_redraw is dropped and only then a recompute of the image is requested.

#![allow(unused)]
fn main() {
fn preset_ready(state: &Rc<RefCell<State>>, ...) {
    let preset = state.borrow_mut().preset.take();
    let _late_redraw = postpone_redraw(state);
    if let Some(preset) = preset {
        ...
    }
}
}

We use it in the same way in the preset_ready handler function.

#![allow(unused)]
fn main() {
#[must_use = "if unused would redraw without postponing anything"]
pub fn postpone_redraw(state: &Rc<RefCell<State>>) -> PostponedRedraw {
    state.borrow_mut().block = true;
    PostponedRedraw {
        state: state.clone(),
    }
}

#[clippy::has_significant_drop]
pub struct PostponedRedraw {
    state: Rc<RefCell<State>>,
}

impl Drop for PostponedRedraw {
    fn drop(&mut self) {
        let mut state = self.state.borrow_mut();
        state.block = false;
        state.recompute_image();
    }
}

}

The implementation is simple. We add a field block: bool to the state, which we initialize to false. The function recompute_image returns immediately if this field is true. The function postpone_redraw sets it to true and returns a PostponedRedraw object. The drop function of that object sets block to false and redraws.

Parallel computation

The computation time for the Mandelbrot image depends on the settings. When we have a large window and look at a region where we have many iterations, it can take a few seconds. We do the same kind of computation for every pixel, so there is ample of room to do things in parallel. Depending on the number of cores of the hardware, this can give a nice speedup.

#![allow(unused)]
fn main() {
use scoped_threadpool::Pool;

pub fn mandel_producer(
    req_receiver: async_channel::Receiver<MandelReq>,
    reply_sender: async_channel::Sender<MandelReply>,
) {
    let par_count: usize;
    match thread::available_parallelism() {
        Ok(pc) => par_count = pc.into(),
        Err(_) => par_count = 8,
    }
    let mut pool = Pool::new(par_count as u32);
    ...
        make_mandel_image(&request.mapping, &request.coloring, &mut pool)
    ...
}
}

We use the crate scoped-threadpool as a threadpool. We will explain later why we need the specific features of this (not so very popular) crate. Before we start the loop in mandel_producer, we ask the standard thread library what would be a good amount of parallelism and create a thread pool of that size. In the actual code you can see that we treat the case where par_count is less than 2 specially. We don't show that here. We pass that pool as parameter to every make_mandel_image call. That function passes it through to the new function fill_mandel_image_parallel.

The job of this function is to divide the work over the threads in the threadpool. Because the function is large, we will show it in two parts.

#![allow(unused)]
fn main() {
fn fill_mandel_image_parallel(
    pool: &mut Pool,
    data: &mut [u8],
    col_producer: &Box<dyn Coloring>,
    stride: usize,
    mapping: &Mapping,
) -> bool {
    let converter = WinToMandel::from_mapping(mapping);
    let w = mapping.win_width;
    let h = mapping.win_height;
    let max = mapping.iteration_depth;
    let par_count = pool.thread_count() as usize;
    let mut splits = compute_splits(h, par_count);
    let mut end = h;
    let mut statuses = vec![true; par_count];
}

The first part of the function is the declaration of the parameters and then a section where we prepare some data. We need to fill data, which has h rows of w pixels ( = stride bytes). To divide those rows as evenly as possible over par_count threads, we compute at what rows we should split with the function compute_splits. It returns a vector of par_count elements, starting with 0 and non decreasing. We create a vector of bools to record the success or failure of each thread.

#![allow(unused)]
fn main() {
    pool.scoped(|scope| {
        let mut rest_of_data = data;
        let mut rest_of_statuses = &mut statuses[..];
        while let Some(s) = splits.pop() {
            let (cur_status, cur_data);
            (rest_of_data, cur_data) = rest_of_data.split_at_mut(ustride * s);
            (cur_status, rest_of_statuses) = rest_of_statuses.split_at_mut(1);
            let converter_ref = &converter;
            scope.execute(move || {
                cur_status[0] = fill_mandel_image_partial(
                    cur_data,
                    col_producer,
                    converter_ref,
                    w,
                    s,
                    end,
                    max,
                    ustride,
                );
            });
            end = s;
        }
    });
    statuses.into_iter().fold(true, |a, b| a && b)
}
}

The pool.scoped fuction calls the closure that we pass it and then joins all the threads that were created with scope.execute. This guarantee of joining is what makes borrowing of statuses and data possible. Otherwise, it would be impossible to guarantee that those lifetimes were limited. You can read more about scoped threads in the documentation of crate std.

The variable rest_of_data contains the data that is not yet filled and rest_of_statuses the statuses that are not yet set. Before the while loop, rest_of_data contains all data and rest_of statuses all statuses. After it both will be empty slices.

The slice function split_at_mut divides a mutable slice in two mutable slices. That is what we use here in a loop. We will work with cur_data and cur_status and will leave the other part in rest_of_data and rest_of_statuses respectively, to be handled in later rounds. The function scope.execute starts a new thread. This thread fills rows [s..end), by calling fill_mandel_image_partial.

There will be par_count rounds of the while loop. That rest_of_data becomes empty follows from the fact that the last pop yields 0. For rest_of_statuses from the fact that it has par_count elements, and we remove one status in every round. So indeed both slices will become empty.

After the scoped call we check if all threads finished succesfully and return this value.

#![allow(unused)]
fn main() {
fn compute_splits(h: usize, par_count: usize) -> Vec<usize> {
    let h_step = h / par_count;
    let h_extra = h % par_count;
    let mut splits = Vec::with_capacity(par_count);
    let mut split = 0;
    for _i in 0..h_extra {
        splits.push(split);
        split += h_step + 1;
    }
    for _i in h_extra..par_count {
        splits.push(split);
        split += h_step;
    }
    splits
}
}

You can check yourself that the above function delivers the splits we want. For instance, compute_splits(7,3) would yield [0,3,5]. That would give slices 5..7, 3..5 and 0..3. After the second for loop, the value of split will be h.

#![allow(unused)]
fn main() {
fn fill_mandel_image_partial(
    data: &mut [u8],
    col_producer: &Box<dyn Coloring>,
    converter: &WinToMandel,
    w: usize,
    h_start: usize,
    h_end: usize,
    max: u32,
    ustride: usize,
) -> bool {
    {
        let mut ok = true;
        for dy in 0..(h_end - h_start) {
            let y = converter.cvt_y(h_start + dy);
            let line = &mut data[dy * ustride..(dy + 1) * ustride];
            let mut iter = line.iter_mut();
            for wx in 0..w {
                let x = converter.cvt_x(wx);
                let mv = mandel_value(x, y, max);
                let bytes = col_producer.get_color(mv, max).to_ne_bytes();
                for i in 0..bytes.len() {
                    if let Some(v) = iter.next() {
                        *v = bytes[i];
                    } else {
                        ok = false;
                    }
                }
            }
        }
        return ok;
    }
}
}

Function fill_mandel_image_partial fills rows h_start till (and not including) h_end with the right pixel values. The code is a reasonably simple double loop over rows and columns. It converts the window coordinates to mandelbrot coordinates, computes the mandelbrot value, converts that to a color and places the bytes of the color in the right spot in the array.