Skip to content

Conversation

@adrian-niculescu
Copy link

This PR fixes five bugs in Timers.cpp that could cause timers to be lost, file descriptors to leak, and memory to accumulate.

1. Inverted EAGAIN retry condition (line 162)

When the pipe buffer fills up and write() returns EAGAIN, the retry loop had the wrong condition:

// Before - wrong: continues if timer IS deleted
while (!stopped && deletedTimers_.find(timer->id) != deletedTimers_.end() && ...)

// After - correct: continues if timer is NOT deleted
while (!stopped && deletedTimers_.find(timer->id) == deletedTimers_.end() && ...)

This caused live timers to be silently dropped when the pipe was full, because the loop never executed for timers that weren't deleted.

Additionally, if a timer is cancelled while waiting for buffer space, the fix now cleans up deletedTimers_ to prevent leaking the timer ID:

// After the retry loop
if (deletedTimers_.find(timer->id) != deletedTimers_.end()) {
    deletedTimers_.erase(timer->id);
}

2. File descriptor leak (line 192)

Destroy() was closing only the read end of the pipe:

close(fd_[0]);
// fd_[1] never closed - leaked one FD per isolate lifetime

Now both ends are properly closed:

close(fd_[0]);
close(fd_[1]);

3. deletedTimers_ accumulation (line 323)

Immediate timers (delay ≤ 0) skip the sortedTimers_ queue and write directly to the pipe. If cleared before the callback runs, their IDs were added to deletedTimers_ but never removed.

Added cleanup in PumpTimerLoopCallback:

auto it = thiz->timerMap_.find(timerId);
if (it != thiz->timerMap_.end()) {
    // ... execute callback ...
} else {
    // Timer was cleared before callback ran - clean up
    std::lock_guard<std::mutex> lock(thiz->mutex);
    thiz->deletedTimers_.erase(timerId);
}

4. Data race on stopped flag (Timers.h line 131)

The stopped flag was written under mutex in Destroy() but read without synchronization in PumpTimerLoopCallback(), causing undefined behavior per C++ standard.

// Before
bool stopped = false;

// After
std::atomic<bool> stopped = false;

5. Non-EAGAIN write errors drop timers (line 92)

The condition if (result != -1 || errno != EAGAIN) treated all non-EAGAIN errors (EPIPE, EBADF, EINTR) as success, disabling fallback scheduling:

// Before - wrong: treats EPIPE/EBADF as success
if (result != -1 || errno != EAGAIN) {
    needsScheduling = false;  // timer lost!
}

// After - correct: only success disables scheduling
if (result != -1) {
    needsScheduling = false;
} else if (errno == EAGAIN) {
    isBufferFull = true;
}
// Other errors: needsScheduling stays true, timer goes to sortedTimers_

Additional fix: isBufferFull reset guard (line 171)

The isBufferFull optimization flag was being reset even when write() failed with non-EAGAIN errors:

// Before - resets on any non-EAGAIN result
} else if (isBufferFull.load() && ...) {
    isBufferFull = false;
}

// After - only resets on actual success
} else if (result != -1 && isBufferFull.load() && ...) {
    isBufferFull = false;
}

This prevents incorrectly re-enabling the immediate timer optimization when the pipe is broken.

  Fixed five bugs in Timers.cpp:
  - Inverted EAGAIN retry condition causing timer loss when pipe buffer full
  - File descriptor leak (fd_[1] never closed in Destroy)
  - deletedTimers_ accumulation for immediate timers cleared before callback
  - Data race on stopped flag (now atomic)
  - Non-EAGAIN write errors incorrectly treated as success

  Additional improvements:
  - Clean up deletedTimers_ when timer cancelled during EAGAIN retry
  - Guard isBufferFull reset to only occur on actual write success
@adrian-niculescu adrian-niculescu marked this pull request as draft December 2, 2025 21:53
@adrian-niculescu adrian-niculescu marked this pull request as ready for review December 2, 2025 21:58
@edusperoni
Copy link
Collaborator

@adrian-niculescu thanks for the fixes!

Unfortunately, I think we'll do a full refactor of timers soon. When I initally wrote it I thought that the NDK looper was similar to the Java looper. Turns out they're completely different, which leads to breaking hevavior. Example:

newTimeout(() => console.log(1), 0);
oldTimeout(() => console.log(2), 0);

for(let i =0; i< 100; i++) newTimeout(() => console.log('loop', i), 0);

oldTimeout(() => console.log(3), 10);

You'd expect it to run something like:
1, 2, loop 0-99, 3
but you'll often get things like:
2, 1, loop 0-30, 3, loop 31-99

which means that the ordering is wrong and you'll get some Java code executing before the timeouts. In fact, if you replace setTimeout with the new timeouts, you gets tons of issues even with the example apps. So the new idea is to implement as a mix of java and C++ (doing the most to avoid crossing the JNI bridge unecessarily)

@adrian-niculescu
Copy link
Author

adrian-niculescu commented Dec 2, 2025

@adrian-niculescu thanks for the fixes!

Unfortunately, I think we'll do a full refactor of timers soon. When I initally wrote it I thought that the NDK looper was similar to the Java looper. Turns out they're completely different, which leads to breaking hevavior. Example:

newTimeout(() => console.log(1), 0);
oldTimeout(() => console.log(2), 0);

for(let i =0; i< 100; i++) newTimeout(() => console.log('loop', i), 0);

oldTimeout(() => console.log(3), 10);

You'd expect it to run something like: 1, 2, loop 0-99, 3 but you'll often get things like: 2, 1, loop 0-30, 3, loop 31-99

which means that the ordering is wrong and you'll get some Java code executing before the timeouts. In fact, if you replace setTimeout with the new timeouts, you gets tons of issues even with the example apps. So the new idea is to implement as a mix of java and C++ (doing the most to avoid crossing the JNI bridge unecessarily)

Cool! Looking forward to it! I remember using the timers in tns-core-modules - they were basically straight NativeScript plugins, which worked fine for a long time until your more performant 2022-2023 implementation of these timers in the Android and iOS runtimes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants