一个关于wait/notify与锁关系的探究

  wait/notify 机制是解决生产者消费者问题的良药。它的核心逻辑是基于条件变量的锁机制处理。所以,它们到底是什么关系?wait()时是否需要持有锁? notify()是否需要持有锁?先说答案:都需要持有锁。

  wait需要持有锁的原因是,你肯定需要知道在哪个对象上进行等待,如果不持有锁,将无法做到对象变更时进行实时感知通知的作用,但是为了让其他对象可以操作该值的变化,它必须要先释放掉锁,然后在该节点上进行等待。不持有锁而进行wait,可能会导致长眠不起。

  notify需要持有锁的原因是,它要保证线程的安全,只有它知道数据变化了,所以它有权力去通知其他线程数据变化。而且通知完之后,不能立即释放锁,即必须在持有锁的情况下进行通知,否则notify后续的工作的线程安全性将无法保证,尽量它是在lock的范围内,但却因为锁释放,将导致不可预期的结果。而且在notify的时候,并不能真正地将对应的线程唤醒,即不能从操作系统层面唤醒线程,因为此时当前通知线程持有锁,而此时如果将其他等待线程唤醒,它们将立即参与到锁的竞争中来,而这时的竞争是一定会失败的,这可能会导致被唤醒的线程立即又进入等待队列,更糟糕的是它可能再也不会被唤醒 了。所以不能将在持有锁的时,将对应的线程真正唤醒,我们看到的notify只是从语言上下文级别,将它从等待队列转移到同步队列而已,对此操作系统一无所知。

1. 实验验证

  我们通过一个实验来看一下,wait/和notify是否会在持有锁的情况下进行。

private ReentrantLock mainLock = new ReentrantLock();

@Test
public void testWaitNotify() throws InterruptedException {
    Condition c1 \= mainLock.newCondition();
    Condition c3 \= mainLock.newCondition();

    CountDownLatch t1StartLatch \= new CountDownLatch(2);
    Thread t1 \= new Thread(() -> {
        mainLock.lock();
        try {
            System.out.println(LocalDateTime.now() \+ " - t1 start");
            c1.await();
            System.out.println(LocalDateTime.now() \+ " - t1 c1 await out");
            // 过早通知问题,导致无法测试下一步

// c3.await();
// System.out.println(LocalDateTime.now() + “ - t1 c2 await out”);
t1StartLatch.await();
System.out.println(LocalDateTime.now() + “ - t1 sleeping”);
SleepUtil.sleepMillis(10_000L);
c1.signalAll();
c3.signalAll();
System.out.println(LocalDateTime.now() + “ - t1 notified, sleeping again”);
SleepUtil.sleepMillis(10_000L);
System.out.println(LocalDateTime.now() + “ - t1 out”);
}
catch (Exception e) {
System.err.println(“t1 exception “);
e.printStackTrace();
}
finally {
mainLock.unlock();
}
}, “t1”);
Thread t2 = new Thread(() -> {
mainLock.lock();
try {
t1StartLatch.countDown();
System.out.println(LocalDateTime.now() + “ - t2 c1 signal”);
c1.signalAll();
System.out.println(LocalDateTime.now() + “ - t2 wait”);
c1.await();
System.out.println(LocalDateTime.now() + “ - t2 out”);
}
catch (Exception e) {
System.err.println(“t2 exception “);
e.printStackTrace();
}
finally {
mainLock.unlock();
}
}, “t2”);
Thread t3 = new Thread(() -> {
mainLock.lock();
try {
t1StartLatch.countDown();
System.out.println(LocalDateTime.now() + “ - t3 c3 signal”);
c3.signalAll();
System.out.println(LocalDateTime.now() + “ - t3 wait”);
c3.await();
System.out.println(LocalDateTime.now() + “ - t3 out”);
}
catch (Exception e) {
System.err.println(“t2 exception “);
e.printStackTrace();
}
finally {
mainLock.unlock();
}
}, “t3”);
t1.start();
t2.start();
t3.start();
t1.join();
System.out.println(LocalDateTime.now() + “ - main t1 out”);
t2.join();
System.out.println(LocalDateTime.now() + “ - main t2 out”);
t3.join();
System.out.println(LocalDateTime.now() + “ - main t3 out”);
}

  大概意思是,针对同一个锁,wait之后,是否可以被其他线程进入临界区?如果wait之前不通知进入,wait之后能进入,说明wait依赖于锁,而且会释放当前锁。notify之后,wait()是否会立即执行,如果必须等到notify的模块完成后,才执行,说明notify是必须要依赖于锁的。

  结果如下:

2022-03-27T20:09:43.588 - t1 start
2022-03-27T20:09:43.603 - t2 c1 signal
2022-03-27T20:09:43.603 - t2 wait
2022-03-27T20:09:43.603 - t3 c3 signal
2022-03-27T20:09:43.603 - t3 wait
2022-03-27T20:09:43.603 - t1 c1 await out
2022-03-27T20:09:43.603 - t1 sleeping
2022-03-27T20:09:53.605 - t1 notified, sleeping again
2022-03-27T20:10:03.612 - t1 out
2022-03-27T20:10:03.612 - t2 out
2022-03-27T20:10:03.612 - main t1 out
2022-03-27T20:10:03.612 - t3 out
2022-03-27T20:10:03.612 - main t2 out
2022-03-27T20:10:03.612 - main t3 out

2022-03-27T20:11:39.982 - t1 start
2022-03-27T20:11:39.982 - t2 c1 signal
2022-03-27T20:11:39.982 - t2 wait
2022-03-27T20:11:39.982 - t3 c3 signal
2022-03-27T20:11:39.982 - t3 wait
2022-03-27T20:11:39.982 - t1 c1 await out
2022-03-27T20:11:39.982 - t1 sleeping
2022-03-27T20:11:49.989 - t1 notified, sleeping again
2022-03-27T20:11:59.990 - t1 out
2022-03-27T20:11:59.990 - t2 out
2022-03-27T20:11:59.990 - main t1 out
2022-03-27T20:11:59.990 - t3 out
2022-03-27T20:11:59.990 - main t2 out
2022-03-27T20:11:59.990 - main t3 out

2. wait/notify 的实现机制

  我们以AQS的实现机制为线索,探索wait/notify机制。它在唤醒操作队列时,设置状态为 SIGNAL , 但它实际不执行操作系统唤醒。

    //     java.util.concurrent.locks.AbstractQueuedSynchronizer.ConditionObject#signalAll
    /\*\*
     \* Moves all threads from the wait queue for this condition to
     \* the wait queue for the owning lock.
     \*
     \* @throws IllegalMonitorStateException if {@link #isHeldExclusively}
     \*         returns {@code false}
     \*/
    public final void signalAll() {
        if (!isHeldExclusively())
            throw new IllegalMonitorStateException();
        Node first \= firstWaiter;
        if (first != null)
            doSignalAll(first);
    }

    // java.util.concurrent.locks.AbstractQueuedSynchronizer.ConditionObject#doSignalAll
    /\*\*
     \* Removes and transfers all nodes.
     \* @param first (non-null) the first node on condition queue
     \*/
    private void doSignalAll(Node first) {
        lastWaiter \= firstWaiter = null;
        do {
            Node next \= first.nextWaiter;
            first.nextWaiter \= null;
            transferForSignal(first);
            first \= next;
        } while (first != null);
    }
// java.util.concurrent.locks.AbstractQueuedSynchronizer#transferForSignal
/\*\*
 \* Transfers a node from a condition queue onto sync queue.
 \* Returns true if successful.
 \* @param node the node
 \* @return true if successfully transferred (else the node was
 \* cancelled before signal)
 \*/
final boolean transferForSignal(Node node) {
    /\*
     \* If cannot change waitStatus, the node has been cancelled.
     \*/
    if (!compareAndSetWaitStatus(node, Node.CONDITION, 0))
        return false;

    /\*
     \* Splice onto queue and try to set waitStatus of predecessor to
     \* indicate that thread is (probably) waiting. If cancelled or
     \* attempt to set waitStatus fails, wake up to resync (in which
     \* case the waitStatus can be transiently and harmlessly wrong).
     \*/
    Node p \= enq(node);
    int ws = p.waitStatus;
    // 不到万不得已,不会真正唤醒等待中的队列,从而满足notify无法将线程唤醒的作用,或者说线程仍然在操作系统的等待队列上
    // 它只是将当前线程移动到本语文的同步队列中,以下线程下次运行过来时可以通过该限制
    if (ws > 0 || !compareAndSetWaitStatus(p, ws, Node.SIGNAL))
        LockSupport.unpark(node.thread);
    return true;
}

/\*\*
 \* Inserts node into queue, initializing if necessary. See picture above.
 \* @param node the node to insert
 \* @return node's predecessor
 \*/
private Node enq(final Node node) {
    for (;;) {
        Node t \= tail;
        if (t == null) { // Must initialize
            if (compareAndSetHead(new Node()))
                tail \= head;
        } else {
            node.prev \= t;
            if (compareAndSetTail(t, node)) {
                t.next \= node;
                return t;
            }
        }
    }
}

    // java.util.concurrent.locks.AbstractQueuedSynchronizer.ConditionObject#await()
    /\*\*
     \* Implements interruptible condition wait.
     \* <ol>
     \* <li> If current thread is interrupted, throw InterruptedException.
     \* <li> Save lock state returned by {@link #getState}.
     \* <li> Invoke {@link #release} with saved state as argument,
     \*      throwing IllegalMonitorStateException if it fails.
     \* <li> Block until signalled or interrupted.
     \* <li> Reacquire by invoking specialized version of
     \*      {@link #acquire} with saved state as argument.
     \* <li> If interrupted while blocked in step 4, throw InterruptedException.
     \* </ol>
     \*/
    public final void await() throws InterruptedException {
        if (Thread.interrupted())
            throw new InterruptedException();
        Node node \= addConditionWaiter();
        // 进来等待队列,先释放锁,此时进入线程不安全状态
        int savedState = fullyRelease(node);
        int interruptMode = 0;
        // 此判断只是本语文级别的等待队列限制
        // notify 时只能满足这个条件,而不会将线程从操作系统挂起队列中唤醒,即不会进行 LockSupport.unpark()
        while (!isOnSyncQueue(node)) {
            // 交由操作系统进行线程挂起
            LockSupport.park(this);
            if ((interruptMode = checkInterruptWhileWaiting(node)) != 0)
                break;
        }
        // 重新进行锁的获取,尝试
        if (acquireQueued(node, savedState) && interruptMode != THROW\_IE)
            interruptMode \= REINTERRUPT;
        if (node.nextWaiter != null) // clean up if cancelled
            unlinkCancelledWaiters();
        if (interruptMode != 0)
            reportInterruptAfterWait(interruptMode);
    }
// java.util.concurrent.locks.AbstractQueuedSynchronizer#acquireQueued
/\*\*
 \* Acquires in exclusive uninterruptible mode for thread already in
 \* queue. Used by condition wait methods as well as acquire.
 \*
 \* @param node the node
 \* @param arg the acquire argument
 \* @return {@code true} if interrupted while waiting
 \*/
final boolean acquireQueued(final Node node, int arg) {
    boolean failed = true;
    try {
        boolean interrupted = false;
        for (;;) {
            final Node p = node.predecessor();
            // 获取当锁,则替换head后返回
            // 而 tryAcquire() 则由各自策略实现
            if (p == head && tryAcquire(arg)) {
                setHead(node);
                p.next \= null; // help GC
                failed = false;
                return interrupted;
            }
            // 如果获取不到锁,则重新进入操作系统等待队列
            if (shouldParkAfterFailedAcquire(p, node) &&
                parkAndCheckInterrupt())
                interrupted \= true;
        }
    } finally {
        if (failed)
            cancelAcquire(node);
    }
}

  所以,总结:

1. wait将会释放持有的锁;
2. wait将会加入到语言级别的等待队列,同时也会提交给操作系统的等待队列,做到真正的线程挂起;
3. wait将会在被操作系统唤醒后,重新进行新一轮的锁获取尝试,返回时已携带回原有的锁,从外部看起来就像锁一直都在一样;
4. notify不会真正的唤醒等待的线程,而只是将各等待线程从语言级别的等待队列移出,到语言级别的同步队列;
5. notify只有在极端情况下,才会做到线程的真正唤醒作用,比如中断,但这被唤醒的线程将无法正常进行业务操作,所以也是安全的;
6. 只有在整体的锁在进行 unlock() 的时候,才会唤醒线程,使其重新参与锁的竞争;

3. lock/unlock 流程

  同样的AQS的实现为线索,lock/unlock 流程如下:

// java.util.concurrent.locks.ReentrantLock#lock
/\*\*
 \* Acquires the lock.
 \*
 \* <p>Acquires the lock if it is not held by another thread and returns
 \* immediately, setting the lock hold count to one.
 \*
 \* <p>If the current thread already holds the lock then the hold
 \* count is incremented by one and the method returns immediately.
 \*
 \* <p>If the lock is held by another thread then the
 \* current thread becomes disabled for thread scheduling
 \* purposes and lies dormant until the lock has been acquired,
 \* at which time the lock hold count is set to one.
 \*/
public void lock() {
    sync.lock();
}

    // java.util.concurrent.locks.ReentrantLock.NonfairSync#lock
    /\*\*
     \* Performs lock.  Try immediate barge, backing up to normal
     \* acquire on failure.
     \*/
    final void lock() {
        if (compareAndSetState(0, 1))
            setExclusiveOwnerThread(Thread.currentThread());
        else
            acquire(1);
    }
// java.util.concurrent.locks.AbstractQueuedSynchronizer#acquire
/\*\*
 \* Acquires in exclusive mode, ignoring interrupts.  Implemented
 \* by invoking at least once {@link #tryAcquire},
 \* returning on success.  Otherwise the thread is queued, possibly
 \* repeatedly blocking and unblocking, invoking {@link
 \* #tryAcquire} until success.  This method can be used
 \* to implement method {@link Lock#lock}.
 \*
 \* @param arg the acquire argument.  This value is conveyed to
 \*        {@link #tryAcquire} but is otherwise uninterpreted and
 \*        can represent anything you like.
 \*/
public final void acquire(int arg) {
    if (!tryAcquire(arg) &&
        // 同上wait时的锁争抢操作
        acquireQueued(addWaiter(Node.EXCLUSIVE), arg))
        selfInterrupt();
}

// java.util.concurrent.locks.ReentrantLock#unlock
/\*\*
 \* Attempts to release this lock.
 \*
 \* <p>If the current thread is the holder of this lock then the hold
 \* count is decremented.  If the hold count is now zero then the lock
 \* is released.  If the current thread is not the holder of this
 \* lock then {@link IllegalMonitorStateException} is thrown.
 \*
 \* @throws IllegalMonitorStateException if the current thread does not
 \*         hold this lock
 \*/
public void unlock() {
    sync.release(1);
}

// java.util.concurrent.locks.AbstractQueuedSynchronizer#release
/\*\*
 \* Releases in exclusive mode.  Implemented by unblocking one or
 \* more threads if {@link #tryRelease} returns true.
 \* This method can be used to implement method {@link Lock#unlock}.
 \*
 \* @param arg the release argument.  This value is conveyed to
 \*        {@link #tryRelease} but is otherwise uninterpreted and
 \*        can represent anything you like.
 \* @return the value returned from {@link #tryRelease}
 \*/
public final boolean release(int arg) {
    if (tryRelease(arg)) {
        Node h \= head;
        // 直接唤醒头节点(真正的唤醒)
        if (h != null && h.waitStatus != 0)
            unparkSuccessor(h);
        return true;
    }
    return false;
}

// java.util.concurrent.locks.AbstractQueuedSynchronizer#unparkSuccessor
/\*\*
 \* Wakes up node's successor, if one exists.
 \*
 \* @param node the node
 \*/
private void unparkSuccessor(Node node) {
    /\*
     \* If status is negative (i.e., possibly needing signal) try
     \* to clear in anticipation of signalling.  It is OK if this
     \* fails or if status is changed by waiting thread.
     \*/
    int ws = node.waitStatus;
    if (ws < 0)
        compareAndSetWaitStatus(node, ws, 0);

    /\*
     \* Thread to unpark is held in successor, which is normally
     \* just the next node.  But if cancelled or apparently null,
     \* traverse backwards from tail to find the actual
     \* non-cancelled successor.
     \*/
    Node s \= node.next;
    if (s == null || s.waitStatus > 0) {
        s \= null;
        for (Node t = tail; t != null && t != node; t = t.prev)
            if (t.waitStatus <= 0)
                s \= t;
    }
    // 真正唤醒线程,只有一个线程将被唤醒
    if (s != null)
        LockSupport.unpark(s.thread);
}

  总结: lock/unlock 是一个真正的上锁解锁操作,上锁时如未成功,则进行park()进行操作系统挂起,解锁时将头节点unpark()交由操作系统调度。

4. 唤醒多个等待线程

  如何唤醒多个等待线程?共享锁有这个需求,其实也是notifyAll 的表面语义所在。

// java.util.concurrent.locks.AbstractQueuedSynchronizer#releaseShared
/\*\*
 \* Releases in shared mode.  Implemented by unblocking one or more
 \* threads if {@link #tryReleaseShared} returns true.
 \*
 \* @param arg the release argument.  This value is conveyed to
 \*        {@link #tryReleaseShared} but is otherwise uninterpreted
 \*        and can represent anything you like.
 \* @return the value returned from {@link #tryReleaseShared}
 \*/
public final boolean releaseShared(int arg) {
    if (tryReleaseShared(arg)) {
        doReleaseShared();
        return true;
    }
    return false;
}

// java.util.concurrent.locks.AbstractQueuedSynchronizer#doReleaseShared
/\*\*
 \* Release action for shared mode -- signals successor and ensures
 \* propagation. (Note: For exclusive mode, release just amounts
 \* to calling unparkSuccessor of head if it needs signal.)
 \*/
private void doReleaseShared() {
    /\*
     \* Ensure that a release propagates, even if there are other
     \* in-progress acquires/releases.  This proceeds in the usual
     \* way of trying to unparkSuccessor of head if it needs
     \* signal. But if it does not, status is set to PROPAGATE to
     \* ensure that upon release, propagation continues.
     \* Additionally, we must loop in case a new node is added
     \* while we are doing this. Also, unlike other uses of
     \* unparkSuccessor, we need to know if CAS to reset status
     \* fails, if so rechecking.
     \*/
    for (;;) {
        Node h \= head;
        if (h != null && h != tail) {
            int ws = h.waitStatus;
            if (ws == Node.SIGNAL) {
                if (!compareAndSetWaitStatus(h, Node.SIGNAL, 0))
                    continue;            // loop to recheck cases
                // 唤醒头节点
                unparkSuccessor(h);
            }
            // 因为上一头节点刚刚被设置为0,说明正在执行中,设置当前head为 PROPAGATE
            else if (ws == 0 &&
                     !compareAndSetWaitStatus(h, 0, Node.PROPAGATE))
                continue;                // loop on failed CAS
        }
        // 即尽量只设置一个 head 节点即可
        // 除非在这期间发生变更
        if (h == head)                   // loop if head changed
            break;
    }
}


// java.util.concurrent.locks.AbstractQueuedSynchronizer#acquireSharedInterruptibly
/\*\*
 \* Acquires in shared mode, aborting if interrupted.  Implemented
 \* by first checking interrupt status, then invoking at least once
 \* {@link #tryAcquireShared}, returning on success.  Otherwise the
 \* thread is queued, possibly repeatedly blocking and unblocking,
 \* invoking {@link #tryAcquireShared} until success or the thread
 \* is interrupted.
 \* @param arg the acquire argument.
 \* This value is conveyed to {@link #tryAcquireShared} but is
 \* otherwise uninterpreted and can represent anything
 \* you like.
 \* @throws InterruptedException if the current thread is interrupted
 \*/
public final void acquireSharedInterruptibly(int arg)
        throws InterruptedException {
    if (Thread.interrupted())
        throw new InterruptedException();
    if (tryAcquireShared(arg) < 0)
        doAcquireSharedInterruptibly(arg);
}
// java.util.concurrent.locks.AbstractQueuedSynchronizer#doAcquireSharedInterruptibly
/\*\*
 \* Acquires in shared interruptible mode.
 \* @param arg the acquire argument
 \*/
private void doAcquireSharedInterruptibly(int arg)
    throws InterruptedException {
    final Node node = addWaiter(Node.SHARED);
    boolean failed = true;
    try {
        for (;;) {
            final Node p = node.predecessor();
            if (p == head) {
                int r = tryAcquireShared(arg);
                if (r >= 0) {
                    // 共享式锁的传播性质实现
                    setHeadAndPropagate(node, r);
                    p.next \= null; // help GC
                    failed = false;
                    return;
                }
            }
            if (shouldParkAfterFailedAcquire(p, node) &&
                parkAndCheckInterrupt())
                throw new InterruptedException();
        }
    } finally {
        if (failed)
            cancelAcquire(node);
    }
}

// java.util.concurrent.locks.AbstractQueuedSynchronizer#setHeadAndPropagate
/\*\*
 \* Sets head of queue, and checks if successor may be waiting
 \* in shared mode, if so propagating if either propagate > 0 or
 \* PROPAGATE status was set.
 \*
 \* @param node the node
 \* @param propagate the return value from a tryAcquireShared
 \*/
private void setHeadAndPropagate(Node node, int propagate) {
    Node h \= head; // Record old head for check below
    setHead(node);
    /\*
     \* Try to signal next queued node if:
     \*   Propagation was indicated by caller,
     \*     or was recorded (as h.waitStatus either before
     \*     or after setHead) by a previous operation
     \*     (note: this uses sign-check of waitStatus because
     \*      PROPAGATE status may transition to SIGNAL.)
     \* and
     \*   The next node is waiting in shared mode,
     \*     or we don't know, because it appears null
     \*
     \* The conservatism in both of these checks may cause
     \* unnecessary wake-ups, but only when there are multiple
     \* racing acquires/releases, so most need signals now or soon
     \* anyway.
     \*/
    if (propagate > 0 || h == null || h.waitStatus < 0 ||
        (h \= head) == null || h.waitStatus < 0) {
        Node s \= node.next;
        // 递归进行唤醒下一线程节点,从而级联唤醒
        if (s == null || s.isShared())
            doReleaseShared();
    }
}

/\*\*
 \* Release action for shared mode -- signals successor and ensures
 \* propagation. (Note: For exclusive mode, release just amounts
 \* to calling unparkSuccessor of head if it needs signal.)
 \*/
private void doReleaseShared() {
    /\*
     \* Ensure that a release propagates, even if there are other
     \* in-progress acquires/releases.  This proceeds in the usual
     \* way of trying to unparkSuccessor of head if it needs
     \* signal. But if it does not, status is set to PROPAGATE to
     \* ensure that upon release, propagation continues.
     \* Additionally, we must loop in case a new node is added
     \* while we are doing this. Also, unlike other uses of
     \* unparkSuccessor, we need to know if CAS to reset status
     \* fails, if so rechecking.
     \*/
    for (;;) {
        Node h \= head;
        if (h != null && h != tail) {
            int ws = h.waitStatus;
            if (ws == Node.SIGNAL) {
                if (!compareAndSetWaitStatus(h, Node.SIGNAL, 0))
                    continue;            // loop to recheck cases
                unparkSuccessor(h);
            }
            else if (ws == 0 &&
                     !compareAndSetWaitStatus(h, 0, Node.PROPAGATE))
                continue;                // loop on failed CAS
        }
        if (h == head)                   // loop if head changed
            break;
    }
}

  总结: 多个线程的唤醒,主要是使用了级联唤醒的机制,在做共享锁时,根据现有的情况,进行唤醒下一线程。而当线程调度很快或算法不确定时,就会给人一种所有线程一起被唤醒工作的效果。

不要害怕今日的苦,你要相信明天,更苦!