From graham.hazel at gmail.com Tue Jul 20 13:00:55 2021
From: graham.hazel at gmail.com (Graham Hazel)
Date: Tue, 20 Jul 2021 14:00:55 +0100
Subject: [libnice] TRANSITION(CONNECTING, GATHERING) in agent.c/agent_signal_component_state_change
Message-ID:

Hi,

I'm running the Janus WebRTC Server to handle some video transport and on several occasions have observed an assertion firing in libnice (agent/agent.c, line 2573) during an ICE restart.

It appears likely that adding TRANSITION(CONNECTING, GATHERING) to the list of valid state changes in the assert statement "fixes" the issue.

Now, I fully admit both (a) I'm probably doing "something weird" in my setup to hit this case surprisingly often; and (b) I have very little idea what any of the libnice code is doing. So my very naive question is: is this actually a valid transition that's missing from the assertion, or is something bad happening that I'm now ignoring?

Thanks!
Graham

_______________________________________________
nice mailing list
nice at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nice

From olivier.crete at collabora.com Tue Jul 20 15:47:56 2021
From: olivier.crete at collabora.com (Olivier Crête)
Date: Tue, 20 Jul 2021 11:47:56 -0400
Subject: [libnice] TRANSITION(CONNECTING, GATHERING) in agent.c/agent_signal_component_state_change
In-Reply-To:
References:
Message-ID:

Hi,

Can you get a stack trace when the assertion is triggered? It's definitely not a valid transition. How often can you reproduce this? Do you have an easy way to trigger it?

If you're running a production server, you can always compile with -DG_DISABLE_ASSERT if you don't want it to abort. But it would really help me if you have a way to know where this invalid transition is called from.

Olivier

-- 
Olivier Crête
olivier.crete at collabora.com

From graham.hazel at gmail.com Tue Jul 20 16:20:15 2021
From: graham.hazel at gmail.com (Graham Hazel)
Date: Tue, 20 Jul 2021 17:20:15 +0100
Subject: [libnice] TRANSITION(CONNECTING, GATHERING) in agent.c/agent_signal_component_state_change
In-Reply-To:
References:
Message-ID:

Hi Olivier,

Thanks for your reply. It happened ~8-10 times over ~20k CPU-hours of execution (and has happened 0 times in well over 10k hours since I vandalised the assertion statement).

I'm afraid I don't have an easy way to reproduce it, nor a usable stack trace, but by code inspection and correlation with Janus logs I infer that the call stack is:

  Janus function janus_ice_restart calls nice_agent_restart ->
  nice_agent_restart calls nice_stream_restart ->
  nice_stream_restart calls agent_signal_component_state_change

I hope that helps!

Graham

From olivier.crete at collabora.com Tue Jul 20 16:25:29 2021
From: olivier.crete at collabora.com (Olivier Crête)
Date: Tue, 20 Jul 2021 12:25:29 -0400
Subject: [libnice] TRANSITION(CONNECTING, GATHERING) in agent.c/agent_signal_component_state_change
In-Reply-To:
References:
Message-ID: <50fc880fd0245774083c4a2da19f5f4e39a7c0ef.camel@collabora.com>

Hi,

That actually really helps! Which exact version of libnice are you using?

From graham.hazel at gmail.com Tue Jul 20 16:57:02 2021
From: graham.hazel at gmail.com (Graham Hazel)
Date: Tue, 20 Jul 2021 17:57:02 +0100
Subject: [libnice] TRANSITION(CONNECTING, GATHERING) in agent.c/agent_signal_component_state_change
In-Reply-To: <50fc880fd0245774083c4a2da19f5f4e39a7c0ef.camel@collabora.com>
References: <50fc880fd0245774083c4a2da19f5f4e39a7c0ef.camel@collabora.com>
Message-ID:

Hi,

AFAICS it's the HEAD of master (commit a0cfef727...), which helpfully predates all my hackery, so I'm fairly sure both the failing and "succeeding" builds are using the same version.

From olivier.crete at collabora.com Tue Jul 20 17:11:17 2021
From: olivier.crete at collabora.com (Olivier Crête)
Date: Tue, 20 Jul 2021 13:11:17 -0400
Subject: [libnice] TRANSITION(CONNECTING, GATHERING) in agent.c/agent_signal_component_state_change
In-Reply-To:
References: <50fc880fd0245774083c4a2da19f5f4e39a7c0ef.camel@collabora.com>
Message-ID: <02f06dc13b230cb76a6449872bfbc4f6b316d127.camel@collabora.com>

Hi,

You were actually right: this transition was made possible by a recent commit. I filed an MR to allow it.

Olivier

From olivier.crete at collabora.com Tue Jul 20 17:15:31 2021
From: olivier.crete at collabora.com (Olivier Crête)
Date: Tue, 20 Jul 2021 13:15:31 -0400
Subject: [libnice] TRANSITION(CONNECTING, GATHERING) in agent.c/agent_signal_component_state_change
In-Reply-To: <02f06dc13b230cb76a6449872bfbc4f6b316d127.camel@collabora.com>
References: <50fc880fd0245774083c4a2da19f5f4e39a7c0ef.camel@collabora.com> <02f06dc13b230cb76a6449872bfbc4f6b316d127.camel@collabora.com>
Message-ID: <80919ebaf460c4ad01d2377226501330a60e6a07.camel@collabora.com>

Hi,

And here is the MR:
https://gitlab.freedesktop.org/libnice/libnice/-/merge_requests/204

I'm fixing the CI now to be able to merge it.

Olivier
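
For readers unfamiliar with the check discussed in this thread: the assertion in agent_signal_component_state_change accepts only the (old, new) state pairs listed via TRANSITION(...), and the fix adds CONNECTING -> GATHERING, which an ICE restart can now produce. Below is a minimal standalone sketch of that whitelist pattern, not libnice's code. The ComponentState enum, transition_is_valid(), set_state() and every allowed edge except CONNECTING -> GATHERING are hypothetical placeholders, and standard assert() with NDEBUG stands in for GLib's g_assert() with G_DISABLE_ASSERT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical component states, loosely modeled on libnice's
 * component states; these names are illustrative only. */
typedef enum {
  STATE_DISCONNECTED,
  STATE_GATHERING,
  STATE_CONNECTING,
  STATE_CONNECTED,
  STATE_READY,
  STATE_FAILED
} ComponentState;

/* True when the edge being checked is exactly OLD -> NEW. */
#define TRANSITION(OLD, NEW) \
  (old_state == STATE_##OLD && new_state == STATE_##NEW)

/* Hypothetical whitelist. Only CONNECTING -> GATHERING comes from the
 * thread; the other edges are placeholders for illustration. */
static bool transition_is_valid(ComponentState old_state,
                                ComponentState new_state)
{
  if (old_state == new_state)
    return true;                      /* no-op changes are accepted */
  return TRANSITION(DISCONNECTED, GATHERING) ||
         TRANSITION(GATHERING, CONNECTING) ||
         TRANSITION(CONNECTING, CONNECTED) ||
         TRANSITION(CONNECTED, READY) ||
         /* An ICE restart can send a component that is still
          * CONNECTING back to GATHERING: the edge this thread adds. */
         TRANSITION(CONNECTING, GATHERING) ||
         new_state == STATE_FAILED;   /* any state may fail */
}

/* Assert on the edge, then commit it. Built with -DNDEBUG the assert
 * compiles away, so an invalid edge is committed silently. */
static void set_state(ComponentState *state, ComponentState new_state)
{
  assert(transition_is_valid(*state, new_state));
  *state = new_state;
}
```

In this sketch, set_state() moving a component from STATE_CONNECTING to STATE_GATHERING passes, while an edge missing from the list (READY to DISCONNECTED here) aborts. Building with -DNDEBUG mirrors the -DG_DISABLE_ASSERT workaround suggested above: the process no longer aborts, but an invalid change then goes unnoticed.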