Discussion:
[libusb] Synchronising multiple endpoints
Chris E
2017-06-19 09:16:16 UTC
Hi everyone.
I have an application where I need to sample multiple isochronous endpoints
at the same time.

However, I'm running into a problem. On slow machines (mainly Android
phones), it takes more than 1ms to submit all of the transfers.
This leads to a situation where the different endpoints are out of sync.
This means that, for example, while the application may be reading frames
100-200 on Endpoint 0, I'll be reading frames 101-201 on endpoint 6! This
obviously causes errors with the packet ordering that leads to all kinds of
chaos further up the chain.

On Windows (using LibusbK), this is simple to work around since you can
submit delayed transfers and are provided with frame numbers when the
transfer completes.

However, I could not see a simple way to synchronise my endpoints with
plain ol' Libusb-1.0.
Does anyone have any ideas?
Tim Roberts
2017-06-19 17:03:36 UTC
Post by Chris E
I have an application where I need to sample multiple isochronous
endpoints at the same time.
However, I'm running into a problem. On slow machines (mainly Android
phones), it takes more than 1ms to submit all of the transfers.
Is that because you're submitting a slew of requests on endpoint 2, and
then a slew of requests on endpoint 6? Couldn't you alternate the
endpoints when you submit?
Post by Chris E
This leads to a situation where the different endpoints are out of
sync. This means that, for example, while the application may be
reading frames 100-200 on Endpoint 0, I'll be reading frames 101-201
on endpoint 6! This obviously causes errors with the packet ordering
that leads to all kinds of chaos further up the chain.
Are there no timestamps in the data to allow you to synchronize them?
If so, then you may have an unsolvable problem. No matter how fast you
are, there will always be a chance that your requests will cross a frame
boundary.
--
Tim Roberts, ***@probo.com
Providenza & Boekelheide, Inc.
Alan Stern
2017-06-19 18:44:52 UTC
Post by Tim Roberts
Post by Chris E
I have an application where I need to sample multiple isochronous
endpoints at the same time.
However, I'm running into a problem. On slow machines (mainly Android
phones), it takes more than 1ms to submit all of the transfers.
Is that because you're submitting a slew of requests on endpoint 2, and
then a slew of requests on endpoint 6? Couldn't you alternate the
endpoints when you submit?
Of course, even if he did that, it would still be possible that a frame
boundary could occur at an unexpected time and mess up the
synchronization.
Post by Tim Roberts
Post by Chris E
This leads to a situation where the different endpoints are out of
sync. This means that, for example, while the application may be
reading frames 100-200 on Endpoint 0, I'll be reading frames 101-201
on endpoint 6! This obviously causes errors with the packet ordering
that leads to all kinds of chaos further up the chain.
Are there no timestamps in the data to allow you to synchronize them?
If so, then you may have an unsolvable problem. No matter how fast you
are, there will always be a chance that your requests will cross a frame
boundary.
In Linux, the USB stack does provide this timing information. When an
isochronous URB completes, its start_frame field is set to the
(micro-)frame number of the first packet. But this field is not
exported to userspace! It's currently available only to kernel
drivers. :-(

It would be simple to change the kernel to copy the start_frame value
back to the userspace caller. But then libusb would need to be changed
to make the information available somehow to the user.

And I don't know whether similar information can be obtained under
other operating systems.

Alan Stern
Chris E
2017-06-19 22:41:00 UTC
Thanks, guys. :)

I actually am alternating between endpoints when I submit, and it's still
failing. It seems that libusb_submit_transfer() takes more than 133us (on
average) to return on my particular phone/OS combo.
A small probability of crossing a frame boundary is not an unfixable
problem (just detect the desynchronisation and try again), but a 100%
failure rate definitely is.
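
As a rough illustration of why the per-call cost matters, here is a minimal sketch of a simple timing model. It assumes the batch of submissions starts at a uniformly random point within a 1 ms full-speed frame and that every call takes the same time; the function name and the model itself are illustrative assumptions, not anything provided by libusb.

```python
FRAME_US = 1000.0  # a full-speed USB frame lasts 1 ms


def boundary_cross_probability(num_endpoints, submit_us, frame_us=FRAME_US):
    """Chance that a batch of back-to-back submissions straddles a frame boundary.

    Assumes the batch starts at a uniformly random offset within a frame and
    that each submission takes submit_us microseconds (both simplifications).
    """
    batch_us = num_endpoints * submit_us
    # A batch as long as a frame (or longer) always crosses a boundary.
    return min(1.0, batch_us / frame_us)


if __name__ == "__main__":
    # 6 endpoints at 133 us per call, the lower bound quoted above.
    print(boundary_cross_probability(6, 133))
```

Under this model, six calls at 133us each already cross a boundary most of the time, and once the whole batch takes 1ms or more every attempt fails, so retrying cannot help.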

In libusbK, the start_frame is copied to a structure similar to the
libusb_transfer. This is how I detect the bad connection on Windows.
Of course, best practice would be scheduling future transfers, so that the
possibility of a desynchronised connection would be near-zero, but I
haven't worried because the probability of a bad connection is already so
low and the cost is only 600ms delay to the user on launch.
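
The check described above can be sketched as follows, assuming a start frame number is available for the first transfer on each endpoint (as libusbK reports it; plain libusb does not, per this thread). The function names and the endpoint addresses in the usage line are hypothetical. USB frame numbers are 11 bits wide, so the comparison has to allow for wrap-around; on high-speed links the value may count microframes instead, which this sketch does not handle.

```python
FRAME_COUNT = 2048  # USB frame numbers are 11 bits wide and wrap around


def frame_offsets(start_frames, reference_endpoint):
    """Return each endpoint's signed start-frame offset from a reference endpoint.

    start_frames maps an endpoint address to the start frame of its first
    transfer. Offsets account for wrap-around of the 11-bit frame counter.
    """
    ref = start_frames[reference_endpoint]
    half = FRAME_COUNT // 2
    return {
        ep: ((frame - ref + half) % FRAME_COUNT) - half
        for ep, frame in start_frames.items()
    }


def is_synchronised(start_frames):
    """True when every endpoint started in the same frame."""
    return len({frame % FRAME_COUNT for frame in start_frames.values()}) == 1


if __name__ == "__main__":
    # Hypothetical endpoint addresses; the second started one frame late.
    print(frame_offsets({0x81: 100, 0x86: 101}, 0x81))
```

A non-zero offset means the streams are misaligned, and the application can either cancel and resubmit or discard the leading frames on the endpoints that started early.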

It's a shame there's no way to handle this software-side, but you're
right. There are ways around this device-side.
It's possible to roll the 6 128-byte endpoints into a single 1023-byte
endpoint. The reason I did not do this is because, under Windows, the
device would hog 100% of the host bandwidth and thus prevent the keyboard
and mouse from working.
On Linux/MacOS under libusb, could this problem be avoided simply by
requesting no more than 768 bytes per frame? Or would libusb reserve the
full 1023 bytes?
Unfortunately, I cannot set the endpoint size to 768 bytes on the device -
only certain powers of two or 1023. The device has two independent
384-byte buffers that need to be dealt with independently, hence the 6
128-byte endpoints. God bless Atmel.
Tim Roberts
2017-06-20 04:28:35 UTC
Post by Chris E
It's a shame there's no way to handle this software-side, but you're
right. There are ways around this device-side.
It's possible to roll the 6 128-byte endpoints into a single 1023-byte
endpoint. The reason I did not do this is because, under Windows, the
device would hog 100% of the host bandwidth and thus prevent the keyboard
and mouse from working.
It certainly does not. Where on earth did you read that?

Tim Roberts, ***@probo.com
Providenza & Boekelheide, Inc.
Chris E
2017-06-20 06:04:12 UTC
Oh, I did not read it anywhere.
I actually did it. The device would either refuse to connect, or
disconnect other devices as soon as it was connected.
This was using an .inf file that plugged it into LibusbK, not libusb.
Also, this is on an FS (full-speed) host, not HS (high-speed).
_______________________________________________
libusb-devel mailing list
https://lists.sourceforge.net/lists/listinfo/libusb-devel
Alan Stern
2017-06-20 14:17:12 UTC
Post by Chris E
Thanks, guys. :)
I actually am alternating between endpoints when I submit, and it's still
failing. It seems that libusb_submit_transfer() takes more than 133us (on
average) to return on my particular phone/OS combo.
A small probability of crossing a frame boundary is not an unfixable
problem (just detect the desynchronisation and try again), but a 100%
failure rate definitely is.
In libusbK, the start_frame is copied to a structure similar to the
libusb_transfer. This is how I detect the bad connection on Windows.
Of course, best practice would be scheduling future transfers, so that the
possibility of a desynchronised connection would be near-zero, but I
haven't worried because the probability of a bad connection is already so
low and the cost is only 600ms delay to the user on launch.
It's a shame there's no way to handle this software-side, but you're
right. There are ways around this device-side.
It's possible to roll the 6 128-byte endpoints into a single 1023-byte
endpoint. The reason I did not do this is because, under Windows, the
device would hog 100% of the host bandwidth and thus prevent the keyboard
and mouse from working.
On Linux/MacOS under libusb, could this problem be avoided simply by
requesting no more than 768 bytes per frame? Or would libusb reserve the
full 1023 bytes?
The reservation isn't done by libusb, it is done by the kernel driver.
And yes, the reservation would be for the entire maxpacket size, since
the kernel has no way to know that you will never submit a transfer
request that large.
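
To put numbers on the reservation, here is a back-of-the-envelope sketch under stated assumptions. A full-speed frame carries 12 Mbit/s x 1 ms = 1500 bytes, the USB 2.0 specification lists 9 bytes of protocol overhead per full-speed isochronous transaction, and at most 90% of a frame may be allocated to periodic transfers. The function name is hypothetical, and the model ignores bit stuffing and host or hub delays, which real schedulers budget for, so actual headroom is smaller than shown.

```python
FRAME_BYTES = 1500       # 12 Mbit/s * 1 ms = 12000 bits per full-speed frame
ISO_OVERHEAD_BYTES = 9   # protocol overhead per full-speed isochronous transaction
PERIODIC_LIMIT = 0.90    # share of a full-speed frame available to periodic transfers


def frame_share(max_packet_sizes):
    """Fraction of a full-speed frame reserved by the given isochronous endpoints.

    Each endpoint is charged its full maxpacket size plus protocol overhead,
    regardless of how much data is actually requested per frame.
    """
    reserved = sum(size + ISO_OVERHEAD_BYTES for size in max_packet_sizes)
    return reserved / FRAME_BYTES


if __name__ == "__main__":
    for label, sizes in [("1 x 1023", [1023]), ("1 x 768", [768]), ("6 x 128", [128] * 6)]:
        share = frame_share(sizes)
        print(f"{label}: {share:.1%} of the frame, fits nominally: {share <= PERIODIC_LIMIT}")
```

By this estimate a single 1023-byte endpoint reserves roughly 69% of every frame, against about 52% for 768 bytes and about 55% for six 128-byte endpoints.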
Post by Chris E
Unfortunately, I cannot set the endpoint size to 768 bytes on the device -
only certain powers of two or 1023.
Why not? The maxpacket size is just a number in the endpoint
descriptor, which is managed by the firmware, isn't it?
Post by Chris E
The device has two independent
384-byte buffers that need to be dealt with independently, hence the 6
128-byte endpoints. God bless Atmel.
What type of USB host controller do you use for your full-speed
connection?

Note that Linux's EHCI driver is deficient when it comes to scheduling
full-speed isochronous endpoints (lying behind a hub). I don't think
it will allow you to schedule transfers with a maxpacket size as large
as 1023 bytes. 768 bytes is probably okay.

Alan Stern
Chris E
2017-06-20 21:42:41 UTC
Post by Alan Stern
And yes, the reservation would be for the entire maxpacket size
Bugger. Thanks for the warning, though.
Post by Alan Stern
Post by Chris E
Unfortunately, I cannot set the endpoint size to 768 bytes on the device -
only certain powers of two or 1023.
Why not? The maxpacket size is just a number in the endpoint
descriptor, which is managed by the firmware, isn't it?
In theory, yes. In practice, it's a 3-bit (!!!) value stored in a hardware
register. No way to override it. Your only options are 8, 16, 32, 64, etc.
It's a design quirk of Atmel's Xmega, but wouldn't really ever cause
problems unless you're dealing with more than 512 bytes of isochronous data.
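
A minimal sketch of why 768 cannot be expressed, assuming the 3-bit code selects 8 << code for values 0 to 6 and 1023 for value 7. That encoding is an assumption consistent with "powers of two or 1023" above; the function and constant names are hypothetical, and the datasheet should be checked before relying on it.

```python
def endpoint_size_for_code(code):
    """Decode a 3-bit buffer-size code into a maximum packet size in bytes.

    Assumed encoding: codes 0-6 select 8 << code (8 up to 512) and code 7
    selects 1023, the isochronous maximum at full speed.
    """
    if not 0 <= code <= 7:
        raise ValueError("code must fit in 3 bits")
    return 1023 if code == 7 else 8 << code


# Every size the register can express under the assumed encoding.
SELECTABLE_SIZES = [endpoint_size_for_code(code) for code in range(8)]

if __name__ == "__main__":
    print(SELECTABLE_SIZES)
    print(768 in SELECTABLE_SIZES)
```

With only eight codes there is nothing between 512 and 1023, so any payload above 512 bytes forces either the 1023-byte setting or several smaller endpoints.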
Post by Alan Stern
What type of USB host controller do you use for your full-speed connection?
Ideally it should be able to run on any host. It's a product that will be
shipped to customers.