Author Topic: Experimental UDP code is working!  (Read 19826 times)

Offline OFF-meister

Re: Experimental UDP code is working!
« Reply #15 on: September 01, 2008, 01:18:51 AM »
"I don't think you are talking about the parts within the OFF code because that should be completely compatible with win32"

Yep, right now we are just talking about a standalone build of UDT, once that is done it should be easy to include it in the OFF MSVC6 project for testing.


"One big question is can you compile OFF using cygwin?"

Simple answer: yes.

Complicated answer: deficiencies in the cygwin network libraries were what killed our first scaling push. Network connections would lock up inexplicably and for indefinite periods of time, causing many a sleepless night, until the code was converted to use native win32 and build in MSVC6, which solved all these problems. For this reason the cygwin makefile in CVS is unmaintained. Although I find cygwin invaluable while actually writing code, it is currently useless for testing in the wild.

A further issue with cygwin is that it does not seem to handle unicode yet, which is something else we require in release builds.

"As for UDT, I assume you looked at the UDT info about compiling http://udt.sourceforge.net/ , there isn't much....

...Maybe that's what you are talking about with visual studio. And they say "It requires Visual C++ 7.0 or above to compile. If other Windows compilers are used, you may need to create your own Makefile or project files.", so that may be a problem."

Yes, I made a project in MSVC6 and gave up trying to get it to build after a couple of hours clicking around in the menus. It's best to stop at that stage to avoid hardware damage due to flying into an apoplectic rage. I try to minimise my interaction with the MS IDEs.

As I said, their supplied project built fine in VS2005, but we can't use that to build OFF releases. Please don't ask why. :)

It seems best to leave this task to someone 'more motivated'. :)


"I don't think using their dll file is the way to go..."

That's right. Once we have it built and tested, we will most likely link it in statically much as we do with openssl.

It is a sad reality that windows development must be a priority simply due to the fact that most of the people running OFF use windows. We do our best for other platforms though! The rule we try to stick to is that improvements should work on all platforms.
« Last Edit: September 01, 2008, 01:39:00 AM by OFF-meister »

Offline scripter

Re: Experimental UDP code is working!
« Reply #16 on: September 01, 2008, 04:31:38 AM »
> I try to minimise my interaction with the MS IDEs

Yeah, that's why a script that does it all automatically would be the best. If I have to copy/paste commands into a Wine emulated window that might not be too bad.

I am hoping that cygwin will allow a build without any DLLs, but you already need one (MSVCP60.DLL), so adding another to the install script and dumping it into the OFF directory shouldn't be a problem.

It looks like "MinGW" lets you cross compile but uses a DLL at run time. I'm not 100% up on that so the info I read may be wrong.

One other choice is where you build a cross compiler somehow from GCC source code and it does it directly in Linux. That's the way it should be done, then I use wine only to test the .exe file.

Then of course something has to be figured out for building an installer. If that could be done on a command line...

Thanks for the heads up on the unicode thing, wxwidgets has --enable-unicode as an option and I think it's typically on so I don't know what's going to happen yet.

What happens to OFF if you turned off unicode in wxwidgets? Is it used for something specific?

As for the socket problems, it may be possible to use some code that's not a cygwin supplied thing. The problem would be testing, I bet it's one of those random problems that shows up after 50 or more people test it on different systems.

I don't understand the difference between VS2005 and MSVC6 or how it does a makefile type of thing, but I think it would be nice to use all open source and cross compile to build this thing someday, even for the mac.

Would it be possible to build the UDT dll with VS2005 and then link it in using MSVC6? Since the UDT code wouldn't change often the dll would stay the same between versions, that might be a way to go. MSVCP60.DLL is from 2004.

And I agree, UDT should work on all supported platforms. The only other last ditch option we have would be to take the important parts of UDT and try to just stick them within the OFF code file, if the license permits. It's all just C++ code and all the problems we are having right now are because of makefiles.

Maybe I should just do that and be done. Put their little notice at the top of where the code starts and cram it all into offcnxn. Then it has to work on mac too, right? Boy, that would be one long file.

Offline scripter

Re: Experimental UDP code is working!
« Reply #17 on: September 01, 2008, 01:30:25 PM »
This is now working: compiled and tested on two nodes for a short period. I took all the UDT code and stuffed it into two files:

udt.h
udt.cpp

I then included them in offcnxn.cpp, and it compiles and works with no modifications to the original makefile, so it should compile on windows with MSVC6 or whatever compiler. No DLLs or linking this way!

I never got OFF to compile under cygwin properly so I dropped that idea.

It's using V0.19.24 right now, I'm working on patching it into the latest version. Watch for another post soon.

The devs in the future will need to watch the UDT site's changelog once in a while. If there are any *security* bug fixes to UDT they need to be made to that section of these files.

I included references to each code section's old file name so it won't be too hard.

I don't think any further improvements to UDT that the original UDT team does to the code should be included in the OFF code unless it's something really, really important.

The code is now part of the OFF code, just like as if I wrote it myself and provided it to the project. The author's credits are included in the source files as per their requests.

The one thing I would like to see stay is the ability for people to just change the config option so users can test this now on windows or linux (UDP only on/off) until the code is finished that makes this work with both UDP and TCP together.

It's experimental and just for testing in special situations, not to split the network (and it doesn't work with any existing nodes anyway).

We need more testing on this with different network setups so any bugs or tweaking can be worked out before the other code is finished that does both UDP and TCP.

Offline scripter

Re: Experimental UDP code is working!
« Reply #18 on: September 01, 2008, 09:34:35 PM »
The new UDT code is ready and tested for a short time on 3 nodes. See the first post of this thread for more info, I just updated it.

It would help if everyone who has a different style network tested this even for a short period to see how well it works. If you have access to a Wifi mesh, NAT routers inside of NAT routers, or other interesting setups, please give it a try.

If two nodes each know their NAT IP (what may be called the "outside" IP) and the port they have opened for incoming connections, and both try to connect to each other within 30 seconds, it may work, "punching a hole" through both NATs. It may take 3 nodes: a simple one that can answer back so the NAT opens up, after which the other NATed node may be able to get through and the 3rd node could go away.
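
To show why both sides have to send first, here is a toy model. It is not OFF or UDT code. It assumes a very simple NAT that only lets a packet in from an address it has already sent a packet to. The class, the function names and the addresses are made up for illustration, real NATs vary a lot, and the 30 second window is not modelled.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy NAT: an inbound packet is accepted only if this NAT has already
// sent a packet to that remote address (i.e. a mapping exists).
// Hypothetical model for illustration; real NAT behaviour varies.
class ToyNat {
public:
    // Record an outbound packet; this is what "opens the hole".
    void send_to(const std::string& remote) { allowed_.insert(remote); }

    // Would an inbound packet from 'remote' get through?
    bool accepts_from(const std::string& remote) const {
        return allowed_.count(remote) != 0;
    }

private:
    std::set<std::string> allowed_;
};

// Both nodes send to each other's outside address. Returns true when
// each NAT then lets the other side's packets in.
inline bool punch_hole(ToyNat& a, const std::string& a_outside,
                       ToyNat& b, const std::string& b_outside) {
    assert(a_outside != b_outside);
    a.send_to(b_outside);   // A's first packet may be dropped by B's NAT...
    b.send_to(a_outside);   // ...but B's own packet opens B's side too.
    return a.accepts_from(b_outside) && b.accepts_from(a_outside);
}
```

Before either side has sent anything, `accepts_from()` is false for both, which is why a plain one-way connect attempt to a NATed node tends to fail.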

Tech details:

It's patched against V0.19.27 and should patch to .28 OK if there haven't been too many changes in those files.

I am using the global variable again so it's still a UDP only or TCP only setting, user selectable in the config file.

I suggest to just leave it for now until the other support code is finished.

The UDT code is in one big file, I suggest to leave it as-is and spend time on something else because it probably won't change for years. No need to spend time breaking it up into little files.

It's a compromise, everyone gets reliable UDP now, the devs get to go on to more important stuff.

I forgot to mention that the other end has to be in UDP mode too so there's really no chance of splitting the network with this test code.

The "#ifdef UDP_ON" should now be used while developing the code to combine and test UDP and TCP together. It shouldn't turn off this test feature if used properly.

The "IsUDP()" thing caused segfaults first thing so I exchanged them for the global. It's headed in the right direction and so is a lot of the other code, but I couldn't easily figure out why it segfaults. The global is set once so it doesn't create a thread safety problem. It's easy to find/replace it anyway.

UDT is already thread safe and C++ so no need to wrap it further.

I tried to make it as windows ready as possible, there will be problems I'm sure, and it doesn't look like I'm going to be able to help much since I can't cross compile it yet.

Some todo items:

Need a date stamp in the URL :)

Obvious GUI changes and code to support both UDP and TCP together, let the user choose which one(s) to use.

IP address filters that let you block or allow address ranges, for situations where you want to keep OFF from connecting to the internet and stay inside a LAN, or from connecting to other subnets on the same LAN. Or become an in-between node that is like a portal to the internet so only one node is seen going outside (think dorm room, fast LAN or Wifi, slow internet).

A setting with speed levels that changes internal delay settings so that a network could be completely saturated with random block pushes.

A button that sets everything to default settings for a LAN and back again.

Improve the ability to punch through a NAT and stay that way.

Go through all the code, new and old, with some tool that looks for buffer overflow problems just to be safe.
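
For the IP address filter item above, a rough sketch of an allow-list check, assuming IPv4 only and ranges given in CIDR form (for example 192.168.0.0/16). None of these names exist in OFF; they are placeholders to show the idea.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Pack four octets into a host-order 32-bit IPv4 address.
inline std::uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    assert(a < 256 && b < 256 && c < 256 && d < 256);
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8) | std::uint32_t(d);
}

// One address range in CIDR form, e.g. 192.168.0.0/16.
struct IpRange {
    std::uint32_t network;
    unsigned prefix_len;   // 0..32

    bool contains(std::uint32_t addr) const {
        assert(prefix_len <= 32);
        if (prefix_len == 0) return true;   // /0 matches everything
        const std::uint32_t mask = ~std::uint32_t(0) << (32 - prefix_len);
        return (addr & mask) == (network & mask);
    }
};

// Allow-list filter: a peer is accepted only if some range contains it.
inline bool is_allowed(const std::vector<IpRange>& allow, std::uint32_t addr) {
    for (std::size_t i = 0; i < allow.size(); ++i)
        if (allow[i].contains(addr)) return true;
    return false;
}
```

A "stay inside the LAN" mode would then be an allow list holding just the local ranges, checked before any connect or accept. A block list is the same test with the result inverted.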
« Last Edit: September 01, 2008, 11:03:33 PM by scripter »

Offline OFF-meister

Re: Experimental UDP code is working!
« Reply #19 on: September 04, 2008, 11:59:03 AM »
After a few more tries, the MSVC compile errors for UDT have been reduced to a manageable level, but there seem to be no simple solutions for the following (my comments are preceded by '#'):

--------------------Configuration: UDT_win32 - Win32 Debug--------------------
Compiling...
StdAfx.cpp
Compiling...
api.cpp
buffer.cpp
ccc.cpp
channel.cpp
C:off_network\udt4\src\channel.cpp(112) : error C2065: 'addrinfo' : undeclared identifier
C:off_network\udt4\src\channel.cpp(112) : error C2146: syntax error : missing ';' before identifier 'hints'
C:off_network\udt4\src\channel.cpp(112) : error C2065: 'hints' : undeclared identifier
C:off_network\udt4\src\channel.cpp(113) : error C2065: 'res' : undeclared identifier
C:off_network\udt4\src\channel.cpp(113) : warning C4552: '*' : operator has no effect; expected operator with side-effect
C:off_network\udt4\src\channel.cpp(115) : error C2027: use of undefined type 'addrinfo'
        C:off_network\udt4\src\channel.cpp(115) : see declaration of 'addrinfo'
C:off_network\udt4\src\channel.cpp(117) : error C2228: left of '.ai_flags' must have class/struct/union type
C:off_network\udt4\src\channel.cpp(117) : error C2065: 'AI_PASSIVE' : undeclared identifier
C:off_network\udt4\src\channel.cpp(118) : error C2228: left of '.ai_family' must have class/struct/union type
C:off_network\udt4\src\channel.cpp(119) : error C2228: left of '.ai_socktype' must have class/struct/union type
C:off_network\udt4\src\channel.cpp(121) : error C2065: 'getaddrinfo' : undeclared identifier
C:off_network\udt4\src\channel.cpp(124) : error C2227: left of '->ai_addr' must point to class/struct/union
C:off_network\udt4\src\channel.cpp(124) : error C2227: left of '->ai_addrlen' must point to class/struct/union
C:off_network\udt4\src\channel.cpp(127) : error C2065: 'freeaddrinfo' : undeclared identifier

#all the references to the addrinfo struct and associated functions are claimed to be in ws2tcpip.h, but are not present in the version of that file that shipped with msvc6. It may be necessary to figure out what this code does and rewrite it to use available methods.

common.cpp
control.cpp
core.cpp
C:off_network\udt4\src\core.cpp(2058) : error C2065: 'getnameinfo' : undeclared identifier
C:off_network\udt4\src\core.cpp(2058) : error C2065: 'NI_NUMERICHOST' : undeclared identifier
C:off_network\udt4\src\core.cpp(2058) : error C2065: 'NI_NUMERICSERV' : undeclared identifier

# same as above

list.cpp
md5.cpp
packet.cpp
queue.cpp
C:off_network\udt4\src\queue.cpp(935) : error C2664: 'retrieve' : cannot convert parameter 2 from '__int32' to 'int &'
        A reference that is not to 'const' cannot be bound to a non-lvalue
        Conversion loses qualifiers

#this seems to be a type error and may be hackable to work after the addrinfo issue is dealt with.

Generating Code...
Error executing cl.exe.

UDT_win32.lib - 18 error(s), 1 warning(s)

-----------------------------------------------------------------

I don't suppose you have any idea how to deal with these errors? The addrinfo one is clearly the most serious as those methods do not seem to be available in MSVC6.
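
For the queue.cpp C2664 error, a workaround of roughly this shape may be enough. This is only a sketch: I have not worked out why MSVC6 refuses to bind the argument, the `retrieve()` below is a made-up stand-in and not the UDT function, and `long` is used here simply as a type that cannot bind to `int&`.

```cpp
#include <cassert>

// Stand-in for the function that takes its second argument by
// non-const reference. The name matches the error message, but the
// signature and body are invented for illustration.
inline bool retrieve(int key, int& value) {
    value = key * 2;   // pretend to look something up and write it back
    return true;
}

// 'long' stands in for whatever type the compiler will not bind to int&.
inline bool retrieve_via_temp(int key, long& out) {
    int tmp = static_cast<int>(out);   // a real int lvalue to bind to int&
    const bool ok = retrieve(key, tmp);
    out = tmp;                         // copy the result back to the caller
    return ok;
}
```

The idea is just to hand the function a plain `int` variable and copy the value back afterwards, so the call site changes but the function itself does not.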


Offline scripter

Re: Experimental UDP code is working!
« Reply #20 on: September 04, 2008, 05:56:00 PM »
Off the top of my head, whenever you get into this sort of problem, go look in the VS2005 libraries or even the Linux ones and steal what you need in little bits.

Make sure you put these additions into the code in some way so that other people can compile from a standard install of MSVC. Also note whatever settings you had to use for MSVC.

I hope you see that I recently took the whole UDT code and placed it into two files to hopefully make this simpler. The full patch for V19.28 and mushing it all together with your new changes are posted to SF. Just remember that you have to re-compile offcnxn.cxx to get the udt stuff to re-compile. On Linux you just make it think you changed something.

Google says.... (with keywords "ws2tcpip.h addrinfo")

typedef struct addrinfo {
  int ai_flags;
  int ai_family;
  int ai_socktype;
  int ai_protocol;
  size_t ai_addrlen;
  char *ai_canonname;
  struct sockaddr *ai_addr;
  struct addrinfo *ai_next;
} ADDRINFOA,
 *PADDRINFOA;

"The addrinfo structure is used by the getaddrinfo function to hold host address information"
http://msdn.microsoft.com/en-us/library/ms737530(VS.85).aspx

I think the above will just work since they just fill in stuff after zeroing the whole struct.
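
If adding the struct is not enough because getaddrinfo() itself is missing, another route is to skip it. The sketch below assumes the code at channel.cpp(112-127) only wants a wildcard local IPv4 address to bind the UDP socket to, which I have not verified. The function name is made up; the calls themselves are the old socket ones that should exist everywhere.

```cpp
#include <cassert>
#include <cstring>
#ifdef _WIN32
  #include <winsock2.h>
#else
  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <sys/socket.h>
#endif

// Build an IPv4 "any address" endpoint without getaddrinfo().
// 'port' is in host byte order; 0 asks the OS to pick a free port.
inline sockaddr_in make_any_addr(unsigned short port) {
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));   // zero first, then fill in
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    return addr;
}
```

The result would be passed to bind() as a `sockaddr*` together with `sizeof(sockaddr_in)`. This drops IPv6, so it is a stopgap and not a replacement for what getaddrinfo() does in general.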

I think that header is the same for Linux, but I didn't go looking.

But... " 'getaddrinfo' : undeclared identifier" shouldn't happen. It should come in through the #include <winsock2.h> file, so something's wrong here.
http://msdn.microsoft.com/en-us/library/ms738519(VS.85).aspx

Are you sure you are defining WIN32 along with your normal define of _WIN32?

With the new patch I define it at the top of the file and don't have to worry about it anymore.

Oops.. I see that I forgot to define it in udt.h

You know what, just change the top part of offcnxn.cxx where it defines stuff for windows

#if defined(_WIN32)
  #define WIN32
  #include <winsock2.h>
#else

That should carry into both udt.h and udt.cxx

« Last Edit: September 04, 2008, 06:19:10 PM by scripter »

Offline scripter

Re: Experimental UDP code is working!
« Reply #21 on: September 07, 2008, 08:00:09 AM »
From a number of things I've read online the problem may be the age of the MSVC6 libraries. It might help to take the VS2005 files for winsock2 and ws2tcpip.

It also looks like the winsock2.h include should come before windows.h, and ws2tcpip.h is supposed to be included from within winsock2.h, so I don't understand why it's in the source files.

Are you sure it's a good idea to be using all those other old libraries and an old compiler?

I am still thinking it would be better to use open source tools somehow so that all the code used and the compiler are up to date.

I know you mentioned some bugs, is there a chance that those have been fixed by now?

Offline OFF-meister

Re: Experimental UDP code is working!
« Reply #22 on: September 07, 2008, 09:23:21 PM »
We have not forgotten your patch! But until we have a working cross-platform build we can't really include it. Once this happens it won't be too hard to include the code by comparing the patched and unpatched v19.27.

So we almost upgraded the windows compiler a few months ago, but this was aborted due to the reappearance of several nasty gui glitches which were solved in MSVC6 by loading all images at runtime. The worst of these is the download progress bars vanishing when the list items are selected - this also seems to occur when a manifest is used with the current release builds. MSVC6 also seems to produce much leaner and more efficient code than VS2005 (which is the other candidate), although this may be a settings problem. Despite our best efforts we were unable to solve these problems, so we are stuck with MSVC6 for the foreseeable future.

If you could fix these things, no-one would be happier than I. We could then update the compiler and this would solve the UDT build problem - it's all connected, you see. :)

Anyway, I am sorry for not being able to reply more often here, but time is short and there is much to be done by only a few of us. I guess you know where you can get more detailed/current information... :)

Offline OFF-meister

Re: Experimental UDP code is working!
« Reply #23 on: September 08, 2008, 10:01:38 PM »
Well, we have good news and we have bad news.

The good news is that it seems possible with a bit of work to hack UDT4 into building and include it in OFF.

The bad news is that its performance appears to leave a great deal to be desired. In tests it managed to reach about 15 daemon nodes before having serious problems. At this stage the RAM use was roughly 35M per instance of OFF and page faults were about 300000 per instance, rising at around 50000 per minute. CPU use was at 100%. This was after about 5 minutes of the network *idling*. Compare this with a non-UDT build in a network of 42 daemon nodes, RAM use is 8M per instance, pagefaults at ~2000 and holding indefinitely, CPU use: 0%.

This is in addition to a few random internal segfaults, 4 extra threads and the fact that several of its operations seem to be unconfigurably blocking (notably, the connect).

However, it does seem to function if there are fewer than 5 nodes, apart from the increased resource usage - which is not really acceptable.

We will probably allow the current UDT code to be in CVS for a few versions, just in case all is not lost and to allow you to play with it a bit. I think you mentioned it was quite configurable - maybe these issues can be addressed? However, on the face of it, UDT appears to be highly inappropriate for our needs.

Expect this code to appear in CVS some time over the next couple of days, but the UDT part may disappear whenever it gets in the way of something important.

Offline scripter

Re: Experimental UDP code is working!
« Reply #24 on: September 09, 2008, 12:36:01 AM »
> the download progress bars vanishing when the list items are selected - this also seems to occur when a manifest is used with the current release builds.

It's hard to believe that the compiler has anything to do with this, sounds like a problem in wxwidgets. It's possible that one compiler ignores certain things while the other doesn't, producing different code. Or it may just be some setting for the compiler, like some sort of "strict" thing.

Is it possible that it's a problem with the older version of wxwidgets? If they fixed it, it might be possible to take that code and put it into the code you are using so that it's not a big project.

I'm sorry it's taking more of your time, that wasn't my intention. I didn't realize that different windows compilers had so many differences, especially given that they are both from MS.

Is there any chance of using a borland compiler? At least that is closer to using an open source compiler, in my opinion. I think we both agree that if you could just run a script and have it compile both versions automatically (without windows anything) that would be the best.

> I am sorry for not being able to reply more often here

The forums worked as they should, you come here when you get a break, it's not a problem.

As for the UDT problems at 15 nodes, thanks for testing it so thoroughly, and it shows that leaving it in a test setting is a good idea for now.

35M? So what you are saying is that it takes 27M more memory to support this. All that it should need is a small list of IPs so it can match it to a particular connection and a little more for thread support etc.

It's possible that there's some sort of statistics code enabled, and it's keeping that data in RAM in case it's called for (and that might become useful for testing, but turn it off in release).

Does this mean it compiled in MSVC? And what incarnation, the single file one or the directory one?

I'm going to do some more research on this so watch for more postings.

I started working on the proposed freenet interface, trying to use the same methods that are used for the loopback server but on port 23401. It's mainly going to be a pass through for http connections, only modifying the http stream when it sees an OFF link go back to the user, making it clickable. It would be good to hear what you think about it. I posted info in the freenet section here.

It would also be good to hear your opinions on putting a date stamp into the OFF URL. I could pull that out and send it to the user when I do the freenet conversion thing. I'm planning on pulling the filename and size from the URL so it makes a nice looking html page using only the info from a simple list of links.

Offline scripter

Re: Experimental UDP code is working!
« Reply #25 on: September 09, 2008, 03:26:48 AM »
There are some easily set options in UDT, see http://www.cs.uic.edu/~ygu1/doc/opt.htm

When testing, I found that UDT_SNDBUF was 12058624, UDP_SNDBUF was 12288000 which is a bit much so I set them to 500000 (500KB).

These are *upper limits*, they were set high because UDT is normally used on very high speed networks.

It's really up to you what it's set to, I just wouldn't put it too low.

It didn't make any difference when I tested it because UDT only uses what it needs for the buffer, and we quickly take the data and use it, so it shouldn't reach the limits unless a node is really overworked.

A bigger buffer may be needed for LAN sharing, and this could become a setting that's changed when someone picks LAN mode, at this point I don't know.

The code change in "set_socket_options()" is below. Please include it. (it really doesn't fix the above problems you posted, but it's good to have)

Looking at my system, OFF uses 193m with a 7GB cache loaded and 168m with a 500MB cache. Java, wonderful bloated Java... 501m, firefox with 6 windows open 171m.

So I don't think 27M is much to worry about but I will spend a little more time looking into this.

For the statistics stuff, it's at http://www.cs.uic.edu/~ygu1/doc/structure.htm if you want to see the values, but it's just a struct with a bunch of int's so that's not taking much memory.

Now, the BIG problem... Blocking. It's a problem and it's something I didn't catch at first, sorry.

UDT docs say "UDT does not allow non-blocking operation on connection setup and close", probably not good for P2P situations.
http://www.cs.uic.edu/~ygu1/doc/t-config.htm

It could just be some timeout setting. Is there any chance you know exactly when it blocks? Like at what UDP packet during the first connection?

It could also be the way I tied it into the OFF code, I don't know yet.

I'll just say that worst case it should be possible to call their code in a way to make it non blocking. I'm not sure how TCP does its thing so that it is non blocking at connect, but maybe UDT isn't handling it that way, or UDP connections always have to block since you have to wait for the other node to do something.

Again, I'll have to look into it more, if you have any suggestions let me know.

As for the segfaults and page faults, was that on a windows system? I had to ask :)

Did it only start happening at the 15 node limit problem? I haven't seen any on my testing on Linux with three nodes running and transferring blocks for hours on end.

Code:
  if (G_offcnxn_use_udp) { // UDP: set send and receive to non-blocking
    bool block = false;
    int udtbuf = 500000; // 500KB upper limit for each buffer option below
    // UDT_SNDSYN / UDT_RCVSYN: false selects non-blocking send and receive
    if (UDT::setsockopt(sockfd, 0, UDT_SNDSYN, &block, sizeof(bool)) != 0 ||
        UDT::setsockopt(sockfd, 0, UDT_RCVSYN, &block, sizeof(bool)) != 0) {
      string last_err = UDT::getlasterror().getErrorMessage();
      OFFLog(" OFFCnxnBase::set_socket_options(): ERROR sockfd: %i", sockfd);
      OFFLog(" OFFCnxnBase::set_socket_options(): last error  : %s\n", last_err.c_str());
    }
    // UDT_SNDBUF / UDT_RCVBUF: internal UDT buffer limits
    if (UDT::setsockopt(sockfd, 0, UDT_SNDBUF, &udtbuf, sizeof(int)) != 0 ||
        UDT::setsockopt(sockfd, 0, UDT_RCVBUF, &udtbuf, sizeof(int)) != 0) {
      string last_err = UDT::getlasterror().getErrorMessage();
      OFFLog(" OFFCnxnBase::set_socket_options(): ERROR sockfd: %i", sockfd);
      OFFLog(" OFFCnxnBase::set_socket_options(): last error  : %s\n", last_err.c_str());
    }
    // UDP_SNDBUF / UDP_RCVBUF: buffer limits of the underlying UDP socket
    if (UDT::setsockopt(sockfd, 0, UDP_SNDBUF, &udtbuf, sizeof(int)) != 0 ||
        UDT::setsockopt(sockfd, 0, UDP_RCVBUF, &udtbuf, sizeof(int)) != 0) {
      string last_err = UDT::getlasterror().getErrorMessage();
      OFFLog(" OFFCnxnBase::set_socket_options(): ERROR sockfd: %i", sockfd);
      OFFLog(" OFFCnxnBase::set_socket_options(): last error  : %s\n", last_err.c_str());
    }
    return 1;
  }


Offline scripter

Re: Experimental UDP code is working!
« Reply #26 on: September 10, 2008, 03:06:04 AM »
Wrote a little script so I can easily test more nodes on the local machine.
http://board.planetpeer.de/index.php/topic,5267.0.html

I tested 20 nodes with and without UDP on (via the global config setting), no segfaults, no problems, run was about 30 minutes. On Ubuntu Linux "hardy". CPU usage was just about the same for TCP or UDP but never 100%. Memory usage was higher for UDP.

The setup used 0.19.27 with the single file UDT version (because it was there already), one node with a lot of blocks, the others had none. They all connected together, 20 nodes with each having about 20 nodes on their list, and were pushing blocks around to each other. I could see the printouts for each node in its own terminal. I used no throttling.

I ran 18 nodes in UDP mode for 1 hour with no segfaults.

I ran 18 nodes again under gdb for 30 or so minutes and looked through each terminal looking for any gdb errors, all I got was new thread notices at startup. If there's a special gdb setting I should use let me know.

The memory usage was up about 30MB over TCP on each node but did not increase much as each new node connected. It looks like it happens when the first server port is opened; I'm still trying to figure that out. It's going to take some time.

Blocking wasn't obvious (the GUI didn't stop, for example) and I'm not sure how to detect it (do tell), but we know it's happening. Going through the UDT code, it looks like it asks the other node what its settings are, waits for a time, then returns. It may be possible to always have the same settings for all OFF nodes; then the handshake won't be necessary and it won't block, hopefully.

The blocking on close may be caused by it wanting to send out the rest of the buffer. It may be possible to control that in other code, like the TCP code that checks to see if the buffer is empty before closing. I didn't tie into that, to save some time and get the code out. I'm sure that can be solved.

Each connection should be in its own thread anyway, but from what I saw in gdb it isn't. Could that be the slow (local) speed problem? I didn't realize it's actually polling each connection one by one, correct me if I'm wrong.

So at this point I'm confused about your tests having segfault problems. Any chance it used up all available memory?

EDIT: On the memory issue, if I open the server port and then close it, then open it again the ram usage vs. TCP is about 10M more instead of the 30M. The reason is that when you make the call to change the buffer size it looks like it really doesn't resize the existing buffer and just changes the variables. If you close the socket the buffer is gone, then next time it uses the new values to open a buffer.

So the solution is that we set the default values to what we want in the UDT code itself so it uses those values first thing. I just have to figure out which ones to set.
« Last Edit: September 10, 2008, 06:16:09 AM by scripter »

Offline scripter

Re: Experimental UDP code is working!
« Reply #27 on: September 12, 2008, 12:37:30 AM »
Update on where I'm at.

If you want to throw the baby out with the bath water, go ahead. I think this code is usable with some changes.

Unlike TCP, with UDP we use the same buffer and socket for everything. So the initial setup of that socket is all we get.

If we are expecting to get 200 connections for a normal OFF node, then we need to open a buffer first thing to handle that. Remember there's a UDP socket buffer and also an internal UDT buffer that is used for buffering each "virtual" connection (it's sort of virtual because we are using the same UDP socket and simulating that it's more than one). Once the UDP buffer is set and the socket is open, it's set. But it doesn't have to be that big I think because the other buffer is used.

So some simple math: 200 * 50K = 10M. So we should open a 10M buffer so each connection gets its own 50K buffer.
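
Spelled out as code, taking "K" as 1000 bytes (with 1024 it comes to 10,240,000). The constant and function names are placeholders, not anything in UDT or OFF.

```cpp
#include <cassert>
#include <cstddef>

// Figures from the estimate above: 200 connections, 50K each.
const std::size_t kExpectedConnections = 200;
const std::size_t kPerConnectionBytes  = 50 * 1000;   // "50K", decimal

// Total buffer to reserve up front for all virtual connections.
inline std::size_t total_buffer_bytes(std::size_t connections,
                                      std::size_t per_connection) {
    return connections * per_connection;
}
```

With the two constants above this gives 10,000,000 bytes, which is the 10M figure.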

Some code could be made to stretch the internal UDT buffer as new connections are made, but that code isn't written yet according to one of the UDT developers. It's not a big problem and I may find how to do that later.

Since variables in the code keep track of how much buffer it uses, stretching and reducing it shouldn't be a problem as long as it isn't fragmented.

On top of that, 2GB of ram is now about $20. I don't think people are worried much about an extra 10M being used during P2P operations. Maybe if it was 1985 or something. Lots of people run Java to use a lot of popular P2P programs, that adds 500M on my system.

Bottom line is I'm not going to sweat over an extra 10M, I will make the code set the buffer up at the start at a level we can live with and move on to more important things.

As for blocking. It looks like I can set the variables up so that no "handshake" is needed at connect(). UDT sends a random sequence number to start the connection and waits for the other end to send one back, and that's not really necessary. That can be fixed without too much trouble, but the UDT developers probably won't want it that way due to the trouble it will cause etc.

So the main UDT base probably won't change just for us.

For now it times out in 3 seconds, according to a UDT developer.

What I'm still not sure of is why this is such a big problem that we should drop this code completely and go for a year or two of part time development to re-create it from scratch. If there's a better piece of code for reliable UDP out there, let's jump on it. I'm not attached to UDT in any particular way but it looks like the best we can get for this.

If each connection is in its own thread in the OFF code, then it should be able to block forever and not affect the other code. I thought that was the way things were. I have to spend more time on researching that.

Unfortunately, trying to keep this code as a simple module that we can just plug in and get updates directly from the UDT people and just plug them in is a pipe dream. Sorry.

I think the way to go is to insert this code and call it "ours" from now on. Meaning that we keep this code in OFF and modify it as we need to. When and if security changes are made to the main UDT tree, we copy them line by line and include them in the OFF code. That's just the way it has to go.

I am also of the opinion that the single, large file for the UDT code is the way to go because it will make compiling on other platforms and compilers easier, no special makefile changes needed. If there are really good reasons not to do this, I'm listening.

I need to know what changes were made to UDT if any to make it compile on MSVC6 so I can include that in the single udt.cxx file.

As for page faults, UDT uses "new" and "delete" to open up memory etc.. Is there something being created that isn't being deleted?

I don't mind tracking down the problem, I just need to know exactly, in detail please, how to set things up under Linux so I can see this for myself and hopefully use gdb or something to track this down.


Offline scripter

Re: Experimental UDP code is working!
« Reply #28 on: September 17, 2008, 08:36:08 AM »
After studying the UDT code more I can see that the difference vs. TCP is that you normally call select() to see if the TCP connection is complete, but UDT doesn't do that.

UDT needs to do a little handshaking, like TCP: it gives the other end a random sequence number to start with, which basically helps put out of order packets back together.

That handshaking can be completed with the further calls to select() instead of doing it in connect(). The OFF code at that point can also do a timeout check.
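
Roughly what I have in mind, as a stand-alone sketch and not the actual patch. The class and state names are made up, and the 3000 ms used in the note below just mirrors the 3 second timeout mentioned earlier. The point is that connect() only starts the handshake, and each later pass through the polling loop either finishes it or times it out.

```cpp
#include <cassert>

// Hypothetical state for one pending connection; none of these names
// come from UDT or OFF.
enum ConnectState { CONNECTING, CONNECTED, TIMED_OUT };

class PendingConnect {
public:
    PendingConnect(long start_ms, long timeout_ms)
        : start_ms_(start_ms), timeout_ms_(timeout_ms), state_(CONNECTING) {
        assert(timeout_ms > 0);
    }

    // Called from the normal polling loop instead of blocking in connect().
    // 'reply_seen' is true once the peer's handshake packet has arrived.
    ConnectState poll(long now_ms, bool reply_seen) {
        if (state_ != CONNECTING) return state_;   // already decided
        if (reply_seen)
            state_ = CONNECTED;
        else if (now_ms - start_ms_ >= timeout_ms_)
            state_ = TIMED_OUT;
        return state_;
    }

private:
    long start_ms_;
    long timeout_ms_;
    ConnectState state_;
};
```

A connection created with `PendingConnect(0, 3000)` stays in CONNECTING while no reply has been seen, and moves to TIMED_OUT on the first poll at or after 3000 ms. Nothing ever waits, so one slow peer cannot hold up the others.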

So I'm headed in that direction and created some code that completes the handshake so there's no blocking code. It may not work, but what the heck, I'm headed that way for now.

On the close() call, it shouldn't be hard to check to make sure the UDT (virtual socket) buffer is empty before calling the actual close. With serial communications this is a normal operation, you always check to see if all the data is sent out before closing.
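
A sketch of that drain-before-close check, written so it never waits. `FakeSocket` is a stand-in and not a UDT type, and how the amount of unsent data would actually be read from UDT is left open here.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a UDT virtual socket; hypothetical, for illustration only.
struct FakeSocket {
    std::size_t unsent_bytes;   // data still waiting in the send buffer
    bool closed;
};

// Non-blocking close: call once per pass of the polling loop.
// Closes only when the send buffer is empty or the deadline has passed,
// and returns true once the socket is closed.
inline bool try_close(FakeSocket& s, long now_ms, long deadline_ms) {
    if (s.closed) return true;
    if (s.unsent_bytes == 0 || now_ms >= deadline_ms) {
        s.closed = true;   // the real code would call the actual close here
    }
    return s.closed;
}
```

The deadline is there so a dead peer cannot keep a socket open forever. Once it passes, the socket is closed even if data is still queued.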