TOPIC: String memory management
#573
String memory management 3 Years, 7 Months ago
I have a question on Suneido memory management, related to strings.
Let's suppose we read the contents of a file with:

stringBuffer = GetFile("aFile")

Then if after we do:

stringBuffer = GetFile("anotherFile")

Will the previous stringBuffer be deallocated, or will it remain allocated? If it remains allocated, how can I call GetFile in a for loop without soon running out of memory? For example:

Code:

for f in Dir()
  {
  stringBuffer = GetFile(f)
  // do some computations with the stringBuffer data:
  ...
  ...
  }



If I have 1000 files in my dir (each one about 1MB in size) what would happen with the above code?
 
 
Mauro
 
#574
Re:String memory management 3 Years, 7 Months ago
Suneido uses what is called "conservative" garbage collection. Because C++ has no built-in support for garbage collection, it is difficult to know where every pointer is. Instead, a conservative collector simply scans all of the memory in use, and if it finds something that could be a pointer into heap memory, it assumes the memory it points to is still in use.

The advantage is that no special support from the language (C++) is needed. The drawback is that some heap memory may be kept alive by "false" pointers - either random data that looks like a pointer, or an old pointer that is left behind in some memory.

Memory is not freed until the garbage collector runs, which happens periodically, so it will not necessarily run on every iteration of your loop. (Reference counting, another alternative, frees memory immediately unless deferred reference counting is used, but it does not handle "cycles" of objects and it has more overhead.)

Your loop should run fine. However, it is possible in some rare cases that memory will not be freed properly and your memory usage will be higher than it should be.

In the majority of cases you can simply assume that memory will be freed when it is no longer referenced anywhere.

Originally I wrote my own conservative garbage collector for Suneido but later switched to the Boehm-Demers-Weiser collector.
 
 
andrew
 
#578
Re:String memory management 3 Years, 7 Months ago
Thank you for the answer. Now another question, again related to strings: if I read a 1MB file into a variable, with something like:

stringBuffer = GetFile("myFile")

then, if I wanted to change some bytes in this file and write it back to disk, would I have to create another copy of stringBuffer just to change one or two bytes? (Let's suppose that we don't know the positions of the bytes to change until we have read the entire file.)

Is there a way to directly change the contents of a string? I would like to do something like:

Code:


foo = String("test")
foo[0] = 'b'

foo
  => "best"



Post edited by: Mauro, at: 2007/01/23 00:44
 
 
Mauro
 
#579
Re:String memory management 3 Years, 7 Months ago
Immutable strings are better in many ways but they are not very efficient when you want to modify a large string.

But Suneido is quite smart in how it handles this problem. For example:

Code:

s = "x".Repeat(1000000)
t = s.Substr(0, 500000) $ 'v' $ s.Substr(500000)



In Suneido this uses only about 1,000,000 bytes of memory (not 2,000,000), because t actually references the same memory as s. This "sharing" is possible because strings are immutable. If you could actually modify s, you would not be able to share (at least not without undesired results).

In most languages, building a large string by concatenating many small ones is very inefficient, because each concatenation allocates a new copy of the whole string, which gets more costly as the string grows. Suneido avoids this by building the string up as a linked list. Other languages get around this by providing additional data types like Java's StringBuffer, but that means the programmer has to learn when to use them.

One drawback of immutable strings is that you can't say something like s[5] = 'w' and you have to write s = s.Replace(...) instead of just s.Replace(...) because operations like Replace do not modify the original string, they make a copy.

One "weakness" in Suneido's string implementation is that the only operations that take advantage of sharing are Substr and concatenation ($). Almost all other operations will "flatten" strings that are made up of pieces of other strings into a normal contiguous unshared string. So if, for example, you wrote t out to a file this would cause it to take another 1,000,000 bytes of memory.

A lot of this "flattening" could be avoided by making the implementation smarter. However, there are trade-offs between space and speed. A flattened, contiguous string is much faster to process for some operations than a linked list of pieces.

Another potential drawback of sharing is the following:

Code:

s = 'x'.Repeat(1000000)
s = s.Substr(1000, 1000)



At this point only 1,000 bytes of memory are needed, but because of sharing, s will actually be referencing a piece of the original 1,000,000-byte string, keeping it from being garbage collected. In practice this is usually not a big problem. Suneido avoids it to some extent by not sharing if the piece is smaller than a certain size.

Again, most of the time you can forget about all these internal details and most things will just work.
 
 
andrew
 
#581
Re:String memory management 3 Years, 7 Months ago
Hi, I think there could be something wrong... When I run this in the WorkSpace:

s = 'x'.Repeat(1000000)

the memory used by suneido.exe in task manager goes from 4MB up to 20MB... Why?
 
 
Mauro
 
#582
Re:String memory management 3 Years, 7 Months ago
It is very hard to relate the memory reported by the Task Manager to what you are doing in Suneido. For example, Suneido uses virtual memory space for various things. Often, much of these areas are never used and so do not actually take up any physical memory, but they can still be reported as part of the memory usage. Also, the memory management will expand the heap in large jumps, but again not all of this space is physically used.

You can get a better idea of the heap size within Suneido using MemoryArena(), but even this will not be very "predictable" depending on what is happening with garbage collection.

This probably is not very helpful to you, but hopefully it might explain the results you are getting.

Another test is to run your code multiple times. At first the memory will increase, but it should stop growing after a few times. Hopefully :-)
 
 
andrew
 
#583
Re:String memory management 3 Years, 7 Months ago
I think there could be some issue with the WorkSpace, because if I run the following statement multiple times:

s = 'x'.Repeat(1000000)

then memory grows quickly and the WorkSpace window becomes a little slow to update its contents. If you clear the 'output' subwindow, the WorkSpace is no longer slow.

Instead, if you run the following multiple times:

s = 'x'.Repeat(1000000)
Print("")

then memory grows to a certain point (about 30MB on my system) and then stops growing (or grows very little). I think this happens because, in this case, the output subwindow does not have to display the result of the 's = ...' statement on each run, thanks to the Print("") statement.

Anyway, I have noticed that if you minimize the WorkSpace window, memory usage drops considerably, as if the garbage collector only worked at full capacity while the window is minimized.
 
 
Mauro
 
#589
Re:String memory management 3 Years, 7 Months ago
Yes, displaying huge results in the WorkSpace output pane will use a lot of memory. You can avoid displaying results by adding an empty statement:

s = 'x'.Repeat(1000000);;

Note: The WorkSpace variables pane will also keep a reference to the result, which will prevent it from being freed.

I am not sure what minimizing the WorkSpace would do. I cannot think of why that would affect memory or garbage collection. The only thing I can think of is that fewer Windows messages will be sent to the window while it is minimized.
 
 
andrew
 
#591
Re:String memory management 3 Years, 7 Months ago
Nice trick, that ';;' in the WorkSpace... :)
 
 
Mauro