PDC: Shared Bytes, Private Bytes and Fixups
I attended both of Rico Mariani's CLR performance talks, and I thought they were great. In fact, my biggest complaint was that each was only 45 minutes long. I left wanting more...
One of the things he talked about a lot was private bytes, shared bytes and fixups. He covered them in his presentation, but I wanted to catch up with him to make sure I was getting the whole picture. Here is what I found:
Shared Bytes vs Private Bytes
A shared byte is one that can be shared across multiple processes; a private byte cannot. So which bytes qualify as shareable? Those on unaltered pages of a dll whose backing file is the dll itself rather than the page file.
Any time a page of code or data from a dll needs to change for a particular process, it is marked dirty and becomes private to that process. This can happen on pages that contain references to objects on the heap, on pages with code offsets that are modified after load time, or on pages in the data segment that are written to.
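As a rough illustration of the rule above, here is a toy copy-on-write model of one dll mapped into two processes. It is a sketch, not how Windows implements it: the class and method names are hypothetical, and the 4096-byte page size is an assumption (it is the usual x86 page size). A page stays shared until a process writes to it, at which point that process owns a private copy.

```python
PAGE_SIZE = 4096  # assumed page size (typical for x86)


class MappedImage:
    """Toy copy-on-write view of one dll image mapped into several processes."""

    def __init__(self, num_pages, process_ids):
        self.num_pages = num_pages
        # Per process: the pages it has written to and therefore owns privately.
        self.dirty = {pid: set() for pid in process_ids}

    def write(self, pid, offset):
        """A write by `pid` at `offset` gives that process a private copy of the page."""
        page = offset // PAGE_SIZE
        if not 0 <= page < self.num_pages:
            raise ValueError("offset is outside the image")
        self.dirty[pid].add(page)

    def private_bytes(self, pid):
        return len(self.dirty[pid]) * PAGE_SIZE

    def shared_bytes(self, pid):
        # Untouched pages are still backed by the dll file and can be shared.
        return (self.num_pages - len(self.dirty[pid])) * PAGE_SIZE


image = MappedImage(num_pages=10, process_ids=["A", "B"])
image.write("A", 100)   # page 0 becomes private to A
image.write("A", 5000)  # page 1 becomes private to A
image.write("B", 5000)  # B gets its own private copy of page 1
print(image.private_bytes("A"), image.shared_bytes("A"))  # 8192 32768
print(image.private_bytes("B"), image.shared_bytes("B"))  # 4096 36864
```

Process B only pays for the one page it wrote to; its other nine pages are still shared.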
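A quick way to have almost all of your dll marked private is to have it rebased at load time: if the dll cannot load at its preferred base address, the loader has to adjust every absolute address in the image, dirtying every page that contains one.
This concept applies mainly to NGEN'd images, because non-NGEN'd images consist mostly of IL that is JIT compiled into the loader heap, so the resulting code is all private. The IL itself is shared, but it is discarded after JIT compilation anyhow.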
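The sketch below shows why rebasing is so costly, using a simplified model in which the image is a `bytearray` holding 32-bit absolute addresses at known offsets (the relocation list). The function name, offsets and base addresses are all hypothetical, and the model assumes no address straddles a page boundary. If the image loads at its preferred base there is nothing to patch; otherwise every page holding a relocation is written and becomes private.

```python
import struct

PAGE_SIZE = 4096  # assumed page size


def rebase(image, relocation_offsets, preferred_base, actual_base):
    """Patch each 32-bit absolute address in `image`; return the set of dirtied pages.

    Simplified model: assumes no patched address straddles a page boundary.
    """
    delta = actual_base - preferred_base
    dirtied = set()
    if delta == 0:
        return dirtied  # loaded at the preferred base: nothing to fix up
    for offset in relocation_offsets:
        (addr,) = struct.unpack_from("<I", image, offset)
        struct.pack_into("<I", image, offset, (addr + delta) & 0xFFFFFFFF)
        dirtied.add(offset // PAGE_SIZE)  # writing the page makes it private
    return dirtied


# Hypothetical 8-page image with absolute addresses scattered over 7 pages.
image = bytearray(8 * PAGE_SIZE)
relocs = [0x10, 0x1010, 0x2020, 0x3000, 0x4FF0, 0x5008, 0x6100]
for off in relocs:
    struct.pack_into("<I", image, off, 0x10001000)  # an address inside the image at its preferred base

print(len(rebase(image, relocs, 0x10000000, 0x10000000)))  # 0: no rebase, no dirty pages
print(len(rebase(image, relocs, 0x10000000, 0x20000000)))  # 7: every page with a relocation
print(hex(struct.unpack_from("<I", image, 0x10)[0]))        # 0x20001000
```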
It's important to realize that the benefit of shared bytes is just that: they can be shared across multiple processes, so their cost is amortized across the number of processes using them. For assemblies like the .Net Framework assemblies, which many processes load, having shared bytes is a clear benefit. That said, if an assembly is not actually shared, optimizing for shared bytes may make your code less efficient, so caveat emptor.
The term "fixups" refers to "fixing up" an address in the code to point to a new address. Since this modifies a page for a given process, any page that contains a fixup also becomes private.
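The amortization argument is simple arithmetic. Assuming the cost attributed to each process is its private bytes plus its share of the shared bytes, the hypothetical helper below compares two made-up layouts of the same dll: one with more private bytes, and one optimized for sharing that is slightly larger overall. The numbers are invented purely to illustrate the caveat.

```python
def per_process_cost(shared_mb, private_mb, num_processes):
    """Memory attributed to one process: its private bytes plus its share of the shared bytes."""
    return private_mb + shared_mb / num_processes


# Two made-up layouts of the same dll, in MB.
plain = {"shared_mb": 2.0, "private_mb": 3.0}            # 5.0 MB total
share_optimized = {"shared_mb": 4.5, "private_mb": 1.0}  # 5.5 MB total

for n in (1, 4):
    print(n, per_process_cost(num_processes=n, **plain),
          per_process_cost(num_processes=n, **share_optimized))
# 1 5.0 5.5
# 4 3.5 2.125
```

With a single process the sharing-optimized layout loses (5.5 MB vs 5.0 MB); with four processes it wins comfortably (2.125 MB vs 3.5 MB).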
Reducing Private Bytes
The following are some ways you can reduce the number of private bytes:
- put code that will become private together so that fewer pages need to be marked private
- prevent rebasing
- fix addresses ahead of time so that they don't need a fixup at load time
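To see why the first bullet helps, here is a hypothetical layout model: functions are placed back to back, and any page touched by a function that carries a fixup becomes private. The function sizes are made up, and treating the whole function as dirty is a conservative simplification (in reality only the pages containing the fixup sites are written). The other two bullets work by reducing the number of fixups in the first place.

```python
PAGE_SIZE = 4096  # assumed page size


def private_pages(layout):
    """Count pages that become private for functions laid out back to back.

    `layout` is a list of (size_in_bytes, has_fixup) tuples. Conservatively,
    every page a fixup-bearing function spans is counted as private.
    """
    dirty = set()
    offset = 0
    for size, has_fixup in layout:
        if has_fixup:
            first = offset // PAGE_SIZE
            last = (offset + size - 1) // PAGE_SIZE
            dirty.update(range(first, last + 1))
        offset += size
    return len(dirty)


# Eight hypothetical 2 KB functions; every other one carries a fixup.
scattered = [(2048, i % 2 == 0) for i in range(8)]
# Stable sort: fixup-bearing functions first, otherwise original order kept.
grouped = sorted(scattered, key=lambda func: not func[1])

print(private_pages(scattered))  # 4
print(private_pages(grouped))    # 2
```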
String Freezing and Hard Binding
String Freezing and Hard Binding are both great examples of how the concepts above are being applied.
In .Net 2.0, you can mark an assembly so that its string literals are frozen. This means that all of the strings in that assembly are placed in a separate segment during NGEN, so none of the references to string literals require a fixup.
Strings normally need fixups because each literal has to be wrapped in a string instance, and the code then points to that instance.
String freezing has two benefits: fewer private pages, and no duplication of the string in both the literal and the string instance. Note as well that those string instances are interned to avoid duplication (in Whidbey you can opt in or out of this; see the article here).
The downside of string freezing is that such an assembly cannot be unloaded, because the frozen string instances now reside in a segment of that dll instead of on the heap, and code in other assemblies may be depending on references to them.
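Here is a toy model of the fixup accounting described above for string literals. It is not the CLR's implementation: the function, the offsets and the `object()` stand-ins for heap strings are all hypothetical. Without freezing, each referencing slot is patched at load time (dirtying its page) to point at an interned heap instance; with freezing, NGEN has already laid the instances out in the image, so nothing is patched.

```python
PAGE_SIZE = 4096  # assumed page size


def load_time_string_fixups(literal_refs, frozen):
    """Return (pages dirtied, heap string instances created) at load time.

    `literal_refs` maps a code offset to the string literal referenced there.
    """
    if frozen:
        # NGEN already placed the string instances in a separate segment of
        # the image, so the references are correct as stored: nothing to patch.
        return set(), 0
    interned = {}  # one instance per distinct literal
    dirtied = set()
    for offset, literal in literal_refs.items():
        interned.setdefault(literal, object())  # stand-in for a heap string instance
        dirtied.add(offset // PAGE_SIZE)        # patching the slot makes the page private
    return dirtied, len(interned)


refs = {0x100: "Hello", 0x1200: "World", 0x2300: "Hello", 0x3400: "Bye"}
pages, instances = load_time_string_fixups(refs, frozen=False)
print(len(pages), instances)                       # 4 3
print(load_time_string_fixups(refs, frozen=True))  # (set(), 0)
```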
Hard binding refers to another new NGEN feature in which NGEN assumes that a referenced assembly will always be there. That allows NGEN to hard code offsets from one assembly into another, reducing the need for load-time fixups.
The downside is that any hard-bound assemblies are loaded at the same time as the referencing assembly. This guarantees that they get their desired load addresses.
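The eager-loading downside of hard binding can be modeled as a reachability walk over the hard-binding graph. The assembly names are hypothetical, and the assumption that hard-bound dependencies are loaded transitively (a hard-bound assembly's own hard bindings are loaded too) is my own, not something from the talk.

```python
def assemblies_loaded_at_startup(root, hard_bindings):
    """Return every assembly loaded together with `root`.

    `hard_bindings` maps an assembly name to the assemblies it is hard bound to.
    Assumption: hard-bound dependencies are loaded eagerly and transitively.
    """
    loaded = []
    seen = {root}
    stack = [root]
    while stack:
        name = stack.pop()
        loaded.append(name)
        for dep in hard_bindings.get(name, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return loaded


hard = {"App": ["Lib", "Util"], "Lib": ["Util"]}
print(assemblies_loaded_at_startup("App", {}))    # ['App']
print(assemblies_loaded_at_startup("App", hard))  # ['App', 'Util', 'Lib']
```

Without hard binding, only `App` is loaded up front in this model; with it, `Lib` and `Util` come along immediately whether or not they are used right away.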
One of the things Rico mentioned in his talk at the PDC was that Whidbey brings a lot of improvements in the area of shared bytes and working set. That should really help the performance of your .Net applications.
Looking forward to Whidbey. Thanks, Rico, for taking the time to explain this to me.