
A Brief Analysis of the HotSpotVM String Implementation #1

Posted: 2015-04-19 14:42:14

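Today let's look at a few aspects of how strings are implemented in the HotSpot VM. Consider this question: in the example below, how many copies of the string Hello,World. does HotSpot end up holding?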

    public static void main(String[] args) throws Throwable {
        String s1 = "Hello,World.";          // string literal, loaded with ldc
        String s2 = "Hello,"+"World.";       // constant expression, folded by javac
        StringBuilder sb = new StringBuilder();
        sb.append("Hello,").append("World.");
        String s3 = sb.toString();           // new String built at run time
        String s4 = sb.toString().intern();  // canonical instance from the string pool
        System.out.println("s1 == s2 #" + (s1==s2));
        System.out.println("s1 == s3 #" + (s1==s3));
        System.out.println("s1 == s4 #" + (s1==s4));
        System.in.read();                    // keep the process alive so HSDB/SA can attach
    }

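Conclusion first: 4 copies (which seems a bit wasteful :). We'll track down where each of the 4 lives below. Before that, here is the output of the example: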

s1 == s2 #true
s1 == s3 #false
s1 == s4 #true

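== actually compares the addresses the oops point to: s1, s2 and s4 all point to the same address. s2 has been optimized by the compiler; here is the compiled bytecode: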

    Code:
      stack=4, locals=6, args_size=1
         0: ldc           #2                  // String Hello,World.
         2: astore_1      
         3: ldc           #2                  // String Hello,World.
         5: astore_2      
    ...
Constant pool:
   #1 = Methodref          #19.#47        //  java/lang/Object."<init>":()V
   #2 = String             #48            //  Hello,World.
   #3 = Class              #49            //  java/lang/StringBuilder
   #4 = Methodref          #3.#47         //  java/lang/StringBuilder."<init>":()V
   #5 = String             #50            //  Hello,
   #6 = Methodref          #3.#51         //  java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   #7 = String             #52            //  World.
   #8 = Methodref          #3.#53         //  java/lang/StringBuilder.toString:()Ljava/lang/String;
   #9 = Methodref          #54.#55        //  java/lang/String.intern:()Ljava/lang/String;
...

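The compiler has already performed the + for us: "Hello,"+"World." is a compile-time constant expression, so javac folds it into the single constant #2 (Hello,World.), and both s1 and s2 are loaded with the same ldc #2.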
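The folding rule can be checked from plain Java. The sketch below (class and method names are made up for illustration) relies only on the language rule that a constant expression, including one built from a final local initialized with a constant, is interned like a literal, while a concatenation with a non-constant operand creates a new String at run time.

```java
// Sketch: which concatenations javac folds into a single interned constant.
// Class and method names are illustrative only.
public class ConstantFoldingDemo {

    static final String LITERAL = "Hello,World.";

    // "Hello," + "World." is a constant expression: javac emits one ldc of "Hello,World."
    static boolean foldedIsLiteral() {
        String folded = "Hello," + "World.";
        return folded == LITERAL;
    }

    // A final local initialized with a constant is a constant variable,
    // so hello + "World." is still a constant expression.
    static boolean finalLocalIsLiteral() {
        final String hello = "Hello,";
        String folded = hello + "World.";
        return folded == LITERAL;
    }

    // Without final, world is not a constant variable: the concatenation
    // happens at run time and yields a new String with equal contents.
    static boolean runtimeIsLiteral() {
        String world = "World.";
        String runtime = "Hello," + world;
        return runtime == LITERAL;
    }

    static boolean runtimeEqualsLiteral() {
        String world = "World.";
        return ("Hello," + world).equals(LITERAL);
    }

    public static void main(String[] args) {
        System.out.println("folded      == literal # " + foldedIsLiteral());     // true
        System.out.println("final local == literal # " + finalLocalIsLiteral()); // true
        System.out.println("run-time    == literal # " + runtimeIsLiteral());    // false
    }
}
```

The run-time case is exactly the situation of s3 in the example: equal contents, different oop.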

Stack Memory

We can use HSDB to look at the addresses these variables point to:

[Figures: two HSDB screenshots of the main thread's stack memory]

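The slots from 0x0000000002aced58 to 0x0000000002aced78 (byte-addressed, 8 bytes per slot) hold s4, s3, sb, s2 and s1, in that order. s1, s2 and s4 all point to the address 0x00000007d619bc90, which is why comparing them with == returns true. Let's see what is stored at that address: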

hsdb> inspect 0x00000007d619bc90
instance of "Hello,World." @ 0x00000007d619bc90 @ 0x00000007d619bc90 (size = 24)
_mark: 1
value: [C @ 0x00000007d619bca8 Oop for [C @0x00000007d619bca8
hash: 0
hash32: 0

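If this output looks unfamiliar, read my earlier article analyzing HotSpot's object model first. This version of HSDB seems to have a bug: it doesn't print the _metadata field, so let's ignore that for now.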

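The object at this address is a String instance. value, hash and hash32 are the instance fields of String, so what is printed above is simply this instance's field data. Now let's look at the value field: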

hsdb> inspect 0x00000007d619bca8
instance of [C @ 0x00000007d619bca8 @ 0x00000007d619bca8 (size = 40)
_mark: 1
0: 'H'
1: 'e'
2: 'l'
3: 'l'
4: 'o'
5: ','
6: 'W'
7: 'o'
8: 'r'
9: 'l'
10: 'd'
11: '.'
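The size = 24 and size = 40 reported by HSDB can be reproduced with a little arithmetic. This sketch assumes a 64-bit HotSpot with compressed oops, which in this JDK 7 build also compresses the klass pointer (consistent with the _compressed_klass field and the field offsets 12/16/20 that the SA tool prints later): an 8-byte mark word, a 4-byte klass pointer, a 4-byte length field for arrays, and 8-byte object alignment. These constants are assumptions about this particular VM configuration, not values read from HotSpot.

```java
// Reproduces HSDB's object sizes under ASSUMED layout constants
// (64-bit VM, compressed oops and klass pointer, 8-byte alignment).
public class ObjectSizeSketch {

    static final int MARK = 8;          // mark word
    static final int KLASS = 4;         // compressed klass pointer
    static final int ARRAY_LENGTH = 4;  // length field of an array
    static final int REF = 4;           // compressed reference field
    static final int INT = 4;
    static final int CHAR = 2;
    static final int ALIGN = 8;

    static int align(int size) {
        return (size + ALIGN - 1) / ALIGN * ALIGN;
    }

    // JDK 7 String: header + value (reference) + hash (int) + hash32 (int)
    static int stringSize() {
        return align(MARK + KLASS + REF + INT + INT);
    }

    // char[]: header + length field + 2 bytes per element
    static int charArraySize(int length) {
        return align(MARK + KLASS + ARRAY_LENGTH + CHAR * length);
    }

    public static void main(String[] args) {
        System.out.println(stringSize());                             // 24
        System.out.println(charArraySize("Hello,World.".length()));  // 40
    }
}
```

Under these assumptions the 12 chars of Hello,World. account for 24 of the 40 bytes of the char[]; the other 16 are the array header.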

OK, the first copy of Hello,World. has turned up. Its address, 0x00000007d619bca8, lies in the eden space:

hsdb> universe
Heap Parameters:
ParallelScavengeHeap [ 
    PSYoungGen [ 
            eden =  [0x00000007d6000000,0x00000007d628f8f8,0x00000007d8000000] , 
            from =  [0x00000007d8500000,0x00000007d8500000,0x00000007d8a00000] , 
            to =  [0x00000007d8000000,0x00000007d8000000,0x00000007d8500000]  
    ] 
    PSOldGen [  [0x0000000782000000,0x0000000782000000,0x0000000787400000]  ] 
    PSPermGen [  [0x000000077ce00000,0x000000077d103078,0x000000077e300000]  ]  
]
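Checking which space an address belongs to is just a range comparison against the universe output. The helper below (a hypothetical name) hard-codes the start and end of each space, copied by hand from the output above, so it only describes this particular run:

```java
// Classifies an address by the heap spaces printed by "universe" in this run.
// Ranges copied by hand from the output above: start inclusive, end exclusive.
public class HeapRangeCheck {

    static final class Space {
        final String name;
        final long start;
        final long end;

        Space(String name, long start, long end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    static final Space[] SPACES = {
        new Space("eden", 0x00000007d6000000L, 0x00000007d8000000L),
        new Space("from", 0x00000007d8500000L, 0x00000007d8a00000L),
        new Space("to",   0x00000007d8000000L, 0x00000007d8500000L),
        new Space("old",  0x0000000782000000L, 0x0000000787400000L),
        new Space("perm", 0x000000077ce00000L, 0x000000077e300000L),
    };

    static String spaceOf(long address) {
        for (Space s : SPACES) {
            if (address >= s.start && address < s.end) {
                return s.name;
            }
        }
        return "outside GC heap";
    }

    public static void main(String[] args) {
        System.out.println(spaceOf(0x00000007d619bca8L)); // eden: the char[] of s1
        System.out.println(spaceOf(0x000000000ca89740L)); // outside GC heap
    }
}
```

The second address is the SymbolTable entry that appears near the end of this article; it falls outside every space listed by universe.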

Next, let's inspect the addresses that the other two variables, sb and s3, point to. sb points to 0x00000007d619bf60:

[Figure: HSDB inspect output for sb at 0x00000007d619bf60]

and s3 points to 0x00000007d619c018:

[Figure: HSDB inspect output for s3 at 0x00000007d619c018]

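Sure enough, two more copies of Hello,World. show up above: one in the internal char[] buffer of sb, and one in the char[] owned by s3. Before hunting down the 4th copy, let's look at how s2, s3 and s4 differ. s2 was covered earlier: the compiler optimized it. How HotSpot hands out the same oop for it at run time we'll set aside for now and study later. As for s3, look at StringBuilder#toString: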

    public String toString() {
        // Create a copy, don't share the array
        return new String(value, 0, count);
    }
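The copy promised by that comment can be observed from ordinary Java code. This sketch relies only on documented behavior: StringBuilder.toString() allocates a new String on every call, and the String(char[], int, int) constructor copies the characters it is given. The class and method names are illustrative.

```java
// Observes the copying behind StringBuilder#toString from plain Java.
public class ToStringCopyDemo {

    // Every toString() call allocates a new String with equal contents.
    static boolean toStringReturnsDistinctObjects() {
        StringBuilder sb = new StringBuilder();
        sb.append("Hello,").append("World.");
        String a = sb.toString();
        String b = sb.toString();
        return a != b && a.equals(b);
    }

    // String(char[], int, int) copies the array: later writes to the
    // source array do not show up in the String.
    static boolean constructorCopiesArray() {
        char[] buf = "Hello,World.".toCharArray();
        String s = new String(buf, 0, buf.length);
        buf[0] = 'J';
        return s.equals("Hello,World.");
    }

    // Appending after toString() does not affect the String already returned.
    static boolean laterAppendIsInvisible() {
        StringBuilder sb = new StringBuilder("Hello,");
        String before = sb.toString();
        sb.append("World.");
        return before.equals("Hello,");
    }

    public static void main(String[] args) {
        System.out.println(toStringReturnsDistinctObjects()); // true
        System.out.println(constructorCopiesArray());         // true
        System.out.println(laterAppendIsInvisible());         // true
    }
}
```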

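It simply creates a new String, so it cannot be the same oop, and Hello,World. has to be copied once more. The s4 obtained through String#intern, on the other hand, is the same oop as s1, so next let's look at how that method is implemented.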

String Table

String#intern is a native method:

    /**
     * Returns a canonical representation for the string object.
     * <p>
     * A pool of strings, initially empty, is maintained privately by the
     * class <code>String</code>.
     * <p>
     * When the intern method is invoked, if the pool already contains a
     * string equal to this <code>String</code> object as determined by
     * the {@link #equals(Object)} method, then the string from the pool is
     * returned. Otherwise, this <code>String</code> object is added to the
     * pool and a reference to this <code>String</code> object is returned.
     * <p>
     * It follows that for any two strings <code>s</code> and <code>t</code>,
     * <code>s.intern()&nbsp;==&nbsp;t.intern()</code> is <code>true</code>
     * if and only if <code>s.equals(t)</code> is <code>true</code>.
     * <p>
     * All literal strings and string-valued constant expressions are
     * interned. String literals are defined in section 3.10.5 of the
     * <cite>The Java&trade; Language Specification</cite>.
     *
     * @return  a string that has the same contents as this string, but is
     *          guaranteed to be from a pool of unique strings.
     */
    public native String intern();
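The contract stated in this Javadoc can be exercised directly. The sketch below (illustrative names) checks the three consequences that matter for the example: equal strings intern to the same object, a string built at run time interns to the object the literal already uses, and unequal strings stay distinct.

```java
// Exercises the String#intern contract from its Javadoc.
public class InternContractDemo {

    // Two distinct String objects with equal contents intern to one object.
    static boolean equalStringsInternToSameObject() {
        String s = new String("Hello,World.");
        String t = new StringBuilder("Hello,").append("World.").toString();
        return s != t && s.intern() == t.intern();
    }

    // A string built at run time (like s3) is not the literal (s1),
    // but interning it returns the literal's object (like s4).
    static boolean internFindsLiteral() {
        String built = new StringBuilder("Hello,").append("World.").toString();
        return built != "Hello,World." && built.intern() == "Hello,World.";
    }

    // Strings that are not equal never intern to the same object.
    static boolean unequalStringsStayDistinct() {
        return "Hello,".intern() != "World.".intern();
    }

    public static void main(String[] args) {
        System.out.println(equalStringsInternToSameObject()); // true
        System.out.println(internFindsLiteral());             // true
        System.out.println(unequalStringsStayDistinct());     // true
    }
}
```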

The implementation of String#intern is in String.c:

JNIEXPORT jobject JNICALL
Java_java_lang_String_intern(JNIEnv *env, jobject this)
{
    return JVM_InternString(env, this);
}

Here is the implementation of JVM_InternString:

JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  JVMWrapper("JVM_InternString");
  JvmtiVMObjectAllocEventCollector oam;
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);
JVM_END

It uses a StringTable. The StringTable code was split out of SymbolTable; here is the description in symbolTable.hpp:

// The symbol table holds all Symbol*s and corresponding interned strings.
// Symbol*s and literal strings should be canonicalized.
//
// The interned strings are created lazily.
//
// It is implemented as an open hash table with a fixed number of buckets.
//
// %note:
//  - symbolTableEntrys are allocated in blocks to reduce the space overhead.

StringTable is a Hashtable:

class StringTable : public Hashtable<oop, mtSymbol>

Here is StringTable::intern:

oop StringTable::intern(oop string, TRAPS)
{
  if (string == NULL) return NULL;
  ResourceMark rm(THREAD);
  int length;
  Handle h_string (THREAD, string);
  jchar* chars = java_lang_String::as_unicode_string(string, length);
  oop result = intern(h_string, chars, length, CHECK_NULL);
  return result;
}
oop StringTable::intern(Handle string_or_null, jchar* name,
                        int len, TRAPS) {
  unsigned int hashValue = hash_string(name, len);
  int index = the_table()->hash_to_index(hashValue);
  oop found_string = the_table()->lookup(index, name, len, hashValue);

  // Found
  if (found_string != NULL) return found_string;

  debug_only(StableMemoryChecker smc(name, len * sizeof(name[0])));
  assert(!Universe::heap()->is_in_reserved(name) || GC_locker::is_active(),
         "proposed name of symbol must be stable");

  Handle string;
  // try to reuse the string if possible
  if (!string_or_null.is_null() && (!JavaObjectsInPerm || string_or_null()->is_perm())) {
    string = string_or_null;
  } else {
    string = java_lang_String::create_tenured_from_unicode(name, len, CHECK_NULL);
  }

  // Grab the StringTable_lock before getting the_table() because it could
  // change at safepoint.
  MutexLocker ml(StringTable_lock, THREAD);

  // Otherwise, add to symbol to table
  return the_table()->basic_add(index, string, name, len,
                                hashValue, CHECK_NULL);
}

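So when we call String#intern, the StringTable is looked up first. From the example we can guess that when a string constant is loaded with ldc, that is, when s1 is assigned, HotSpot interns it automatically (consistent with the Javadoc above: all literal strings are interned). Then, when s4 is assigned, the StringTable lookup finds the existing oop for that string and returns it directly, which is why s4 and s1 are the same oop.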
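To make the lookup-then-add flow of StringTable::intern concrete, here is a toy Java model of an open hash table with a fixed number of buckets. It is not HotSpot code: the class name and bucket count are arbitrary, the hash is assumed to be the same 31 * h + c polynomial as String.hashCode(), and the whole method is synchronized for simplicity, whereas the C++ code above takes StringTable_lock only around basic_add.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of StringTable::intern (hypothetical, not HotSpot code):
// hash -> bucket index -> lookup in that bucket -> add if missing.
public class ToyStringTable {

    private final List<List<String>> buckets = new ArrayList<List<String>>();

    public ToyStringTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<String>());
        }
    }

    // ASSUMED hash function: the 31 * h + c polynomial of String.hashCode().
    static int hashString(char[] chars) {
        int h = 0;
        for (char c : chars) {
            h = 31 * h + c;
        }
        return h;
    }

    // Treat the hash as unsigned before taking the modulus, like an unsigned int in C++.
    int hashToIndex(int hash) {
        return (int) ((hash & 0xffffffffL) % buckets.size());
    }

    public synchronized String intern(String candidate) {
        char[] chars = candidate.toCharArray();
        int index = hashToIndex(hashString(chars));
        List<String> bucket = buckets.get(index);
        // lookup(): scan only the bucket the hash maps to
        for (String s : bucket) {
            if (s.equals(candidate)) {
                return s; // found: return the canonical instance
            }
        }
        // basic_add(): not found, so the caller's object becomes the canonical one
        bucket.add(candidate);
        return candidate;
    }

    public static void main(String[] args) {
        ToyStringTable table = new ToyStringTable(1009); // arbitrary bucket count
        String first = table.intern("Hello,World.");      // plays the role of s1
        String copy = new String("Hello,World.");         // plays the role of s3
        String second = table.intern(copy);               // plays the role of s4
        System.out.println(first == second); // true
        System.out.println(copy == second);  // false
    }
}
```

Reusing the caller's object on a miss mirrors the branch in the C++ code that keeps string_or_null instead of creating a new tenured String.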

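With the help of SA (the Serviceability Agent) we can write a small tool that dumps every oop in the StringTable (SA really is a handy thing :). The SA classes ship in the JDK's lib/sa-jdi.jar, which needs to be on the classpath: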

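package me.kisimple.just4fun;

import sun.jvm.hotspot.memory.StringTable;
import sun.jvm.hotspot.oops.Instance;
import sun.jvm.hotspot.tools.Tool;

// SA tool: attaches to a running HotSpot process and prints every String in its StringTable.
public class StringTableDumper extends Tool {

    public static void main(String[] args) {
        StringTableDumper printer = new StringTableDumper();
        printer.start(args);   // attach to the process given in args (its PID), then run()
        printer.stop();        // detach from the target process
    }

    @Override
    public void run() {
        StringTable stringTable = StringTable.getTheTable();
        // visit every interned String oop and print its fields
        stringTable.stringsDo(new StringTable.StringVisitor() {
            @Override
            public void visit(Instance instance) {
                instance.print();
            }
        });
    }

}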

Run it, passing the PID of the target process (6092 here):

> java me.kisimple.just4fun.StringTableDumper 6092 > stringTable.txt
Attaching to process ID 6092, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.51-b03

The output is fairly long, but sure enough we can find the address that s1, s2 and s4 all point to, 0x00000007d619bc90:

"Hello,World." @ 0x00000007d619bc90 (object size = 24)
 - _mark:    {0} :1
 - _metadata._compressed_klass:  {8} :InstanceKlass for java/lang/String @ 0x000000077ce0afe8
 - value:    {12} :[C @ 0x00000007d619bca8
 - hash:     {16} :0
 - hash32:   {20} :0

And _metadata finally shows up :)

Symbol Table

No more suspense: the 4th copy of Hello,World. lives in the SymbolTable mentioned above. Again we write a small SA tool, this time to print the contents of the SymbolTable:

import sun.jvm.hotspot.memory.SymbolTable;
import sun.jvm.hotspot.oops.Symbol;
import sun.jvm.hotspot.tools.Tool;

// SA tool: attaches to a running HotSpot process and prints every Symbol in its SymbolTable.
public class SymbolTableDumper extends Tool {

    public static void main(String[] args) {
        SymbolTableDumper printer = new SymbolTableDumper();
        printer.start(args);   // attach to the process given in args (its PID), then run()
        printer.stop();        // detach from the target process
    }

    @Override
    public void run() {
        SymbolTable symbolTable = SymbolTable.getTheTable();
        symbolTable.symbolsDo(new SymbolTable.SymbolVisitor() {
            @Override
            public void visit(Symbol symbol) {
                // print the symbol's text followed by the address of the Symbol itself
                System.out.println(symbol.asString() + "@" + symbol.getAddress());
            }
        });
    }

}

Among the printed results we find this line:

Hello,World.@0x000000000ca89740

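That is the 4th copy. Comparing its address, 0x000000000ca89740, with the universe output shows that it is not in the GC heap, whereas the 3 copies above are all in the young generation of the GC heap and are therefore managed by the GC. The 4th copy is instead managed by reference counting; see the Symbol class (symbol.hpp) in the HotSpot sources for the details.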
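The difference between the two management schemes can be illustrated with a toy reference-counted table. Everything here is hypothetical and only shows the idea: each lookup takes a reference, release drops one, and a separate sweep frees entries whose count has reached zero, instead of a tracing GC deciding reachability. Doing the cleanup in a sweep rather than at the moment a count hits zero is an assumption of this toy; the Symbol class in symbol.hpp is where to check the real behavior.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Toy reference-counted symbol table (hypothetical names, not HotSpot's API).
public class ToySymbolTable {

    static final class Symbol {
        final String text;
        int refcount;

        Symbol(String text) {
            this.text = text;
        }
    }

    private final Map<String, Symbol> table = new HashMap<String, Symbol>();

    // Look up or create the symbol, and take one reference to it.
    public Symbol lookup(String text) {
        Symbol sym = table.get(text);
        if (sym == null) {
            sym = new Symbol(text);
            table.put(text, sym);
        }
        sym.refcount++;
        return sym;
    }

    // Drop one reference, e.g. when whatever held it goes away.
    public void release(Symbol sym) {
        sym.refcount--;
    }

    // Sweep: remove symbols nobody references any more; returns how many were removed.
    public int unlinkUnreferenced() {
        int removed = 0;
        Iterator<Map.Entry<String, Symbol>> it = table.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().refcount == 0) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public int size() {
        return table.size();
    }

    public static void main(String[] args) {
        ToySymbolTable symbols = new ToySymbolTable();
        Symbol a = symbols.lookup("Hello,World.");
        Symbol b = symbols.lookup("Hello,World.");
        System.out.println(a == b);                        // true: one shared symbol
        System.out.println(a.refcount);                    // 2
        symbols.release(a);
        symbols.release(b);
        System.out.println(symbols.unlinkUnreferenced()); // 1
    }
}
```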

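So what is this copy used for? It corresponds to this entry in the class file's constant pool (more precisely, to the Utf8 entry #48 that it refers to, which holds the characters):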

   #2 = String             #48            //  Hello,World.

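Executing ldc #2 needs this symbol from the SymbolTable: it supplies the characters from which the interned String is looked up or created.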
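The resolution step can be sketched as a toy constant pool. It is a simplified model with hypothetical names, not HotSpot's ConstantPool API: the String entry #2 only records the index of the Utf8 entry #48 (the symbol), and the first ldc #2 looks the text up in a string table (a HashMap standing in for StringTable), creating the String if needed and caching the result in the pool. Later ldc #2 executions, like the one for s2, therefore get the same object, and a later lookup of an equal string (what intern() does for s4) finds it too.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of resolving a CONSTANT_String entry on ldc (hypothetical, simplified).
public class ToyConstantPool {

    private final Map<Integer, String> utf8 = new HashMap<Integer, String>();         // index -> symbol text
    private final Map<Integer, Integer> stringRefs = new HashMap<Integer, Integer>(); // String index -> Utf8 index
    private final Map<Integer, String> resolved = new HashMap<Integer, String>();     // resolved String cache
    private final Map<String, String> stringTable;                                    // stands in for StringTable

    public ToyConstantPool(Map<String, String> stringTable) {
        this.stringTable = stringTable;
    }

    public void addUtf8(int index, String text) {
        utf8.put(index, text);
    }

    public void addString(int index, int utf8Index) {
        stringRefs.put(index, utf8Index);
    }

    // What ldc of a String constant does in this model.
    public String ldc(int index) {
        String cached = resolved.get(index);
        if (cached != null) {
            return cached;                        // already resolved: same object every time
        }
        String text = utf8.get(stringRefs.get(index));
        String interned = stringTable.get(text);
        if (interned == null) {
            interned = new String(text);          // create the String from the symbol's text
            stringTable.put(text, interned);
        }
        resolved.put(index, interned);
        return interned;
    }

    public static void main(String[] args) {
        Map<String, String> stringTable = new HashMap<String, String>();
        ToyConstantPool cp = new ToyConstantPool(stringTable);
        cp.addUtf8(48, "Hello,World.");
        cp.addString(2, 48);
        String s1 = cp.ldc(2);
        String s2 = cp.ldc(2);
        // what intern() would find for an equal string built at run time
        String s4 = stringTable.get(new StringBuilder("Hello,").append("World.").toString());
        System.out.println(s1 == s2); // true
        System.out.println(s1 == s4); // true
    }
}
```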


Original article: http://blog.csdn.net/kisimple/article/details/45128525
