HDFS is designed primarily to store massive amounts of data, which means it must be able to hold a very large number of files. HDFS splits each file into blocks and stores those blocks on different DataNodes, and it provides a Java API for operating on the files it holds; where exactly the blocks land on the DataNodes is transparent to the developer.
The Java API supports all of the common HDFS operations, such as creating files, deleting files, and reading file contents. The rest of this section walks through the most frequently used HDFS Java APIs with programming examples. First, add the hadoop-client dependency to the project's pom.xml:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.1.1</version>
</dependency>
// Imports used by the examples below.
import java.io.*;
import java.net.URI;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

Configuration conf;
FileSystem fileSystem;

public HdfsAPI() {
    conf = new Configuration();
    // Client-side defaults: 2 replicas per block, 128 MB block size.
    conf.set("dfs.replication", "2");
    conf.set("dfs.blocksize", "128m");
    try {
        // Connect to the NameNode (substitute its host for ${NameNode}) as user "hadoop".
        fileSystem = FileSystem.get(new URI("hdfs://${NameNode}:9000"), conf, "hadoop");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
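Each helper method below closes the shared fileSystem handle when it finishes, so in this style a fresh HdfsAPI is constructed per call. A minimal usage sketch (the method names are the ones defined below):

public static void main(String[] args) throws Exception {
    // Each helper closes fileSystem on exit, so build a new instance per operation.
    new HdfsAPI().testLs();
    new HdfsAPI().testMkdir();
}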
public void testGet() throws IllegalArgumentException, IOException {
    // Download /output/part.txt from HDFS to the local file system.
    // Note that Java does not expand "~", so an absolute local path is safer here.
    fileSystem.copyToLocalFile(new Path("/output/part.txt"), new Path("~/Downloads"));
    fileSystem.close();
}
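Uploading works symmetrically via copyFromLocalFile. A minimal sketch; the local and HDFS paths here are made-up examples:

public void testPut() throws IllegalArgumentException, IOException {
    // Upload a local file into the HDFS directory /output.
    fileSystem.copyFromLocalFile(new Path("/tmp/part.txt"), new Path("/output"));
    fileSystem.close();
}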
public void testLs() throws IllegalArgumentException, IOException {
    // Recursively list every file under the root directory.
    RemoteIterator<LocatedFileStatus> listFiles = fileSystem.listFiles(new Path("/"), true);
    while (listFiles.hasNext()) {
        LocatedFileStatus status = listFiles.next();
        System.out.println("Path: " + status.getPath());
        System.out.println("Block size: " + status.getBlockSize());
        System.out.println("File length: " + status.getLen());
        System.out.println("Replication: " + status.getReplication());
        System.out.println("Block locations: " + Arrays.toString(status.getBlockLocations()) + "\n");
    }
    fileSystem.close();
}
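Note that listFiles returns files only. To see directories as well, one level at a time, FileSystem.listStatus can be used instead; a minimal sketch:

public void testListStatus() throws IllegalArgumentException, IOException {
    // listStatus is non-recursive and returns both files and directories.
    FileStatus[] statuses = fileSystem.listStatus(new Path("/"));
    for (FileStatus status : statuses) {
        System.out.println((status.isDirectory() ? "dir:  " : "file: ") + status.getPath());
    }
    fileSystem.close();
}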
public void testMkdir() throws IllegalArgumentException, IOException {
    // Create a directory, including any missing parents (like mkdir -p).
    fileSystem.mkdirs(new Path("/output/test/testmk"));
    fileSystem.close();
}
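To check whether a path already exists before creating or deleting it, FileSystem.exists is available; a short sketch:

public void testExists() throws IOException {
    // exists() works for both files and directories.
    System.out.println(fileSystem.exists(new Path("/output/test/testmk")));
    fileSystem.close();
}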
public void testDeldir() throws IllegalArgumentException, IOException {
    // Delete the directory; the second argument requests recursive deletion.
    boolean delete = fileSystem.delete(new Path("/output/test/testmk"), true);
    if (delete) {
        System.out.println("Deleted successfully");
    }
    fileSystem.close();
}
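Moving or renaming a path is done with FileSystem.rename, which returns false rather than throwing on failure. A minimal sketch with hypothetical paths:

public void testRename() throws IOException {
    // Move /output/test to /output/test2 (works for files and directories alike).
    boolean renamed = fileSystem.rename(new Path("/output/test"), new Path("/output/test2"));
    System.out.println("Renamed: " + renamed);
    fileSystem.close();
}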
public void testReadData() throws IOException {
    // Open the HDFS file as an input stream.
    FSDataInputStream in = fileSystem.open(new Path("/test.txt"));
    // Wrap the stream in a buffered reader, decoding the bytes as UTF-8.
    BufferedReader br = new BufferedReader(new InputStreamReader(in, "utf-8"));
    String line = null;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
    // Close the outer reader first; this also closes the underlying stream.
    br.close();
    in.close();
    fileSystem.close();
}
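For a plain byte-for-byte copy, Hadoop's IOUtils helper avoids the manual read loop. A minimal sketch that dumps the same file to the console:

// Requires: import org.apache.hadoop.io.IOUtils;
public void testCopyToStdout() throws IOException {
    FSDataInputStream in = fileSystem.open(new Path("/test.txt"));
    // Copy the stream to stdout with a 4 KB buffer; 'false' leaves stdout open.
    IOUtils.copyBytes(in, System.out, 4096, false);
    in.close();
    fileSystem.close();
}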
public void testRandomReadData() throws IOException {
    FSDataInputStream in = fileSystem.open(new Path("/test.txt"));
    in.seek(12);                  // Seek to byte offset 12 before reading.
    byte[] buf = new byte[16];    // Read up to the next 16 bytes.
    int n = in.read(buf);         // read() may return fewer bytes than requested.
    System.out.println(new String(buf, 0, n));
    in.close();
    fileSystem.close();
}
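When exactly that many bytes are required, FSDataInputStream also offers a positioned readFully, which throws instead of returning a short read. A minimal sketch:

public void testReadFully() throws IOException {
    FSDataInputStream in = fileSystem.open(new Path("/test.txt"));
    byte[] buf = new byte[16];
    // Read exactly 16 bytes starting at offset 12, without moving the stream position.
    in.readFully(12, buf);
    System.out.println(new String(buf));
    in.close();
    fileSystem.close();
}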
public void testWriteData() throws IOException {
    // Create /yy.jpg in HDFS; 'false' means fail if the file already exists.
    FSDataOutputStream out = fileSystem.create(new Path("/yy.jpg"), false);
    // As above, Java does not expand "~"; use an absolute local path in practice.
    FileInputStream in = new FileInputStream("~/Download/wechatpic_20190309221605.jpg");
    byte[] buf = new byte[1024];
    int read = 0;
    while ((read = in.read(buf)) != -1) {
        out.write(buf, 0, read);
    }
    in.close();
    out.close();
    fileSystem.close();
}
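To add data to the end of an existing file rather than create a new one, FileSystem.append can be used, provided the cluster permits appends; a minimal sketch with a hypothetical target file:

public void testAppendData() throws IOException {
    // Open an existing file for appending; this fails if the file does not exist.
    FSDataOutputStream out = fileSystem.append(new Path("/test.txt"));
    out.write("appended line\n".getBytes("utf-8"));
    out.close();
    fileSystem.close();
}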