python dummy
In a previous post, I mentioned a little project to provide a pure-Python mock of Apache Spark's RDD object for testing and quick prototyping. Thanks to some help from contributors, we've made a bit of progress, and now a good bit of the RDD API is supported, including using the new Hadoop API with elasticsearch-hadoop and pulling files from S3.
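To give some intuition for how a pure-Python RDD mock can work at all: `parallelize` just wraps an ordinary Python list, and transformations like `map` and `filter` become plain list operations, so no JVM or cluster is involved. The sketch below is a hypothetical, minimal illustration of that idea (the class names `FakeRDD` and `FakeSparkContext` are my own), not dummyrdd's actual implementation:

```python
# Minimal sketch of a pure-Python RDD mock (hypothetical; not dummyrdd's code).
# Each transformation is just a plain Python list operation, which is what
# makes this style of mock useful for tests and quick prototyping.

class FakeRDD:
    def __init__(self, data):
        self._data = list(data)

    def map(self, f):
        # Mirrors RDD.map: apply f to every element, return a new RDD.
        return FakeRDD(f(x) for x in self._data)

    def filter(self, f):
        # Mirrors RDD.filter: keep elements for which f is truthy.
        return FakeRDD(x for x in self._data if f(x))

    def collect(self):
        # Mirrors RDD.collect: materialize the data as a Python list.
        return list(self._data)

    def count(self):
        return len(self._data)


class FakeSparkContext:
    def parallelize(self, data):
        # Mirrors SparkContext.parallelize: wrap a local collection.
        return FakeRDD(data)


sc = FakeSparkContext()
rdd = sc.parallelize([1, 2, 3, 4, 5])
squares = rdd.map(lambda x: x ** 2).collect()  # [1, 4, 9, 16, 25]
```

The real project covers far more of the API than this, but the underlying trick is the same: mirror the Spark method signatures over local Python collections.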
I’ve just published the v0.0.2 release, which can be installed as:
pip install dummyrdd==0.0.2
And used like:
from dummy_spark import SparkContext, SparkConf

sconf = SparkConf()
sc = SparkContext(master='', conf=sconf)
rdd = sc.parallelize([1, 2, 3, 4, 5])
print(rdd.count())
print(rdd.map(lambda x: x**2).collect())
In the new release, we’ve added two small bits of functionality:
These are in addition to the large list of implemented methods that can be found in the README on GitHub.