python dummy
In a previous post, I mentioned a little project to provide a pure-Python mock of Apache Spark's RDD object for testing and quick prototyping. Thanks to some help from contributors, we've made a bit of progress, and now a good bit of the RDD API is supported, including using the new Hadoop API with elasticsearch-hadoop and pulling files from S3.
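To give some intuition for how a pure-Python RDD mock can work at all: `parallelize` just wraps an ordinary Python list, and transformations like `map` and `filter` become plain list operations, so no JVM or cluster is involved. The sketch below is a hypothetical, minimal illustration of that idea (the class names `FakeRDD` and `FakeSparkContext` are my own), not dummyrdd's actual implementation:

```python
# Minimal sketch of a pure-Python RDD mock (hypothetical; not dummyrdd's code).
# Each transformation is just a plain Python list operation, which is what
# makes this style of mock useful for tests and quick prototyping.

class FakeRDD:
    def __init__(self, data):
        self._data = list(data)

    def map(self, f):
        # Mirrors RDD.map: apply f to every element, return a new RDD.
        return FakeRDD(f(x) for x in self._data)

    def filter(self, f):
        # Mirrors RDD.filter: keep elements for which f is truthy.
        return FakeRDD(x for x in self._data if f(x))

    def collect(self):
        # Mirrors RDD.collect: materialize the data as a Python list.
        return list(self._data)

    def count(self):
        return len(self._data)


class FakeSparkContext:
    def parallelize(self, data):
        # Mirrors SparkContext.parallelize: wrap a local collection.
        return FakeRDD(data)


sc = FakeSparkContext()
rdd = sc.parallelize([1, 2, 3, 4, 5])
squares = rdd.map(lambda x: x ** 2).collect()  # [1, 4, 9, 16, 25]
```

The real project covers far more of the API than this, but the underlying trick is the same: mirror the Spark method signatures over local Python collections.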
I’ve just published the v0.0.2 release, which can be installed as:
pip install dummyrdd==0.0.2
And used like:
from dummy_spark import SparkContext, SparkConf

sconf = SparkConf()
sc = SparkContext(master='', conf=sconf)
rdd = sc.parallelize([1, 2, 3, 4, 5])
print(rdd.count())
print(rdd.map(lambda x: x**2).collect())
In the new release, we’ve added two small bits of functionality:
These are in addition to the large list of implemented methods that can be found in the README on GitHub.