This repository was archived by the owner on Nov 11, 2022. It is now read-only.

Commit 4392368

Reverse-integrate: v2 to master
2 parents f89d619 + 10e9f86 commit 4392368

1,025 files changed

Lines changed: 4259 additions & 209469 deletions


.gitattributes

Lines changed: 14 additions & 0 deletions
@@ -1,3 +1,17 @@
+# Copyright (C) 2017 Google Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"); you may not
+# use this file except in compliance with the License. You may obtain a copy of
+# the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations under
+# the License.
+
 # The default behavior, which overrides 'core.autocrlf', is to use Git's
 # built-in heuristics to determine whether a particular file is text or binary.
 # Text files are automatically normalized to the user's platforms.

.gitignore

Lines changed: 14 additions & 0 deletions
@@ -1,3 +1,17 @@
+# Copyright (C) 2017 Google Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"); you may not
+# use this file except in compliance with the License. You may obtain a copy of
+# the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations under
+# the License.
+
 target/

 # Ignore IntelliJ files.

.travis.yml

Lines changed: 26 additions & 10 deletions
@@ -1,3 +1,17 @@
+# Copyright (C) 2017 Google Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"); you may not
+# use this file except in compliance with the License. You may obtain a copy of
+# the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations under
+# the License.
+
 language: java

 sudo: false
@@ -15,24 +29,26 @@ matrix:
   include:
     # On OSX, run with default JDK only.
     - os: osx
-      env: MAVEN_OVERRIDE=""
     # On Linux, run with specific JDKs only.
     - os: linux
-      env: CUSTOM_JDK="oraclejdk8" MAVEN_OVERRIDE="-Ddataflow.surefire_argline='-Xmx512m'"
-    - os: linux
-      env: CUSTOM_JDK="oraclejdk7" MAVEN_OVERRIDE="-Ddataflow.surefire_argline='-Xmx512m'"
-    - os: linux
-      env: CUSTOM_JDK="openjdk7" MAVEN_OVERRIDE="-Ddataflow.surefire_argline='-Xmx512m'"
+      env: CUSTOM_JDK="oraclejdk8"
+    # The distribution does not build with Java 7 by design. We need to rewrite these tests
+    # to, for example, build and install with Java 8 and then test examples with Java 7.
+    # - os: linux
+    # env: CUSTOM_JDK="oraclejdk7"
+    # - os: linux
+    # env: CUSTOM_JDK="openjdk7"

 before_install:
   - if [ "$TRAVIS_OS_NAME" == "osx" ]; then export JAVA_HOME=$(/usr/libexec/java_home); fi
   - if [ "$TRAVIS_OS_NAME" == "linux" ]; then jdk_switcher use "$CUSTOM_JDK"; fi

 install:
   - travis_retry mvn install clean -U -DskipTests=true
-  - travis_retry mvn -f contrib/kafka/pom.xml install clean -U -DskipTests=true

 script:
-  - travis_retry mvn $MAVEN_OVERRIDE install -U
-  - travis_retry mvn -f contrib/kafka/pom.xml $MAVEN_OVERRIDE install -U
-  - travis_retry travis/test_wordcount.sh
+  # Verify that the project can be built and installed.
+  - mvn install
+  # Verify that starter and examples archetypes have the correct version of the NOTICE file.
+  - diff -q NOTICE maven-archetypes/starter/src/main/resources/NOTICE
+  - diff -q NOTICE maven-archetypes/examples/src/main/resources/NOTICE
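
Reviewer note: the two `diff -q` steps added to `script:` rely on `diff` exiting non-zero when the files differ, which fails the build. As a rough sketch only, the same check could be expressed as one helper that reports every mismatching copy. The function name `check_notice_copies` is hypothetical and is not part of this commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): compare a reference file
# against each listed copy and report every mismatch, not just the first.
check_notice_copies() {
  reference="$1"
  shift
  status=0
  for copy in "$@"; do
    # diff -q exits non-zero when the files differ or one of them is missing.
    if ! diff -q "$reference" "$copy" >/dev/null 2>&1; then
      echo "NOTICE mismatch: $copy" >&2
      status=1
    fi
  done
  return "$status"
}
```

Under these assumptions, the CI step would call it as `check_notice_copies NOTICE maven-archetypes/starter/src/main/resources/NOTICE maven-archetypes/examples/src/main/resources/NOTICE`, with a non-zero return marking the build as failed.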

CONTRIBUTING.md

Lines changed: 25 additions & 5 deletions
@@ -1,6 +1,31 @@
+<!--
+Copyright (C) 2017 Google Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not
+use this file except in compliance with the License. You may obtain a copy of
+the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+License for the specific language governing permissions and limitations under
+the License.
+-->
+
 Want to contribute? Great! First, read this page (including the small print at
 the end).

+Google Cloud Dataflow SDK is a distribution of Apache Beam. If you'd like to
+change anything under the `org.apache.beam.*` namespace, please submit that
+change directly to the [Apache Beam](https://github.com/apache/beam) project.
+
+This repository contains code to build the Dataflow distribution of Beam, and
+some Dataflow-specific code. Only changes to how the distribution is built, or
+the Dataflow-specific code under the `com.google.cloud.dataflow.*` namespace,
+can be merged here.
+
 ### Before you contribute
 Before we can use your code, you must sign the
 [Google Individual Contributor License Agreement](https://developers.google.com/open-source/cla/individual?csw=1)
@@ -21,11 +46,6 @@ frustration later on.
 All submissions, including submissions by project members, require review. We
 use GitHub pull requests for this purpose.

-### Organization
-During our review and triage of incoming pull requests, we'll advise whether to
-include your contribution into the mainline SDK, or to maintain it within the
-separate group of [community-contributed modules](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/master/contrib).
-
 ### The small print
 Contributions made by corporations are covered by a different agreement than
 the one above, the Software Grant and Corporate Contributor License Agreement.

NOTICE

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+Google Cloud Dataflow SDK for Java
+Copyright 2017, Google Inc.
+
+This product includes software developed at
+The Apache Software Foundation (http://www.apache.org/).

README.md

Lines changed: 40 additions & 44 deletions
@@ -1,10 +1,30 @@
+<!--
+Copyright (C) 2017 Google Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not
+use this file except in compliance with the License. You may obtain a copy of
+the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+License for the specific language governing permissions and limitations under
+the License.
+-->
+
 # Google Cloud Dataflow SDK for Java

 [Google Cloud Dataflow](https://cloud.google.com/dataflow/) provides a simple,
 powerful programming model for building both batch and streaming parallel data
-processing pipelines. This repository hosts the open-sourced Cloud Dataflow SDK
-for Java, which can be used to run pipelines against the Google Cloud Dataflow
-Service.
+processing pipelines.
+
+Dataflow SDK for Java is a distribution of a portion of the
+[Apache Beam](https://beam.apache.org) project. This repository hosts the
+code to build this distribution and any Dataflow-specific code/modules. The
+underlying source code is hosted in the
+[Apache Beam repository](https://github.com/apache/beam).

 [General usage](https://cloud.google.com/dataflow/getting-started) of Google
 Cloud Dataflow does **not** require use of this repository. Instead:
@@ -25,13 +45,19 @@ environments like Eclipse or Apache Maven:
 [DataflowJavaSDK-examples](https://github.com/GoogleCloudPlatform/DataflowJavaSDK-examples)
 repository.

-However, if you'd like to contribute to the SDK, write your own PipelineRunner,
-or just dig in for the fun of it, please stay with us here!
+<!-- 1. If you are using [Eclipse](https://eclipse.org/) integrated development
+environment (IDE), the
+[Cloud Dataflow Plugin for Eclipse](https://cloud.google.com/dataflow/getting-started-eclipse)
+provides tools to create and execute Dataflow pipelines locally and on the
+Dataflow Service. -->

-## Status [![Build Status](https://travis-ci.org/GoogleCloudPlatform/DataflowJavaSDK.svg?branch=master)](https://travis-ci.org/GoogleCloudPlatform/DataflowJavaSDK)
+## Status [![Build Status](https://travis-ci.org/GoogleCloudPlatform/DataflowJavaSDK.svg?branch=v2)](https://travis-ci.org/GoogleCloudPlatform/DataflowJavaSDK)

-Both the SDK and the Dataflow Service are generally available, open to all
-developers, and considered stable and fully qualified for production use.
+This branch is a work-in-progress for the Dataflow SDK for Java, version 2.0.0.
+It is currently supported on the Cloud Dataflow service in Beta.
+
+<!--Both the SDK and the Dataflow Service are generally available, open to all
+developers, and considered stable and fully qualified for production use.-->

 ## Overview

@@ -48,25 +74,17 @@ for execution.
 * [`PipelineRunner`](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/PipelineRunner.java):
 specifies where and how the pipeline should execute.

-We provide three PipelineRunners:
+We provide two runners:

-1. The [`DirectPipelineRunner`](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner.java)
+1. The [`DirectRunner`](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner.java)
 runs the pipeline on your local machine.
-2. The [`DataflowPipelineRunner`](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/DataflowPipelineRunner.java)
+1. The [`DataflowRunner`](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/DataflowPipelineRunner.java)
 submits the pipeline to the Dataflow Service, where it runs using managed
 resources in the [Google Cloud Platform](https://cloud.google.com) (GCP).
-3. The [`BlockingDataflowPipelineRunner`](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/BlockingDataflowPipelineRunner.java)
-submits the pipeline to the Dataflow Service via the `DataflowPipelineRunner`
-and then prints messages about the job status until the execution is complete.

 The SDK is built to be extensible and support additional execution environments
-beyond local execution and the Google Cloud Dataflow Service. In partnership
-with [Cloudera](https://www.cloudera.com/), you can run Dataflow pipelines on
-an [Apache Spark](https://spark.apache.org/) backend using the
-[`SparkPipelineRunner`](https://github.com/cloudera/spark-dataflow).
-Additionally, you can run Dataflow pipelines on an
-[Apache Flink](https://flink.apache.org/) backend using the
-[`FlinkPipelineRunner`](https://github.com/dataArtisans/flink-dataflow).
+beyond local execution and the Google Cloud Dataflow Service. Apache Beam
+contains additional SDKs, runners, IO connectors, etc.

 ## Getting Started

@@ -77,35 +95,12 @@ module provides a set of basic Java APIs to program against.
 * The [`examples`](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/examples)
 module provides a few samples to get started. We recommend starting with the
 `WordCount` example.
-* The [`contrib`](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/contrib)
-directory hosts community-contributed Dataflow modules.

 The following command will build both the `sdk` and `example` modules and
 install them in your local Maven repository:

 mvn clean install

-You can speed up the build and install process by using the following options:
-
-1. To skip execution of the unit tests, run:
-
-mvn install -DskipTests
-
-2. While iterating on a specific module, use the following command to compile
-and reinstall it. For example, to reinstall the `examples` module, run:
-
-mvn install -pl examples
-
-Be careful, however, as this command will use the most recently installed SDK
-from the local repository (or Maven Central) even if you have changed it
-locally.
-
-If you are using [Eclipse](https://eclipse.org/) integrated development
-environment (IDE), the
-[Cloud Dataflow Plugin for Eclipse](https://cloud.google.com/dataflow/getting-started-eclipse)
-provides tools to create and execute Dataflow pipelines locally and on the
-Dataflow Service.
-
 After building and installing, you can execute the `WordCount` and other
 example pipelines by following the instructions in this
 [README](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/examples/README.md).
@@ -121,5 +116,6 @@ on GitHub to report any bugs, comments or questions regarding SDK development.
 ## More Information

 * [Google Cloud Dataflow](https://cloud.google.com/dataflow/)
+* [Apache Beam](https://beam.apache.org/)
 * [Dataflow Concepts and Programming Model](https://cloud.google.com/dataflow/model/programming-model)
 * [Java API Reference](https://cloud.google.com/dataflow/java-sdk/JavaDoc/index)

contrib/README.md

Lines changed: 0 additions & 53 deletions
This file was deleted.

contrib/firebaseio/AUTHORS.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

contrib/firebaseio/README.md

Lines changed: 0 additions & 15 deletions
This file was deleted.
