Scala to Python - rdd folder #1
Conversation
```python
from pyspark import SparkContext
from commons.Utils import Utils

def splitComma(line: str):
```
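For context, the file imports a `Utils` helper from `commons`; assuming that helper provides a quote-aware comma pattern (an assumption — the repo's actual helper may differ), a self-contained sketch of `splitComma` could look like:

```python
import re

# Hypothetical stand-in for commons.Utils: split only on commas that are
# not inside double quotes (i.e. an even number of quotes remains ahead).
COMMA_DELIMITER = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def splitComma(line: str):
    # Note: the `line: str` annotation is Python 3 syntax.
    return COMMA_DELIMITER.split(line)

print(splitComma('a,"b,c",d'))  # ['a', '"b,c"', 'd']
```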
Have you tried to run this program? It doesn't compile:
```
File "/Users/cwei/code/python-spark-tutorial-new/rdd/airports/AirportsByLatitudeSolution.py", line 4
    def splitComma(line: str):
                       ^
SyntaxError: invalid syntax
```
Hi, yes, I ran all the programs. Which version of Python are you running? This should work on the latest Python 3 version.
Sorry for the confusion. It works for Python 3; I was running Python 2.7. Feel free to ignore this comment.
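For reference, the incompatibility comes from function annotations, which are Python 3 syntax; a Python 2.7-compatible variant simply drops the annotation (the body here is illustrative, not the repo's actual implementation):

```python
# Python 3 only: function annotations are a SyntaxError on Python 2.7.
def splitComma(line: str):
    return line.split(",")

# Python 2.7-compatible: no annotation; a type comment can document the intent.
def splitCommaCompat(line):  # type: (str) -> list
    return line.split(",")

assert splitComma("a,b") == splitCommaCompat("a,b") == ["a", "b"]
```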
For all the programs that print to standard output, please set the logging level to ERROR so that there is less noise in the output.
```python
from pyspark import SparkContext
from commons.Utils import Utils

def splitComma(line: str):
```
Again, it doesn't compile; I think you don't need the type annotation:
```
File "/Users/cwei/code/python-spark-tutorial-new/rdd/airports/AirportsInUsaSolution.py", line 4
    def splitComma(line: str):
                       ^
SyntaxError: invalid syntax
```
```python
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext("local", "collect")
```
Please set the logging level to ERROR, similar to what the Scala program does, to reduce the noise in the output.
Hi James, some considerations about the logging level when using pyspark:
- From the script itself, we can only set the log level after the SparkContext has started, which means anything logged while the SparkContext is starting up will be printed anyway.
- The best way to reduce the noise in the output is to configure the log4j.properties file inside the spark/conf folder.

That being said, I will set the log level to ERROR after the SparkContext starts.
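A minimal sketch of that change, using PySpark's `SparkContext.setLogLevel` (it only takes effect once the context exists, so startup logs still appear; running it requires a local Spark installation):

```python
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext("local", "collect")
    # Suppress INFO/WARN messages from here on; anything logged while the
    # SparkContext itself was starting has already been printed.
    sc.setLogLevel("ERROR")
```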
```python
from pyspark import SparkContext

if __name__ == "__main__":
```
Again, please set the logging level to ERROR.
```python
from pyspark import SparkContext

def isNotHeader(line: str):
```
Doesn't compile:
```
def isNotHeader(line:str):
                    ^
SyntaxError: invalid syntax
```
```python
from pyspark import SparkContext

if __name__ == "__main__":
```
set the logging level to ERROR
Converted all the Scala files in the RDD folder to Python.