Scala to Python - rdd folder #1

Merged 10 commits on Sep 29, 2017

Collaborator

@pedromb pedromb commented Sep 28, 2017

Converted all the Scala files in the RDD folder to Python.

from pyspark import SparkContext
from commons.Utils import Utils

def splitComma(line: str):
Owner

Have you tried to run this program? It doesn't compile

  File "/Users/cwei/code/python-spark-tutorial-new/rdd/airports/AirportsByLatitudeSolution.py", line 4
    def splitComma(line: str):
                       ^
SyntaxError: invalid syntax

Collaborator Author

Hi, yes, I did run all the programs. Which version of Python are you running? This should work in the latest Python 3 version.

Owner

Sorry for the confusion. It works with Python 3; I was running Python 2.7. Feel free to ignore this comment.
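
For reference, the error in this thread comes from the parameter annotation (`line: str`), which is Python 3 syntax and is rejected by the Python 2.7 parser. A minimal sketch of the difference follows; the function bodies and the names `split_comma_py3` and `split_comma_portable` are assumptions for illustration, since the diff shows only the signature.

```python
def split_comma_py3(line: str):
    # Python 3 only: the ": str" annotation is a SyntaxError under Python 2.7.
    # Hypothetical body; the reviewed file's real body is not shown in the thread.
    return line.split(",")


def split_comma_portable(line):
    # type: (str) -> list
    # No inline annotation, so this definition parses on both Python 2.7 and 3.
    # The type comment above is optional and ignored at runtime.
    return line.split(",")
```

Both functions behave the same at runtime; only the first one records the annotation in `__annotations__`.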

@jleetutorial
Owner

For all the programs which print to standard output, please set the logging level to ERROR so that there is less noise in the output.

from pyspark import SparkContext
from commons.Utils import Utils

def splitComma(line: str):
Owner

Again, it didn't compile. I think you don't need the type annotation.

  File "/Users/cwei/code/python-spark-tutorial-new/rdd/airports/AirportsInUsaSolution.py", line 4
    def splitComma(line: str):
                       ^
SyntaxError: invalid syntax

from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext("local", "collect")
Owner

Please set the logging level to ERROR, similar to what the Scala program does, to reduce the noise in the output.

Collaborator Author

Hi James, some considerations about the logging level when using pyspark:

  • From the script itself, we can only set the log level after the SparkContext has started. This means that logs emitted while the SparkContext is starting will be printed anyway.
  • The best way to reduce the noise in the output is to configure the log4j.properties file inside the spark/conf folder.

That being said, I will set the log level to ERROR after the SparkContext starts.
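
The approach described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the helper names `silence_spark_logs` and `create_quiet_context` are hypothetical, and the `"local"` / `"collect"` arguments are copied from the snippet under review. The only Spark calls used are the `SparkContext(master, appName)` constructor and `SparkContext.setLogLevel`.

```python
def silence_spark_logs(sc, level="ERROR"):
    """Lower the log level of an already-started SparkContext.

    Hypothetical helper: it only calls sc.setLogLevel, so it cannot
    suppress anything logged before the context existed.
    """
    sc.setLogLevel(level)
    return sc


def create_quiet_context(master="local", app_name="collect"):
    """Start a SparkContext, then reduce its logging to ERROR."""
    # Imported lazily so this sketch can be loaded without pyspark installed.
    from pyspark import SparkContext

    # Messages logged while the context starts up are still printed here.
    sc = SparkContext(master, app_name)
    return silence_spark_logs(sc)
```

A script would call `sc = create_quiet_context()` inside its `if __name__ == "__main__":` block. Startup messages can only be suppressed through the log4j.properties configuration mentioned above.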

@@ -0,0 +1,11 @@
from pyspark import SparkContext

if __name__ == "__main__":
Owner

again, set the logging level to ERROR

@@ -0,0 +1,17 @@
from pyspark import SparkContext

def isNotHeader(line:str):
Owner

doesn't compile

    def isNotHeader(line:str):
                        ^
SyntaxError: invalid syntax

@@ -0,0 +1,8 @@
from pyspark import SparkContext

if __name__ == "__main__":
Owner

set the logging level to ERROR

@jleetutorial jleetutorial merged commit 9d9066c into master Sep 29, 2017