Scala to Python - rdd folder #1
Conversation
```python
from pyspark import SparkContext
from commons.Utils import Utils

def splitComma(line: str):
```
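For context, the file imports a `Utils` helper from `commons`; assuming that helper provides a quote-aware comma pattern (an assumption — the repo's actual helper may differ), a self-contained sketch of `splitComma` could look like:

```python
import re

# Hypothetical stand-in for commons.Utils: split only on commas that are
# not inside double quotes (i.e. an even number of quotes remains ahead).
COMMA_DELIMITER = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def splitComma(line: str):
    # Note: the `line: str` annotation is Python 3 syntax.
    return COMMA_DELIMITER.split(line)

print(splitComma('a,"b,c",d'))  # ['a', '"b,c"', 'd']
```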
Have you tried to run this program? It doesn't compile:
```
File "/Users/cwei/code/python-spark-tutorial-new/rdd/airports/AirportsByLatitudeSolution.py", line 4
    def splitComma(line: str):
                       ^
SyntaxError: invalid syntax
```
Hi, yes, I ran all the programs. Which version of Python are you running? This should work on the latest Python 3 version.
Sorry for the confusion. It works for Python 3; I was running Python 2.7. Feel free to ignore this comment.
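For reference, the incompatibility comes from function annotations, which are Python 3 syntax; a Python 2.7-compatible variant simply drops the annotation (the body here is illustrative, not the repo's actual implementation):

```python
# Python 3 only: function annotations are a SyntaxError on Python 2.7.
def splitComma(line: str):
    return line.split(",")

# Python 2.7-compatible: no annotation; a type comment can document the intent.
def splitCommaCompat(line):  # type: (str) -> list
    return line.split(",")

assert splitComma("a,b") == splitCommaCompat("a,b") == ["a", "b"]
```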
For all the programs that print to standard output, please set the logging level to ERROR so that there is less noise in the output.
```python
from pyspark import SparkContext
from commons.Utils import Utils

def splitComma(line: str):
```
Again, it doesn't compile; I think you don't need the type annotation:
```
File "/Users/cwei/code/python-spark-tutorial-new/rdd/airports/AirportsInUsaSolution.py", line 4
    def splitComma(line: str):
                       ^
SyntaxError: invalid syntax
```
```python
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext("local", "collect")
```
Please set the logging level to ERROR, similar to what the Scala program does, to reduce the noise in the output.
Hi James, some considerations about the logging level when using pyspark:
- From the script itself, we can only set the log level after the SparkContext has started, which means anything logged while the SparkContext is starting up will be printed anyway.
- The best way to reduce the noise in the output is to configure the log4j.properties file inside the spark/conf folder.

That being said, I will set the log level to ERROR after the SparkContext starts.
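A minimal sketch of that change, using PySpark's `SparkContext.setLogLevel` (it only takes effect once the context exists, so startup logs still appear; running it requires a local Spark installation):

```python
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext("local", "collect")
    # Suppress INFO/WARN messages from here on; anything logged while the
    # SparkContext itself was starting has already been printed.
    sc.setLogLevel("ERROR")
```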
```python
from pyspark import SparkContext

if __name__ == "__main__":
```
Again, please set the logging level to ERROR.
```python
from pyspark import SparkContext

def isNotHeader(line: str):
```
Doesn't compile:
```
def isNotHeader(line:str):
                    ^
SyntaxError: invalid syntax
```
```python
from pyspark import SparkContext

if __name__ == "__main__":
```
set the logging level to ERROR
Converted all the Scala files in the RDD folder to Python.