Machine Learning with Spark on Google Cloud Dataproc reviews
10347 reviews
Shiv P. · Review submitted about 7 years ago
鈺介 陸. · Review submitted about 7 years ago
Jupyter notebook cannot be launched with the firewall rules in my environment. Could not do that portion of the lab.
Kevin R. · Review submitted about 7 years ago
Sergey S. · Review submitted about 7 years ago
edward c. · Review submitted about 7 years ago
Andrea C. · Review submitted about 7 years ago
I'll be contacting Qwiklabs for a refund on this lab. I had problems like many others before I was able to get any points. This lab has proved to me that I must read the comments first. The spark.read statement is out-of-date per the version of PySpark that they are using in the lab. I tried to research on StackOverflow and other places but could not make that statement work properly, and that is what is supposed to set up the first dataset:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'property' object has no attribute 'option'
If I could award 0 stars, I would.
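The AttributeError quoted in this review ("'property' object has no attribute 'option'") is what Python raises when `read` is looked up on a class rather than on an instance, because attributes like `SparkSession.read` are defined as properties. One common cause is the name `spark` ending up bound to the SparkSession class instead of the session object the pyspark shell creates. A minimal plain-Python sketch of the same failure mode (the `Session` class here is illustrative, not PySpark's):

```python
class Session:
    """Toy stand-in for a class that exposes a reader via a property."""

    @property
    def read(self):
        return "a reader object"

# On an instance, the property evaluates and returns the reader:
assert Session().read == "a reader object"

# On the class itself, attribute lookup returns the property descriptor,
# which has no .option attribute -- the error quoted in the review:
assert isinstance(Session.read, property)
try:
    Session.read.option
except AttributeError as exc:
    assert "no attribute 'option'" in str(exc)
```

In the shell, checking `type(spark)` would distinguish the class from an instance.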
Lawrence M. · Review submitted about 7 years ago
WidenResearch w. · Review submitted about 7 years ago
I completed all the steps, but the last 5 points were not registered.
Yap G. · Review submitted about 7 years ago
Devin L. · Review submitted about 7 years ago
Very nice tutorial on PySpark; easy to understand and follow.
Nhan D. · Review submitted about 7 years ago
I had all the correct output and changed to the correct shard, 00004, but the lab would not give me credit. Very frustrating!
Eden E. · Review submitted about 7 years ago
Most of the code in the PySpark section throws errors.
Girish G. · Review submitted about 7 years ago
Was NOT successful. Got errors in Spark and did not find the reason. I repeated all the steps beforehand to avoid such errors (assuming that having forgotten one step might be the cause). Abridged transcript:

Connected to ch6cluster-m (Linux 4.9.0-8-amd64, Debian GNU/Linux).

google2832717_student@ch6cluster-m:~$ export PROJECT_ID=$(gcloud info --format='value(config.project)')
google2832717_student@ch6cluster-m:~$ export BUCKET=${PROJECT_ID}
google2832717_student@ch6cluster-m:~$ export ZONE=us-central1-a
google2832717_student@ch6cluster-m:~$ pyspark
Python 2.7.13 (default, Sep 26 2018, 18:42:22) on linux2
Spark version 2.3.2. SparkSession available as 'spark'.

First attempts failed on paste artifacts (an unclosed quote in "BUCKET=os.environ['BUCKET'] and doubled quotes in .option(""header"", ""true"")), giving SyntaxError and IndentationError. Retyped cleanly:

>>> from pyspark.mllib.classification import LogisticRegressionWithLBFGS
>>> from pyspark.mllib.regression import LabeledPoint
>>> BUCKET = os.environ['BUCKET']
>>> traindays = spark.read \
...     .option("header", "true") \
...     .csv('gs://{}/flights/trainday.csv'.format(BUCKET))
>>> traindays.createOrReplaceTempView('traindays')
>>> spark.sql("SELECT * from traindays ORDER BY FL_DATE LIMIT 5").show()
+----------+------------+
|   FL_DATE|is_train_day|
+----------+------------+
|2015-01-01|        True|

>>> from pyspark.sql.types import StringType, FloatType, StructType, StructField
>>> header = 'FL_DATE,UNIQUE_CARRIER,AIRLINE_ID,CARRIER,FL_NUM,ORIGIN_AIRPORT_ID,ORIGIN_AIRPORT_SEQ_ID,ORIGIN_CITY_MARKET_ID,ORIGIN,DEST_AIRPORT_ID,DEST_AIRPORT_SEQ_ID,DEST_CITY_MARKET_ID,DEST,CRS_DEP_TIME,DEP_TIME,DEP_DELAY,TAXI_OUT,WHEELS_OFF,WHEELS_ON,TAXI_IN,CRS_ARR_TIME,ARR_TIME,ARR_DELAY,CANCELLED,CANCELLATION_CODE,DIVERTED,DISTANCE,DEP_AIRPORT_LAT,DEP_AIRPORT_LON,DEP_AIRPORT_TZOFFSET,ARR_AIRPORT_LAT,ARR_AIRPORT_LON,ARR_AIRPORT_TZOFFSET,EVENT,NOTIFY_TIME'
>>> def get_structfield(colname):
...     if colname in ['ARR_DELAY', 'DEP_DELAY', 'DISTANCE', 'TAXI_OUT']:
...         return StructField(colname, FloatType(), True)
...     else:
...         return StructField(colname, StringType(), True)
...
>>> schema = StructType([get_structfield(colname) for colname in header.split(',')])
>>> inputs = 'gs://{}/flights/tzcorr/all_flights-00004-*'.format(BUCKET)
>>> flights = spark.read \
...     .schema(schema) \
...     .csv(inputs)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'property' object has no attribute 'schema'
>>> flights.createOrReplaceTempView('flights')
>>> trainquery = """
... SELECT
...   f.DEP_DELAY, f.TAXI_OUT, f.ARR_DELAY, f.DISTANCE
... FROM flights f
... JOIN traindays t
... ON f.FL_DATE == t.FL_DATE
... WHERE
...   t.is_train_day == 'True'
... """
>>> traindata = spark.sql(trainquery)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unbound method sql() must be called with SparkSession instance as first argument (got str instance instead)
>>> traindata.head(2)
[Row(DEP_DELAY=-2.0, TAXI_OUT=26.0, ARR_DELAY=0.0, DISTANCE=677.0), Row(DEP_DELAY=-2.0, TAXI_OUT=22.0, ARR_DELAY=3.0, DISTANCE=451.0)]
>>> traindata.describe().show()
+-------+------------------+-----------------+-----------------+-----------------+
|summary|         DEP_DELAY|         TAXI_OUT|        ARR_DELAY|         DISTANCE|
+-------+------------------+-----------------+-----------------+-----------------+
|  count|            151446|           151373|           150945|           152566|
|   mean|10.726252261532164|16.11821791204508|5.310126204909073|837.4265432665208|
| stddev| 36.38718688562445|8.897148233750972|38.04559816976176|623.0449480656523|
|    min|             -39.0|              1.0|            -68.0|             31.0|
|    max|            1393.0|            168.0|           1364.0|           4983.0|
+-------+------------------+-----------------+-----------------+-----------------+
>>> def to_example(raw_data_point):
...     return LabeledPoint(\
...         float(raw_data_point['ARR_DELAY'] < 15),  # on-time? \
...         [ \
...             raw_data_point['DEP_DELAY'], \
...             raw_data_point['TAXI_OUT'], \
...             raw_data_point['DISTANCE'], \
...         ])
...
>>> examples = traindata.rdd.map(to_example)
>>> lrmodel = LogisticRegressionWithLBFGS.train(examples, intercept=True)
Traceback (most recent call last):
  ...
pyspark.sql.utils.IllegalArgumentException: u'requirement failed: init value should <= bound'
>>> print lrmodel.weights, lrmodel.intercept
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'lrmodel' is not defined
(Every subsequent lrmodel.predict / clearThreshold / setThreshold call failed with the same NameError.)
>>> MODEL_FILE = 'gs://' + BUCKET + '/flights/sparkmloutput/model'
>>> os.system('gsutil -m rm -r ' + MODEL_FILE)
CommandException: 1 files/objects could not be removed.
256
>>> lrmodel.save(sc, MODEL_FILE)
NameError: name 'lrmodel' is not defined
>>> print '{} saved'.format(MODEL_FILE)
gs://qwiklabs-gcp-a17a185bd4f73119/flights/sparkmloutput/model saved
>>> from pyspark.mllib.classification import LogisticRegressionModel
>>> lrmodel = LogisticRegressionModel.load(sc, MODEL_FILE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
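Two of the recurring failures in the transcript above (the TypeError about "unbound method sql()" and the AttributeError "'property' object has no attribute 'schema'") both suggest that `spark` was naming the SparkSession class rather than the session instance the pyspark shell binds at startup. A minimal plain-Python sketch of the TypeError variant, using a toy class rather than PySpark (on Python 3 the message differs slightly, but the cause is the same):

```python
class Session:
    """Toy stand-in for SparkSession (illustrative, not PySpark's class)."""

    def sql(self, query):
        return "DataFrame for: " + query

# Correct usage: call .sql() on an instance, the way the pyspark shell
# intends when it binds a live session to the name `spark`.
session = Session()
assert session.sql("SELECT 1") == "DataFrame for: SELECT 1"

# The transcript's TypeError arises when the method is called on the
# class itself, so there is no instance to serve as `self`:
raised = False
try:
    Session.sql("SELECT 1")
except TypeError:
    raised = True
assert raised
```

Restarting the shell (or re-creating the session with SparkSession.builder.getOrCreate()) restores a usable instance.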
Stephan H. · Review submitted about 7 years ago
Adam C. · Review submitted about 7 years ago
Anurag M. · Review submitted about 7 years ago
The notebook is not loading any kernels and thus the steps regarding the notebook cannot be performed.
Ilias S. · Review submitted about 7 years ago
Even though I did everything as stated in the lab, I was not able to get the full score. The last check, for the replacement to 00004 in the input, did not return any score.
hari s. · Review submitted about 7 years ago
I finished the lab but it didn't give me the last credit for updating the notebook to reference all_flights_00004
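Several reviewers mention the graded step of pointing the notebook at shard 00004. Based on the input pattern that appears in a transcript quoted in an earlier review, the change amounts to rebuilding the input glob; a minimal sketch, assuming the bucket name shown here is a placeholder:

```python
# BUCKET would normally come from the environment in the lab;
# "my-project-id" is a placeholder for illustration.
BUCKET = "my-project-id"

# Select a specific output shard of the flights dataset by name.
shard = "00004"
inputs = "gs://{}/flights/tzcorr/all_flights-{}-*".format(BUCKET, shard)
assert inputs == "gs://my-project-id/flights/tzcorr/all_flights-00004-*"
```

The grader appears to key on this exact pattern, which may explain why several people report not receiving the final points.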
Michael V. · Review submitted about 7 years ago
joe k. · Review submitted about 7 years ago
Sakthi Pravin N. · Review submitted about 7 years ago
We do not guarantee that the published reviews come from consumers who have purchased or used the products. Reviews are not verified by Google.