Error connecting to HBase from a Spark YARN cluster


I have an application that parses VCF files and inserts data into HBase. The application runs without problems using Apache Spark with a local master, but when I run it on the Apache Spark YARN cluster, it fails with the following: 17/03/31 10:
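A common cause of this symptom is that, in YARN cluster mode, the executors do not inherit the submitting machine's classpath, so the HBase client configuration and jars must be shipped explicitly. A minimal sketch (the config path, jar paths, main class, and jar name are all assumptions, not taken from the question):

```shell
# Ship the HBase client config and jars to every YARN container.
# --files and --jars are standard spark-submit options; paths are illustrative.
spark-submit \
  --master yarn --deploy-mode cluster \
  --files /etc/hbase/conf/hbase-site.xml \
  --jars /opt/hbase/lib/hbase-client.jar,/opt/hbase/lib/hbase-common.jar \
  --class com.example.VcfLoader \
  vcf-loader.jar
```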

Kerberos issue on Spark in cluster mode (YARN)


I'm using Spark with Kerberos authentication. I can run my code fine using spark-shell, and I can also use spark-submit in local mode (e.g. --master local[16]). Both work as expected. Local mode: spark-submit --class "graphx_sp" --master local[16]
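In YARN cluster mode the driver runs on a cluster node that has no access to the local Kerberos ticket cache, which is why local mode works while cluster mode fails. The usual fix is to pass credentials explicitly; `--principal` and `--keytab` are real spark-submit options, while the principal name, keytab path, and jar name below are assumptions:

```shell
# Obtain a local ticket for client-side HDFS access, then let Spark
# log in on the cluster using the keytab.
kinit user@EXAMPLE.COM
spark-submit \
  --class graphx_sp \
  --master yarn --deploy-mode cluster \
  --principal user@EXAMPLE.COM \
  --keytab /home/user/user.keytab \
  graphx_sp.jar
```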

Spark does not use YARN cluster resources


I'm trying to run a Python script using Spark (1.6.1) on a Hadoop cluster (2.4.2). The cluster has been installed, configured, and managed using Ambari. I have a group of 4 nodes (each with a 40 GB HDD, 8 cores, and 16 GB RAM). My script uses the sklearn lib:
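With Spark 1.6 on YARN, resources are not grabbed automatically; unless requested, the job often gets only two small default executors. A sketch of an explicit resource request, with executor counts and sizes chosen illustratively for 4 nodes with 8 cores / 16 GB RAM each (the script name is a placeholder):

```shell
# Explicitly request one large executor per node, leaving headroom
# for YARN overhead and OS processes. Numbers are illustrative.
spark-submit \
  --master yarn --deploy-mode client \
  --num-executors 4 \
  --executor-cores 6 \
  --executor-memory 10g \
  script.py
```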

spark-submit yarn-cluster with --jars does not work?


I'm trying to submit a Spark job to the CDH YARN cluster via the following commands. I have tried several combinations and nothing works... I now have all the POI jars located both in my local /root and on HDFS under /user/root/lib, from where I
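The `--jars` option takes a single comma-separated list of paths; in yarn-cluster mode, local paths are uploaded to the cluster automatically, and `hdfs://` URIs are also accepted. A sketch assuming the jars sit in /root as described (the exact POI jar names, main class, and application jar are assumptions):

```shell
# Comma-separated --jars list; no spaces between entries.
spark-submit \
  --master yarn-cluster \
  --jars /root/poi.jar,/root/poi-ooxml.jar \
  --class com.example.Main \
  app.jar
```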

Spark/YARN: the file does not exist on HDFS


I have a Hadoop/YARN cluster configured on AWS, with one master and 3 slaves. I checked that I have 3 live nodes running on ports 50070 and 8088. I tested a Spark job in client deploy mode and everything works fine. When I try t
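A frequent cause of "file does not exist" in cluster deploy mode is that the driver runs on a worker node, so a file present only on the submitting machine cannot be found. The usual remedy is to copy the input to HDFS and reference it by an `hdfs://` URI; the paths and jar name below are illustrative:

```shell
# Copy the input to HDFS so every node can read it, then pass the
# HDFS URI to the job instead of a local path.
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put ./data.txt /user/hadoop/input/
spark-submit --master yarn --deploy-mode cluster \
  app.jar hdfs:///user/hadoop/input/data.txt
```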

PySpark distributed processing on a YARN cluster


I have Spark running on a Cloudera CDH 5.3 cluster, using YARN as the resource manager. I develop Spark applications in Python (PySpark). I can submit jobs and they complete successfully, but they never seem to run on more than one machine (the local machi
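When everything runs on one machine, the job is often being submitted with a local master (the default if none is set). Forcing the YARN master and requesting several executors spreads the tasks across nodes. A sketch (executor counts and the script name are illustrative):

```shell
# Submit against YARN rather than the default local master;
# multiple executors make the work run on several nodes.
spark-submit \
  --master yarn --deploy-mode client \
  --num-executors 6 \
  --executor-cores 2 \
  job.py
```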

Code works in spark-shell but not in Eclipse


I have a small piece of Scala code that works correctly in spark-shell, but not in Eclipse with the Scala plugin. I can access HDFS using the plugin; I tried writing another file and it worked. FirstSpark.scala: package bigdata.spark import org.apache.spark.Spar

How to launch a Spark EC2 cluster with Hadoop 2.6


I'm trying to launch a Spark EC2 cluster on Spark 1.6.1 with Hadoop 2.6. Here's what I tried: ./spark-ec2 -i ~/.ssh/***.pem \ --instance-profile-name *** \ -k *** \ --region=us-east-1 \ --instance-type=m3.xlarge \ -s 2 \ --copy-aws-credentials \ launc
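If the goal is specifically Hadoop 2.x, spark-ec2 defaults to Hadoop 1 unless told otherwise; the Hadoop version is selected with its `--hadoop-major-version` option. A sketch of the same invocation with that flag added (the `***` placeholders from the question are kept as-is, and the cluster name is a placeholder):

```shell
# Same launch command with the Hadoop major version made explicit.
./spark-ec2 -i ~/.ssh/***.pem \
  --instance-profile-name *** \
  -k *** \
  --region=us-east-1 \
  --instance-type=m3.xlarge \
  --hadoop-major-version=2 \
  -s 2 \
  --copy-aws-credentials \
  launch my-cluster
```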

Role of the master in a Spark Standalone cluster


In a Spark Standalone cluster, what exactly is the role of the master (the node started with the start script)? I understand that this is the node that receives jobs from the submit script, but what is its role when processing a j
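For context on the pieces involved: in standalone mode the master only registers workers and schedules applications onto them, while the workers launch the executors that do the actual processing. A sketch of how the daemons are typically started (host names are placeholders; `start-slave.sh` is the worker start script in Spark 1.x/2.x):

```shell
# On the master node: starts the scheduler, serving spark://host:7077
# and a web UI on port 8080. It does not run tasks itself.
$SPARK_HOME/sbin/start-master.sh

# On each worker node: registers with the master and hosts executors,
# which is where the job's tasks actually run.
$SPARK_HOME/sbin/start-slave.sh spark://master-host:7077
```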