Qingyun QingCloud officially launches SparkMR, its new dual-engine big data service

To give users more flexible and efficient big data solutions, Qingyun QingCloud has officially launched SparkMR, a dual-engine big data platform that combines Spark and Hadoop.

SparkMR on QingCloud integrates two computing engines, Spark and Hadoop MapReduce, on top of a unified HDFS data storage engine and the YARN scheduling system. It gives users a flexible, efficient cloud big data processing platform that can switch among multiple computing modes.

In the era of big data, data is an intangible asset of the enterprise and part of its core competitiveness. Managing and analyzing data in a unified way, at low cost and high efficiency, in order to support business decisions has become a hard problem for enterprises. Big data platforms emerged to meet this need and have continued to develop and innovate.

Qingyun QingCloud launched a Spark-based big data cluster service in August 2015. In December of the same year, it launched a Hadoop cluster service as a strong complement to its basic big data platform, to meet the different needs of enterprises in the big data field.

However, because Spark and Hadoop were two independent services, users who wanted both processing engines had to deploy two sets of HDFS and load and store two copies of the same data, which is not ideal in terms of either cost or efficiency.

For this reason, and with unified data management in mind, Qingyun QingCloud launched SparkMR on QingCloud. It is delivered to users as a cloud application through the QingCloud AppCenter, and it comprehensively upgrades and integrates the Spark and Hadoop services of the original big data platform. (SparkMR supports Apache Hadoop 2.7.3 and Apache Spark 2.2.0.)

Combining Spark and Hadoop significantly reduces cost and, compared with the original big data platform, also provides richer and more flexible configuration options. Users can customize node configurations by role (2 to 16 CPU cores, 2 to 64 GB of memory).

Overall, SparkMR on QingCloud, as a key component of the new dual-engine big data platform, has several highlights:

Flexible computing modes

At the bottom, SparkMR provides a unified HDFS as the data storage engine. On top, Spark and MapReduce serve as the two computing engines, with YARN as the scheduling system. Users can easily run three different computing modes, namely Spark Standalone, Spark on YARN, and MapReduce on YARN, and switch among them.
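As a rough illustration of how the three computing modes differ at submission time, the following minimal Python sketch builds the command line for each mode. It relies only on the standard `spark-submit` options (`--master`, `--deploy-mode`) and the standard `hadoop jar` launcher. The master address `spark://spark-master:7077`, the function name, and the file names are hypothetical placeholders, not values documented by SparkMR.

```python
def build_launch_command(mode, app, args=(), spark_master="spark://spark-master:7077"):
    """Return the argv list that submits `app` in one of the three modes.

    `spark_master` is a hypothetical standalone master URL; replace it with
    the address of the real Spark master node in your cluster.
    """
    if mode == "spark-standalone":
        # Spark's own standalone cluster manager schedules the job.
        return ["spark-submit", "--master", spark_master, app, *args]
    if mode == "spark-on-yarn":
        # YARN schedules the Spark job; the driver runs inside the cluster.
        return ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster", app, *args]
    if mode == "mapreduce-on-yarn":
        # A classic MapReduce job packaged as a jar, scheduled by YARN.
        return ["hadoop", "jar", app, *args]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode, app in [
        ("spark-standalone", "wordcount.py"),
        ("spark-on-yarn", "wordcount.py"),
        ("mapreduce-on-yarn", "wordcount.jar"),
    ]:
        print(" ".join(build_launch_command(mode, app, ["/input", "/output"])))
```

Because all three modes read from the same HDFS, the input and output paths in such a sketch would stay the same and only the launcher and master setting would change.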

Powerful computing capability

To make it easier to develop Spark applications, SparkMR supports Java and Scala development and also provides Python and R runtime environments. Python users get the Anaconda distributions of Python 2 and Python 3 and can switch between the two versions. Several Anaconda data science packages are preinstalled for both Python versions, which supports AI development scenarios such as data science, machine learning, and deep learning.

Convenient integration

SparkMR supports specifying dependent services: through the native application-awareness mechanism of the AppCenter 2.0 framework, it integrates automatically and seamlessly with other big data analysis components.

SparkMR is also pre-integrated with the QingStor object storage platform. Users can enable QingStor object storage support to handle the storage of massive, large-scale data.

Flexible scheduling strategy

SparkMR lets users customize the schedulers of Spark and YARN. Users can tailor the cluster's resource scheduling strategy to their actual needs, which gives them finer-grained control in multi-tenant scenarios.
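For readers unfamiliar with the underlying settings, the sketch below shows the upstream Spark and Hadoop properties that scheduler selection normally maps to: `spark.scheduler.mode` (FIFO or FAIR) in Spark, and `yarn.resourcemanager.scheduler.class` in `yarn-site.xml`. How the SparkMR console exposes these options is not described in this article, so the mapping is an assumption, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Scheduler classes shipped with Apache Hadoop YARN.
YARN_SCHEDULERS = {
    "capacity": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler",
    "fair": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
}


def yarn_scheduler_property(name):
    """Render the yarn-site.xml <property> element that selects a YARN scheduler."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "yarn.resourcemanager.scheduler.class"
    ET.SubElement(prop, "value").text = YARN_SCHEDULERS[name]
    return ET.tostring(prop, encoding="unicode")


def spark_scheduler_conf(mode):
    """Return spark-submit arguments that set the in-application scheduling mode."""
    if mode not in ("FIFO", "FAIR"):
        raise ValueError("spark.scheduler.mode must be FIFO or FAIR")
    return ["--conf", f"spark.scheduler.mode={mode}"]


if __name__ == "__main__":
    print(yarn_scheduler_property("fair"))
    print(" ".join(spark_scheduler_conf("FAIR")))
```

In a multi-tenant cluster, a fair or capacity scheduler is the usual way to keep one tenant's jobs from starving the others, which is the kind of control the paragraph above refers to.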

Simple service customization

SparkMR exposes nearly 60 configuration parameters through the console. Users can deploy the cluster and customize its services entirely through the console UI. For example, Hadoop proxy users can be set up through the UI.

(Figure: configuration parameter page)
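To make the proxy-user example concrete: in upstream Hadoop, a proxy user is defined by two `core-site.xml` properties, `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups`. The sketch below generates that pair of properties. The user name `hue`, the host, and the group are hypothetical values, and whether the SparkMR console uses these exact field names is an assumption.

```python
def proxy_user_properties(user, hosts=("*",), groups=("*",)):
    """Return the core-site.xml properties that let `user` impersonate others.

    `hosts` limits where the proxy user may connect from, and `groups`
    limits whose identities it may assume. "*" means no restriction.
    """
    return {
        f"hadoop.proxyuser.{user}.hosts": ",".join(hosts),
        f"hadoop.proxyuser.{user}.groups": ",".join(groups),
    }


if __name__ == "__main__":
    # Hypothetical example: a gateway service account restricted to one host.
    for key, value in proxy_user_properties("hue", ["10.0.0.5"], ["analysts"]).items():
        print(f"{key} = {value}")
```

Restricting both hosts and groups, rather than leaving the `*` defaults, is generally the safer choice on a shared cluster.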

SparkMR's client node is also configured fully automatically, so users no longer need to create and manually configure a bigdata client or Spark client. Once configuration and service customization are done in the console, users can start running computing tasks as soon as deployment completes, which makes for true one-click deployment and immediate use.

Comprehensive service monitoring

(Figure: node monitoring information)

SparkMR provides comprehensive service-level and role-based monitoring. Beyond the usual resource-level monitoring, users can see clearly how the overall service is running through visualizations. Built on this service monitoring, it also offers monitoring alarms, health checks, and automatic service recovery.

In the future, the SparkMR application will gradually replace the existing Spark and Hadoop services and provide users with more powerful and convenient basic big data component services.

Big data services launched at the same time

HBase is an open-source, distributed, multi-version, column-oriented NoSQL database. Relying on Hadoop's distributed file system, HDFS, as its underlying storage, it can provide random, real-time read and write access to massive tables with billions of rows and millions of columns.

HBase on QingCloud provides the native Apache HBase 1.2.6 distribution, with HDFS from the native Apache Hadoop 2.7.3 distribution. The services provided include the HBase database service, the HDFS distributed file system, the Phoenix query engine, the HBase RESTful service, and the HBase Thrift service. Supported compression formats are gzip, bzip2, LZO, and Snappy.

MongoDB is a scalable, high-performance, highly available, open-source document database and one of the most popular NoSQL database products today. In many scenarios it replaces a traditional relational database or key-value store, and its MapReduce feature can also be used for data analysis. MongoDB natively supports replica sets and sharding, so it can store massive amounts of data while remaining highly available. Thanks to its schema-free design, developers can iterate quickly and respond flexibly to business changes.

MongoDB on QingCloud provides a cloud service based on native MongoDB replication, which adds redundancy and improves the availability of data.

Copyright © 2011 JIN SHI