Abstract- With the rapid growth in the volume, complexity, variety and velocity of data in organizations, the need for handling unstructured data is increasing continuously. The enormous amount of data generated on the web is highly unstructured in nature. Relational databases are designed to manage structured data and are not designed to handle unstructured data or very high data volumes, whereas NoSQL databases are well suited to big data applications. This paper presents a comparative analysis of Oracle Database and a NoSQL document-oriented database management system, MongoDB. The comparison covers key features, theoretical differences and restrictions, and focuses on basic CRUD operations in MongoDB.

 

Key Words- Big data, NoSQL, MongoDB, RDBMS, CRUD


I. Introduction

The term NoSQL was first introduced by Carlo Strozzi in 1998. NoSQL stands for “Not Only SQL”. The rapid growth of data, and the massive amount of data produced every day by web and business applications, has become hard for an RDBMS to handle. This has increased interest in alternatives to RDBMS. NoSQL databases are defined as distributed, horizontally scalable and open source [5].

 

Relational database management systems define a fixed schema, and data is inserted strictly according to that schema. NoSQL databases are built to allow the insertion of data without a predefined schema, which makes it easy to make significant application changes in real time and speeds up development. NoSQL databases are high-performance, scalable systems [1]. It is difficult to handle both the size of the data and concurrent actions on it within a standard RDBMS. Reasons to employ NoSQL techniques include scalability, high availability, distributed architecture support, flexible schema, varied data structures, fault tolerance and consistency.

 

MongoDB is an open source project maintained by the company 10gen. It is a document-oriented, schema-less database that stores data in BSON (Binary JSON) format. Unlike an RDBMS, MongoDB can deal with structured, semi-structured and unstructured data. MongoDB documents can vary in structure, and fields can vary from document to document. Similar documents are stored in collections; a collection corresponds to a table and a document corresponds to a record. MongoDB can add, remove or change a field in a document without affecting other documents in the same collection. This avoids expensive ALTER TABLE operations, which can lead to redesigning the entire set of schemas and migrating the existing database to the new schema.
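The schema flexibility described above can be illustrated with a minimal Python sketch. It models a collection as a plain list of dictionaries; this is an illustration only, not the MongoDB API, and the names `accounts` and `add_field` are hypothetical.

```python
# A "collection" modeled as a list of dict "documents" (illustrative only,
# not the MongoDB API). Documents in one collection may have different fields.
accounts = [
    {"_id": 1, "name": "abc", "age": 26},
    {"_id": 2, "name": "xyz", "address": "indore"},  # no "age", extra "address"
]


def add_field(collection, doc_id, field, value):
    """Add or change one field on one document; other documents are untouched."""
    for doc in collection:
        if doc["_id"] == doc_id:
            doc[field] = value
            return True
    return False


# Adding a field to one document needs no schema change for the rest.
add_field(accounts, 1, "email", "abc@example.com")
```

In a relational table, the same change would require altering the table so that every row gains the new column.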

 

MongoDB holds all data for a given record in a single document, in contrast to relational databases, where the data for a single record is spread across different tables. Data in MongoDB is therefore more localized, which reduces the need to JOIN separate tables [3]. Joins are avoided in MongoDB by embedding documents within a document. The result is increased performance and scalability, as a single read can retrieve the entire document. MongoDB also provides horizontal scalability through a technique called auto-sharding, which spreads data across nodes so that the impact of any single node failure is limited. Several research studies report that MongoDB is much faster than MS SQL in writing (inserts/updates) and reading (retrieval) [1].
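The difference between joining tables and embedding documents can be sketched as follows. This is a minimal illustration that uses Python's built-in `sqlite3` module as a stand-in for a relational database and a plain dictionary as a stand-in for a document; the `customers` and `orders` tables and the function names are assumptions made for the example.

```python
import sqlite3


def fetch_with_join():
    """Relational layout: one record is spread over two tables and joined on read."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
        INSERT INTO customers VALUES (1, 'abc');
        INSERT INTO orders VALUES (10, 1, 'book');
        INSERT INTO orders VALUES (11, 1, 'pen');
        """
    )
    rows = conn.execute(
        "SELECT c.name, o.item FROM customers c "
        "JOIN orders o ON o.customer_id = c.id "
        "WHERE c.id = ? ORDER BY o.id",
        (1,),
    ).fetchall()
    conn.close()
    return rows


# Document layout: the orders are embedded inside the customer document,
# so one lookup returns everything without a join.
customer_doc = {
    "_id": 1,
    "name": "abc",
    "orders": [{"item": "book"}, {"item": "pen"}],
}


def fetch_embedded(doc):
    """Read the same information from a single embedded document."""
    return [(doc["name"], order["item"]) for order in doc["orders"]]
```

Both functions return the same pairs of customer name and item; the embedded version reads a single document instead of combining two tables.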

 

II. NoSQL Databases (Classification)

 

NoSQL databases are classified as [6]:

i. Document-oriented store
ii. Key-value store
iii. Column-oriented store
iv. Graph-oriented store

 

A. Document-Oriented

Document-oriented stores are like key-value stores, with the distinction that the values are visible and may be queried. Data formats such as JSON or XML are used to store document-oriented datasets. Document stores offer a flexible schema, so there is no requirement for documents to share the same fields or structure. In contrast to key-value stores, they offer indexing and querying based on values. These databases store their data in the form of documents, which are identified by a unique set of keys and values much like those in key-value databases. Document store databases are schema-free and variable in nature [6], [14].

 

Other characteristics of document-oriented stores are horizontal scalability and sharding across cluster nodes. Examples of document-oriented stores are MongoDB, Amazon DynamoDB, CouchDB, Couchbase, MarkLogic, OrientDB, RethinkDB, Cloudant, RavenDB and Microsoft Azure DocumentDB [6].

 

B. Key-Value

A key-value store combines two entities: keys and values. It is one of the traditional database models, from which the other NoSQL databases have grown. It has a concrete application programming interface (API) and permits its users to store data in a schemaless manner. The stored data is in two parts: the key is a unique identifier for a particular data entry, so a key must not be repeated; the value is the data that the key points to [14].

 

The key-value store is the least complex storage paradigm among NoSQL databases. Key-value stores give the best performance on basic CRUD (Create, Read, Update and Delete) operations. They additionally offer scalability and sharding across cluster nodes. Sharding is a horizontal partitioning technique used to split a large amount of data into smaller, more easily manageable parts (shards). However, key-value databases are less flexible for querying and indexing complex and connected data. Queries for this category are generally based on keys rather than values. Examples of key-value stores are Redis, Memcached, Riak KV, Hazelcast, Ehcache, OrientDB, Aerospike and Amazon SimpleDB [6].
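Key-based access and sharding can be sketched in a few lines of Python. The class below is a hypothetical, in-memory illustration that assigns each key to a shard by hashing it; real key-value stores use their own partitioning schemes, which may differ from this one.

```python
import hashlib


class ShardedKVStore:
    """Toy in-memory key-value store that partitions keys across shards by hash."""

    def __init__(self, num_shards=3):
        # Each shard is an independent dictionary, standing in for one node.
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns it.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        """Create or update the value stored under key."""
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        """Read the value stored under key, or default if it is absent."""
        return self.shards[self._shard_for(key)].get(key, default)

    def delete(self, key):
        """Delete key; return True if it existed."""
        return self.shards[self._shard_for(key)].pop(key, None) is not None
```

Every operation needs only the key to locate the owning shard, which is why lookups by key are cheap while queries on values would have to scan every shard.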

 

C. Column-Oriented

Column-oriented databases are also referred to as column family databases. Column-oriented stores are suitable when there is a need to handle huge, distributed quantities of data. Column stores in NoSQL are primarily hybrid row/column stores, unlike pure relational column databases. Although they make use of columnar extensions, rather than storing data in tables they store it in an extensively distributed architecture. Columns are grouped according to the relationship of the data. In column stores, each key is related to one or more attributes (columns). A column-oriented data store stores its data in such a fashion that it can be aggregated rapidly with less I/O activity. It focuses on high scalability in data storage. The data is stored in the sorted sequence of the column family.
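Why a columnar layout helps aggregation can be shown with a small Python sketch. It is a simplified in-memory illustration, assuming every row has the same fields; the data and the helper `to_columns` are hypothetical and do not reflect the on-disk format of any particular column store.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"user": "a", "country": "IN", "amount": 10},
    {"user": "b", "country": "US", "amount": 25},
    {"user": "c", "country": "IN", "amount": 5},
]


def to_columns(records):
    """Pivot row records into one list per column (assumes identical fields)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Aggregating one attribute touches only that column's values,
# instead of reading every field of every row.
total_amount = sum(columns["amount"])
```

The row layout would have to visit each full record to compute the same total, which is the extra I/O that the columnar layout avoids.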

 

Compared with row-oriented databases, column-oriented databases have better capabilities for managing data and storage space. Horizontal scalability is one of their prominent characteristics. Notable applications of column-oriented databases include blogging and event logging. Examples of column-oriented stores are HBase, Accumulo, Hypertable, Google Cloud Bigtable, Sqrrl, ScyllaDB and MapR-DB [6], [14].

 

D. Graph-Oriented

Graph databases evolved from graph theory and are designed to represent entities and their relationships as nodes and edges respectively. The nodes act as the objects and the edges act as the relationships between the objects. Graph databases replace relational tables with structured relational graphs of interconnected key-value pairings. The graph also holds properties related to nodes. It uses a technique referred to as index-free adjacency, i.e. each node holds a direct pointer to its adjacent node. Millions of records can be traversed using this technique. In a graph database, the focus is on the relations established between data using pointers. Graph databases provide schema-less and efficient storage of semi-structured data. Queries are expressed as traversals, which makes graph databases quicker than relational databases for such queries. They are easy to scale and whiteboard friendly. Graph databases support ACID properties and support rollback [14]. Because graphs have expressive power and strong modeling characteristics, many real-world situations can be represented as graphs and hence modeled in a graph database. Graph data can be queried more efficiently because intensive joins are not necessarily needed in graph query languages [6].
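A query expressed as a traversal can be sketched in Python as follows. In this simplified illustration each node stores its properties and direct references to its neighbours, loosely mimicking index-free adjacency; the graph data, the `FOLLOWS` relation and the function `reachable` are hypothetical.

```python
from collections import deque

# Each node keeps its own properties and direct references to adjacent nodes,
# so following an edge does not require a separate index or join table.
graph = {
    "alice": {"props": {"city": "indore"}, "out": {"FOLLOWS": ["bob", "carol"]}},
    "bob": {"props": {"city": "pune"}, "out": {"FOLLOWS": ["dave"]}},
    "carol": {"props": {"city": "delhi"}, "out": {"FOLLOWS": ["dave"]}},
    "dave": {"props": {"city": "mumbai"}, "out": {}},
}


def reachable(nodes, start, relation):
    """Breadth-first traversal: all nodes reachable from start via relation."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in nodes[current]["out"].get(relation, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


follow_chain = reachable(graph, "alice", "FOLLOWS")
```

The traversal visits only the nodes connected to the starting node, whereas an equivalent relational query would typically need one self-join per hop.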

Fig. 1 NoSQL database types

III. Comparison - Oracle and MongoDB

MongoDB is a NoSQL database management system released in 2009. It stores data as JSON-like documents with dynamic schemas (the format is called BSON). NoSQL is a class of database management system that differs from traditional relational databases in that data is not stored using fixed table schemas. Its main purpose is to serve as a database system for huge web-scale applications, where such systems outperform traditional relational databases.

MongoDB focuses on four factors: flexibility, power, speed and ease of use. It supports indexing and offers drivers for multiple programming languages. The data model of MongoDB is schemaless and document-oriented, whereas Oracle Database supports the relational model. Oracle Database has a standard query language, SQL, whereas MongoDB is queried through API calls.

MongoDB has aggregation functions. A built-in map-reduce function can be used to aggregate large amounts of data. MongoDB also accepts larger values: Oracle Database supports a maximum value size of 4 KB, whereas MongoDB has a maximum value size of 16 MB. The integrity model used by Oracle Database is ACID, whereas MongoDB uses BASE. MongoDB offers consistency, durability and conditional atomicity. Oracle Database provides integrity features that MongoDB does not offer, such as isolation, transactions, referential integrity and revision control. In terms of distribution, both MongoDB and Oracle Database are horizontally scalable and support data replication. While MongoDB offers sharding support, Oracle Database does not. Both MongoDB and Oracle Database are cross-platform database management systems. Oracle Database was written in C++, C and Java, whereas MongoDB was written in C++. MongoDB is an open source product, whereas a licence is required to use Oracle Database [17].

A. FEATURES OF MONGODB

• MongoDB provides high performance.

• It has a rich query language, supports all major CRUD operations and provides aggregation options.

• MongoDB provides high availability through its automatic replication feature. Data is restored from a backup (replica) in case of server failure.

• It provides an automatic failover mechanism.

• Sharding is a major feature that makes horizontal scalability possible.

• A record in MongoDB is a document.

• A database holds collections of documents.

B. ADVANTAGES OF MONGODB

• MongoDB is simple and very easy to install and set up.

• MongoDB provides a schema-less structure.

• The document query language supported by MongoDB plays a significant role in supporting dynamic queries.

• It is very easy to scale.

• No complex joins are required in MongoDB, because data is kept in BSON format as key-value pairs.

• It uses internal memory for storing data, which allows quicker access to the data.

• Performance tuning in MongoDB is generally easier than in relational databases.

• There is no need to map application objects to data objects.

• MongoDB supports sharding, which enables horizontal scaling, whereas relational databases support vertical scaling.

 

Table 1 Comparison of MongoDB and Oracle [14]

Key Feature | Oracle | MongoDB
--- | --- | ---
Data Model | Data is stored in the form of tables. Follows a fixed schema structure. | Follows a document-based model for representing the data. It is schema-less and can handle unstructured data efficiently.
Scalability | Provides both vertical and horizontal scalability. | Provides effective horizontal scalability.
Transaction reliability | Follows the ACID rule and hence is more reliable. | Follows the BASE rule.
Complexity | More complex | Less complex
Security | Very secure mechanism | Less secure
Crash Recovery | Ensures crash recovery through its ACID properties. | Depends on replication as a backup to recover from a crash.
Cloud | Not suitable for cloud applications | Suitable for cloud applications
Big Data Handling | Unable to handle the big data problem | Designed to deal with the big data problem effectively.

 

IV. CRUD Operations

 

This section focuses on the basic CRUD operations. Two databases, one in Oracle and one in MongoDB, are created to compare the way data is created, selected, inserted and deleted in each [21]. MongoDB is a fast-responding database management system: if a simple database that responds very quickly is required, MongoDB is a good choice. MongoDB supports all major CRUD operations and provides aggregation features. The major CRUD operations are as follows:

 

Table 2 CRUD Operations

Operation | Oracle | MongoDB
--- | --- | ---
Create table | CREATE TABLE accounts (id NUMBER, first_name VARCHAR(64) NULL, last_name VARCHAR(45) NULL, PRIMARY KEY (id)); | db.accounts.insert({name: "abc", age: 26, address: "indore"})
Delete a table | DROP TABLE accounts; | db.accounts.drop()
Insert | INSERT INTO accounts (name, age, address) VALUES ('abc', 26, 'indore'); | db.accounts.insert({name: "abc", age: 26, address: "indore"})
Select | SELECT * FROM accounts; | db.accounts.find()
Select fields | SELECT first_name, last_name FROM accounts; | db.accounts.find({}, {first_name: 1, last_name: 1})
Conditional select | SELECT * FROM accounts WHERE dep_wid = 'D' AND balance > 5000; | db.accounts.find({dep_wid: "D", balance: {$gt: 5000}})
Ordered select, ascending | SELECT * FROM accounts ORDER BY user_id ASC; | db.accounts.find({}).sort({user_id: 1})
Ordered select, descending | SELECT * FROM accounts ORDER BY user_id DESC; | db.accounts.find({}).sort({user_id: -1})
Select with count | SELECT COUNT(*) FROM users; | db.users.count()
Update | UPDATE student SET section = 'F' WHERE marks < 30; | db.student.update({marks: {$lt: 30}}, {$set: {section: "F"}}, {multi: true})
Delete | DELETE FROM student; | db.student.remove({})
Delete with condition | DELETE FROM student WHERE section = 'a'; | db.student.remove({section: "a"})

V. Related Work

Several database technologies have been developed to handle the present explosive growth of data. Many NoSQL databases, such as MongoDB, Cassandra, HBase and Couchbase, have evolved over time for dealing with huge volumes of unstructured data. One study analyzes the deployment of MongoDB, a popular NoSQL database, in different industrial application areas to better understand its scope and to explore the reasons for employing it. For web or mobile applications involving unstructured big data that require horizontal scaling and fast, rich querying capabilities, MongoDB is the most preferred NoSQL database [1].

Another study examines how the execution times of different document databases diverge across database operations as the number of records increases. It asks which NoSQL document database performs better for data retrieval, update, creation and deletion at different record counts. Relational databases have so far been used for storing application data, but there is now a need to store and manage huge amounts of data that relational databases cannot hold, and NoSQL technology overcomes this problem. The operations are performed to distinguish between the two NoSQL databases studied, MongoDB and CouchDB. The reported results indicate that CouchDB is more powerful than MongoDB at loading and processing big data, and processes it faster than MongoDB [2].

NoSQL systems are relatively new and most of them implement their own query language or interface. Developers need to learn to use these constructs.
If a company needs to train its employees in a new technology, this also adds to the cost of the database system. Eventually, a common query language for NoSQL data stores may emerge. One should carefully research whether NoSQL databases are reasonable to use in a given application scenario. However, there is no sign of NoSQL databases disappearing. In any case, these systems need to be monitored carefully, as they will become more mature and will surpass traditional relational database systems in even more domains. Because of the vast number of available NoSQL data stores, there will eventually be some consolidation in the market [4], [13].

Developers have to evaluate their data in order to identify a suitable data model and avoid unnecessary complexity due to transformation or mapping tasks. The queries that the database should support have to be considered at the same time, because these requirements massively influence the design of the data model. Since no common query language is available, every store differs in its supported query feature set. Developers then have to trade off between high performance through partitioning and load-balanced replica servers, high availability supported by asynchronous replication, and strict consistency. If partitioning is required, the selection of a partitioning strategy depends on the supported queries and cluster complexity. Besides these requirements, durability mechanisms, community support and useful features such as versioning also influence the database selection.
In general, key-value stores should be used for very fast and simple operations, document stores offer a flexible data model with great query possibilities, column family stores are suitable for very large datasets that have to be scaled to a large size, and graph databases should be used in domains where relationships are as important as the entities they connect [8].

NoSQL databases are database management systems that use few or no SQL commands to query, store and delete data. They are used in situations for which traditional relational database management systems were not designed, such as horizontal scaling and storing large amounts of complex objects that are difficult to store in tables. NoSQL has some advantages when used for large amounts of data. NoSQL may be a good option for applications that handle large transactions and persist complex data objects [7].

NoSQL databases differ from traditional databases in many aspects, such as structured schema, transaction methodology, complexity, crash recovery and the storage of big data; these features lead to the use of NoSQL in cloud computing and possibly in data warehouses. NoSQL falls short in security, mainly because its designers focus on purposes other than security and because NoSQL database solutions are still new and have not yet reached full maturity; as a result, many security vulnerabilities can be found in them [12], [15].

VI. Conclusion

This paper explores NoSQL databases, their types, key features and the need for them. It compares them with relational databases and lists various advantages and features of NoSQL databases. A comparative study of Oracle Database and the NoSQL database MongoDB has also been presented. Basic CRUD operations in MongoDB and Oracle are analyzed.

VII. Future Work

MongoDB is well suited for big data applications and satisfies the needs of today's digital world, but it still lacks maturity compared to relational databases. Relational databases have a standard development process, whereas NoSQL lacks a standard development methodology. In future, there is a pressing need to investigate development methodologies for NoSQL databases as well.