WINTER INDUSTRY FOCUSED TRAINING 2016 – 2017

MAKING INDIA VIRTUAL-READY – CLOUD !!
(Exclusively for our LinuxWorld Students only)
————————–————————–
Completely Designed & Delivered by Mr. Vimal Daga – built around the actual needs of the corporate world (use cases)

Utilise your winter vacations at the right place, under the right guidance of Mr. Vimal Daga

Training Content : RedHat Linux + Python + Cloud Computing + Docker + DevOps + Splunk

Fee – INR 3,500 + Service Tax

Old Cloud Computing Students – Free of Cost

To know more : http://bit.ly/2fkec2k

Let’s Join Hands Together for our very own Mission – Stand for What’s Best – Be the Hero of your Life !!

Admin Contact: +91 9351009002
Email: training@linuxworldindia.org
LinuxWorld Informatics Pvt. Ltd. has initiated a Winter Training Program spanning 4, 6, or 8 weeks, in which you will learn the most in-demand technologies in the market.

CCNA Winter Training in Jaipur

CISCO Certification

If you are a student or an employee in the I.T. industry and want to secure your job and career, then Cisco certifications are the best option for you. Cisco is the top organization in the networking field; it provides high-level training and advanced certification components and modules that train I.T. experts on several phases of networking and interrelated technologies. To become a Cisco expert, you have to clear three levels of certification – the Associate/entry level (CCNA), the Professional level (CCNP) and the Expert level (CCIE).


CCNA

The CCNA certification program is in high demand because CCNA certification shows that you have the knowledge, expertise and training to work with today’s preferred networking technology and hardware. A CCNA-certified professional is capable of installing, configuring, managing and implementing connections to remote sites over a WAN, and has a good knowledge of routers and switches, network security issues, wireless networking models and computer network processes.

We enable you to identify and utilize the basics of the OSI layer model and the fundamental principles used in network design, work with routers and routing protocols, understand IPv6 basics, bridges and switches, gain knowledge of Ethernet and the TCP/IP protocols, and troubleshoot IP connectivity.


Cisco certifications mentioned in your resume prove that you have excellent technical knowledge, up-to-date data networking skills, and the qualifications for a top IT company. Cisco certifications are never easy to clear, but with excellent professional guidance and a well-experienced expert instructor you can earn them by passing a set of exams.

LinuxWorld Informatics is a Training & Development Centre: an ISO 9001:2008 certified organization that offers job-oriented training programs. The program is the most effective aid in enabling our trainees to relate technical theory to practice, and it provides end-to-end training solutions to all engineering undergraduate students. During the winter training, students will have the opportunity to participate in a dynamic industrial environment.

Bigdata Hadoop Training in Jaipur

Today, many technologies are used to manage large databases, and one of the most successful is Big Data Hadoop, which can create, control and maintain data because it can analyze, regulate and manipulate data with excellent accuracy. The major benefits of Big Data Hadoop are its capacity (it can produce and store data in large volumes), its ability to store different kinds of data, its data-analysis capability, the elimination of redundancy and inconsistency, and data capture with good quality and accuracy.


Hadoop is an open-source software platform capable of massive processing and storage and of coordinating synchronized database jobs. Apache Hadoop is a supreme platform for developing gigantic data websites, and Big Data technology is closely connected with huge database management; we have clubbed both technologies together for teaching computer science engineering students.
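
Hadoop itself is written in Java, but its Streaming interface runs any executable that reads standard input and writes standard output, so even a Go program can serve as a map step. As a hedged sketch (assuming a Streaming job whose reducer sums the emitted counts), a minimal word-count mapper looks like this:

    Go
    package main

    import (
    	"bufio"
    	"fmt"
    	"os"
    	"strings"
    )

    // A Hadoop Streaming mapper: read lines from stdin and emit
    // "word<TAB>1" pairs on stdout. Hadoop shuffles the pairs and hands
    // them, grouped by word, to a reducer that sums the counts.
    func main() {
    	scanner := bufio.NewScanner(os.Stdin)
    	for scanner.Scan() {
    		for _, word := range strings.Fields(scanner.Text()) {
    			fmt.Printf("%s\t1\n", strings.ToLower(word))
    		}
    	}
    	if err := scanner.Err(); err != nil {
    		fmt.Fprintln(os.Stderr, err)
    		os.Exit(1)
    	}
    }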


Apache Hadoop is a very popular technology these days because some of the world’s most renowned companies, such as Facebook, Yahoo and Amazon, use it, and it is also interesting to learn how Big Data technology combines with Apache Hadoop. Big Data Hadoop is mainly used in Internet search indexing, medical records, military surveillance, social networking sites, government projects and more.

Big data Hadoop Training

LinuxWorld Informatics Pvt. Ltd. provides training in Big Data Hadoop. The major advantage of training with LinuxWorld Informatics Pvt. Ltd. is that students learn the technology from its basic steps and also work on a live project under the supervision of our well-experienced industry professionals. The live project is a very important aspect for students because it teaches them how to solve technical problems.

LinuxWorld Informatics Pvt. Ltd. provides an opportunity for students to improve their technical skills: the training offers a practical and professional environment where they can also develop soft skills.

Article source: http://www.bigdatahadoop.info/bigdata-hadoop-training-in-jaipur/

Study reveals that most companies are failing at big data

Research from PwC and Iron Mountain reports some surprising statistics about how companies are using the data they collect.

Enterprises like to talk about data, but what you may not hear as often is how they are actually exploiting the data they collect. According to a report entitled “How organizations can unlock value and insight from the information they hold,” from Pricewaterhouse Coopers (PwC) and Iron Mountain, companies have a lot of progress to make before they start making better use of the data.

The study surveyed 1,800 senior business leaders in North America and Europe at mid-sized companies with more than 250 employees and enterprise-level organizations with over 2,500 employees. The results were surprising: only a small percentage of companies reported effective data management practices.

“Data is the lifeblood of the digital economy, it can give insight, inform decisions and deepen relationships,” according to Richard Petley, director of PwC Risk and Assurance. “It can be bought, sold, shared and even stolen — all things that suggest that data has value. Yet when we conducted our research very few organizations can attribute a value and, more concerning, many do not yet have the capabilities we would expect to manage, protect and extract that value.”
Businesses lack data strategies

The study found that while 75 percent of business leaders from companies of all sizes, locations and sectors feel they’re “making the most of their information assets,” in reality, only 4 percent are set up for success. Overall, 43 percent of companies surveyed “obtain little tangible benefit from their information,” while 23 percent “derive no benefit whatsoever,” according to the study.

That means three quarters of organizations surveyed lack the skills and technology to use their data to gain an edge on competitors. Even further, three out of four companies haven’t employed a data analyst, and out of companies that do, only one quarter are using these employees competently, according to the survey.

It’s not just a problem for tech companies. This lack of data understanding spans across manufacturing and engineering, pharmaceuticals, financial services, legal services, insurance, energy and healthcare. Using the data, PwC was able to create what it calls an Information Value Index, which measures how well businesses use the information they collect and how much value they derive from data.

Derived from a sample of 1,650 businesses that responded to 36 survey questions, the Information Value Index gives businesses a score from 0 to 100, with 100 being the best use of data possible. This index evaluates a company’s general awareness and understanding of the importance of data, how aligned the company is with data-driven goals, the skills and tools used to gain value from data, and the overall benefits the company has gained from tapping into data. Mid-market companies earned an average score of 48.8, while enterprise businesses earned an average score of 52.6; combined, the overall score for all companies surveyed came in at just over 50.

Petley concludes that “data is so pervasive that it is taken for granted or is seen as a by-product. Often it is only when disaster strikes that this assumption is broken.” Alternatively, some companies see data as the responsibility of IT and data architects rather than as an important resource that should be employed across the company. And that’s an important shift to make: data isn’t just a problem for IT, but a valuable asset that reaches far beyond the technical side of the business.

Data strategy is the biggest resource in gaining a competitive edge against other companies, according to the study. By ignoring data or treating it as unimportant, business leaders do their companies a huge disservice when it comes to staying ahead of the game.

“The essence of analytics is for business units, marketing, emerging business offices, etc. to determine what they want to learn from the data and then use the records information management team, IT, data analysts and scientists to identify data sources, understand access controls, execute the analysis, and deliver the results in a user friendly, typically visual, mode,” says Sue Trombley, managing director of Thought Leadership at Iron Mountain.

Businesses might be investing significant money into data capture, but then drop the ball when it comes time to actually use that data. Instead, business leaders need to focus on figuring out how to take the data and boil it down into easily digestible formats for internal use. It’s all about “having a strategy for data management,” says Trombley. The first step, she says, is to identify data sources, then understand the importance of analytics to every department and, finally, create a plan to stay competitive.

And the data suggests businesses aren’t aware of the untapped resource they have in stored data. The study found that 16 percent of business leaders reported that they didn’t believe their organizations knew what data they had, 23 percent said they didn’t know how data moved through their businesses or where it could best be used, and 20 percent didn’t know where their data was most vulnerable.

Not surprisingly, Trombley says that a quarter of C-suite executives report not seeing any value from data around decision-making, product development, cost savings or customer acquisition and retention. But that’s because they simply haven’t invested in a strategy. Analytics is quickly becoming one of the most valuable resources to a successful business, and every company will need a unique and tailored plan for managing data.
Should your company hire a data scientist?

Before rushing to hire a data scientist or building an entire department dedicated to analytics, business leaders need to first sit down and figure out what they want to achieve with analytics, according to Trombley. Every company’s needs are different, and the best data strategy will depend on the overall mission and goals of the business. That means your business might not necessarily need an entire department dedicated to data; it could instead tap into the skill sets of your current employees.

“Companies with less sophisticated analytics requirements may be able to fill the skills gap using existing employees by sending them to focused training sessions such as data analytics boot camps [and] night courses,” says Trombley.

For some companies, this might be the best option, considering there is currently a lack of capable data scientists since it is a relatively new and fast growing position. Simply getting an employee up to speed can help lessen the impact of a lacking data strategy, but it still might not be enough.

“There is no one-size-fits-all regarding the CDO [Chief Data Officer] position,” Trombley says. “Whether they exist or not, the basic responsibilities attributed to the role need to be assigned to one or more individuals in the current organizational structure. Also, there is the supply and demand dilemma — not enough talent available to fill the CDO position in all organizations.”
Characteristics of the ‘data elite’

Of the businesses surveyed, only 4 percent were classified as “data elite,” with a typical business profile of medium or very large businesses within healthcare and manufacturing and engineering. These businesses, according to the study, first and foremost had a well-established “information governance oversight body.” Furthermore, these businesses had fostered a “strong culture of evidence-based decision making,” appointed analysts that can access data, had strong control over their data and had extensive analysis tools in place.

These progressive companies have tapped into the most valuable resource available to them and made it part of the company culture. Some of the most agile mid-market businesses are found in this category, which the study suggests is because they aren’t bogged down by legacy and are in industries that are less regulated than others. However, less agile enterprise businesses are also found among the data elite, thanks to strong leadership, global information governance arrangements and the involvement of relevant departments outside of IT in data functions.

This story, “Study reveals that most companies are failing at big data” was originally published by CIO.

How big data is changing the database landscape for good

Mention the word “database,” and most people think of the venerable RDBMS that has dominated the landscape for more than 30 years. That, however, may soon change.

A whole crop of new contenders are now vying for a piece of this key enterprise market, and while their approaches are diverse, most share one thing in common: a razor-sharp focus on big data.

Much of what’s driving this new proliferation of alternatives is what’s commonly referred to as the “three V’s” underlying big data: volume, velocity and variety.

Essentially, data today is coming at us faster and in greater volumes than ever before; it’s also more diverse. It’s a new data world, in other words, and traditional relational database management systems weren’t really designed for it.

“Basically, they cannot scale to big, or fast, or diverse data,” said Gregory Piatetsky-Shapiro, president of KDnuggets, an analytics and data-science consultancy.

That’s what Harte Hanks recently found. Up until 2013 or so, the marketing services agency was using a combination of different databases including Microsoft SQL Server and Oracle Real Application Clusters (RAC).

“We were noticing that with the growth of data over time, our systems couldn’t process the information fast enough,” said Sean Iannuzzi, the company’s head of technology and development. “If you keep buying servers, you can only keep going so far. We wanted to make sure we had a platform that could scale outward.”

Minimizing disruption was a key goal, Iannuzzi said, so “we couldn’t just switch to Hadoop.”

Instead, it chose Splice Machine, which essentially puts a full SQL database on top of the popular Hadoop big-data platform and allows existing applications to connect with it, he said.

Harte Hanks is now in the early stages of implementation, but it’s already seeing benefits, Iannuzzi said, including improved fault tolerance, high availability, redundancy, stability and “performance gains overall.”

There’s a sort of perfect storm propelling the emergence of new database technologies, said Carl Olofson, a research vice president with IDC.

First, “the equipment we’re using is much more capable of handling large data collections quickly and flexibly than in the past,” Olofson noted.

In the old days, such collections “pretty much had to be put on spinning disk” and the data had to be structured in a particular way, he explained.

Now there’s 64-bit addressability, making it possible to set up larger memory spaces, as well as much faster networks and the ability to string multiple computers together to act as single, large databases.

“Those things have opened up possibilities that weren’t available before,” Olofson said.

Workloads, meanwhile, have also changed. Whereas 10 years ago websites were largely static, for example, today we have live Web service environments and interactive shopping experiences. That, in turn, demands new levels of scalability, he said.

Companies are using data in new ways as well. Whereas traditionally most of our focus was on processing transactions — recording how much we sold, for instance, and storing that data in place where it could be analyzed — today we’re doing more.

Application state management is one example.

Say you’re playing an online game. The technology must record each session you have with the system and connect them together to present a continuous experience, even if you switch devices or the various moves you make are processed by different servers, Olofson explained.

That data must be made persistent so that companies can analyze questions such as “why no one ever crosses the crystal room,” for example. In an online shopping context, a counterpart might be why more people aren’t buying a particular brand of shoe after they click on the color choices.

“Before, we weren’t trying to solve those problems, or — if we were — we were trying to squeeze them into a box that didn’t quite fit,” Olofson said.
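
What follows is a minimal sketch of that kind of state management (the names and structure are illustrative, not any vendor’s API): events from whichever device or server handles a move are appended under a single player key, so the history can later be analyzed as one continuous record.

    Go
    package main

    import (
    	"fmt"
    	"sync"
    	"time"
    )

    // Event is one player action, stamped with the device that produced it.
    type Event struct {
    	Device string
    	Action string
    	At     time.Time
    }

    // SessionStore keeps per-player event histories. A production system
    // would persist this in a database; a mutex-guarded map is enough to
    // show the shape of the problem.
    type SessionStore struct {
    	mu     sync.Mutex
    	events map[string][]Event // keyed by player ID
    }

    func NewSessionStore() *SessionStore {
    	return &SessionStore{events: make(map[string][]Event)}
    }

    // Record appends an event, regardless of which server or device sent it.
    func (s *SessionStore) Record(player string, e Event) {
    	s.mu.Lock()
    	defer s.mu.Unlock()
    	s.events[player] = append(s.events[player], e)
    }

    // History returns the continuous cross-device record for one player,
    // the raw material for questions like why no one crosses the crystal room.
    func (s *SessionStore) History(player string) []Event {
    	s.mu.Lock()
    	defer s.mu.Unlock()
    	return append([]Event(nil), s.events[player]...)
    }

    func main() {
    	store := NewSessionStore()
    	store.Record("player42", Event{Device: "phone", Action: "enter-crystal-room", At: time.Now()})
    	store.Record("player42", Event{Device: "laptop", Action: "leave-crystal-room", At: time.Now()})
    	fmt.Println(store.History("player42"))
    }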

Hadoop is a heavyweight among today’s new contenders. Though it’s not a database per se, it’s grown to fill a key role for companies tackling big data. Essentially, Hadoop is a data-centric platform for running highly parallelized applications, and it’s very scalable.

By allowing companies to scale “out” in distributed fashion rather than scaling “up” via additional expensive servers, “it makes it possible to very cheaply put together a large data collection and then see what you’ve got,” Olofson said.

Among other new RDBMS alternatives are the NoSQL family of offerings, including MongoDB — currently the fourth most popular database management system, according to DB-Engines — and MarkLogic.

“Relational has been a great technology for 30 years, but it was built in a different era with different technological constraints and different market needs,” said Joe Pasqua, MarkLogic’s executive vice president for products.

Big data is not homogeneous, he said, yet in many traditional technologies, that’s still a fundamental requirement.

“Imagine the only program you had on your laptop was Excel,” Pasqua said. “Imagine you want to keep track of a network of friends — or you’re writing a contract. Those don’t fit into rows and columns.”

Combining data sets can be particularly tricky.

“Relational says that before you bring all these data sets together, you have to decide how you’re going to line up all the columns,” he added. “We can take in any format or structure and start using it immediately.”

NoSQL databases don’t use a relational data model, and they typically have no SQL interface. Whereas many NoSQL stores compromise consistency in favor of speed and other factors, MarkLogic pitches its own offering as a more consistency-minded option tailored for enterprises.
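
To make the schema-less document model concrete, here is a small sketch using the community mgo driver for MongoDB (the server address, database and collection names are placeholders): two records with completely different shapes land in the same collection, with no table definition or column line-up required first.

    Go
    package main

    import (
    	"fmt"
    	"log"

    	mgo "gopkg.in/mgo.v2"
    	"gopkg.in/mgo.v2/bson"
    )

    func main() {
    	// Connect to a local MongoDB instance (address is a placeholder).
    	session, err := mgo.Dial("localhost")
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer session.Close()

    	c := session.DB("demo").C("things")

    	// Two documents with different shapes in the same collection:
    	// a network of friends and a contract, neither of which fits
    	// naturally into rows and columns.
    	err = c.Insert(
    		bson.M{"name": "Ada", "friends": []string{"Grace", "Edsger"}},
    		bson.M{"title": "Acme contract", "parties": 2, "signed": false},
    	)
    	if err != nil {
    		log.Fatal(err)
    	}

    	var result bson.M
    	if err := c.Find(bson.M{"name": "Ada"}).One(&result); err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println(result)
    }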

There’s considerable growth in store for the NoSQL market, according to Market Research Media, but not everyone thinks it’s the right approach — at least, not in all cases.

NoSQL systems “solved many problems with their scale-out architecture, but they threw out SQL,” said Monte Zweben, Splice Machine’s CEO. That, in turn, poses a problem for existing code.

Splice Machine is an example of a different class of alternatives known as NewSQL — another category expecting strong growth in the years ahead.

“Our philosophy is to keep the SQL but add the scale-out architecture,” Zweben said. “It’s time for something new, but we’re trying to make it so people don’t have to rewrite their stuff.”

Deep Information Sciences has also chosen to stick with SQL, but it takes yet another approach.

The company’s DeepSQL database uses the same application programming interface (API) and relational model as MySQL, meaning that no application changes are required in order to use it. But it addresses data in a different way, using machine learning.

DeepSQL can automatically adapt for physical, virtual or cloud hosts using any workload combination, the company says, thereby eliminating the need for manual database optimization.

Among the results are greatly increased performance as well as the ability to scale “into the hundreds of billions of rows,” said Chad Jones, the company’s chief strategy officer.
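
Because DeepSQL presents the MySQL API, the implication is that code written against a standard MySQL driver should connect to it unchanged. A hedged sketch in Go (the DSN credentials, host and table are placeholders):

    Go
    package main

    import (
    	"database/sql"
    	"fmt"
    	"log"

    	// Standard community MySQL driver; the point of a MySQL-compatible
    	// endpoint is that this code needs no changes to target it.
    	_ "github.com/go-sql-driver/mysql"
    )

    func main() {
    	// Placeholder DSN; point it at any MySQL-protocol-compatible server.
    	db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/inventory")
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer db.Close()

    	var n int
    	if err := db.QueryRow("SELECT COUNT(*) FROM orders").Scan(&n); err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println("orders:", n)
    }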

An altogether different approach comes from Algebraix Data, which says it has developed the first truly mathematical foundation for data.

Whereas computer hardware is modeled mathematically before it’s built, that’s not the case with software, said Algebraix CEO Charles Silver.

“Software, and especially data, has never been built on a mathematical foundation,” he said. “Software has largely been a matter of linguistics.”

Following five years of R&D, Algebraix has created what it calls an “algebra of data” that taps mathematical set theory for “a universal language of data,” Silver said.

“The dirty little secret of big data is that data still sits in little silos that don’t mesh with other data,” Silver explained. “We’ve proven it can all be represented mathematically, so it all integrates.”

Equipped with a platform built on that foundation, Algebraix now offers companies business analytics as a service. Improved performance, capacity and speed are all among the benefits Algebraix promises.

Time will tell which new contenders succeed and which do not, but in the meantime, longtime leaders such as Oracle aren’t exactly standing still.

“Software is a very fashion-conscious industry,” said Andrew Mendelsohn, executive vice president for Oracle Database Server Technologies. “Things often go from popular to unpopular and back to popular again.”

Many of today’s startups are “bringing back the same old stuff with a little polish or spin on it,” he said. “It’s a new generation of kids coming out of school and reinventing things.”

SQL is “the only language that lets business analysts ask questions and get answers — they don’t have to be programmers,” Mendelsohn said. “The big market will always be relational.”

As for new types of data, relational database products evolved to support unstructured data back in the 1990s, he said. In 2013, Oracle’s namesake database added support for JSON (JavaScript Object Notation) in version 12c.

Rather than a need for a different kind of database, it’s more a shift in business model that’s driving change in the industry, Mendelsohn said.

“The cloud is where everybody is going, and it’s going to disrupt these little guys,” he said. “The big guys are all on the cloud already, so where is there room for these little guys?

“Are they going to go on Amazon’s cloud and compete with Amazon?” he added. “That’s going to be hard.”

Oracle has “the broadest spectrum of cloud services,” Mendelsohn said. “We’re feeling good about where we’re positioned today.”

Rick Greenwald, a research director with Gartner, is inclined to take a similar view.

“The newer alternatives are not as fully functional and robust as traditional RDBMSes,” Greenwald said. “Some use cases can be addressed with the new contenders, but not all, and not with one technology.”

Looking ahead, Greenwald expects traditional RDBMS vendors to feel increasing price pressure, and to add new functionality to their products. “Some will freely bring new contenders into their overall ecosystem of data management,” he said.

As for the new guys, a few will survive, he predicted, but “many will either be acquired or run out of funding.”

Today’s new technologies don’t represent the end of traditional RDBMSes, “which are rapidly evolving themselves,” agreed IDC’s Olofson. “The RDBMS is needed for well-defined data — there’s always going to be a role for that.”

But there will also be a role for some of the newer contenders, he said, particularly as the Internet of Things and emerging technologies such as Non-Volatile Dual In-line Memory Module (NVDIMM) take hold.

There will be numerous problems requiring numerous solutions, Olofson added. “There’s plenty of interesting stuff to go around.”

IBM Strengthens Effort to Support Open Source Spark for Machine Learning


IBM is providing substantial resources to the Apache Software Foundation’s Spark project to prepare the platform for machine learning tasks such as pattern recognition and classification of objects. The company plans to offer Spark as a service on Bluemix and has dedicated 3,500 researchers and developers to its maintenance and further development.

In 2009, the AMPLab at the University of California, Berkeley developed the Spark framework, which was open-sourced a year later and later became an Apache project. The framework, which runs on a server cluster, can process data up to 100 times faster than Hadoop MapReduce. Given that data and analytics are embedded throughout business and society – from applications to the Internet of Things (IoT) – Spark provides essential advances in large-scale data processing.

First, it significantly improves the performance of data-dependent applications. Second, it radically simplifies the process of developing the intelligent solutions that data feeds. Specifically, in its effort to accelerate innovation in the Spark ecosystem, IBM has decided to include Spark in its own predictive analytics and machine learning platforms.

IBM Watson Health Cloud will bring Spark to healthcare providers and researchers as they gain access to new population health data. At the same time, IBM will open-source its SystemML machine learning technology. IBM is also collaborating with Databricks on advancing Spark’s capabilities.

IBM will put more than 3,500 researchers and developers to work on Spark-related projects at more than a dozen laboratories worldwide. Big Blue plans to open a Spark Technology Center in San Francisco for the data science and developer community. IBM will also train more than one million data scientists and data engineers on Spark through partnerships with DataCamp, AMPLab, Galvanize, MetiStream, and Big Data University.

A typical large corporation has hundreds or thousands of data sets residing in different databases across its computer systems. A data scientist can design an algorithm to plumb the depths of any one database, but it can take on the order of 90 working days of data science effort to develop, and adjusting the algorithm to work against another system can consume another quarter of work. A Spark-based system, by contrast, can access and analyze any of those databases with no extra development and no additional delay.

Spark has the further virtue of ease of use: developers can concentrate on the design of the solution rather than on building an engine from scratch. Spark advances large-scale data processing technology because it improves the performance of data-dependent applications, radically simplifies the process of developing intelligent solutions, and provides a platform capable of unifying all kinds of information for real workloads.

Many experts consider Spark the successor to Hadoop, but its adoption remains slow. Spark works very well for machine learning tasks, which normally require running large clusters of computers. The latest version of the platform, recently released, extends the set of machine learning algorithms it can run.

Oracle Triggers an Avalanche of 24 Cloud Services to Compete with Amazon

The Cloud represents a market where Oracle wants to win. The 24 services launched by the database leader include all the tools companies need to conduct their operations in the cloud, and thus should help customers make the move.

Oracle Cloud Platform has been enriched with nearly 24 new services for developers, IT professionals, end users and analysts to build, extend and more easily integrate cloud applications. They include Oracle Database Cloud – Exadata Service, Oracle Archive Storage Cloud Service, Oracle Big Data Cloud Service and Big Data SQL Cloud Service, Oracle Integration Cloud Service, Oracle Mobile Cloud Service and Oracle Process Cloud Service.

With the new services, companies can move all the applications hosted in their data centers to the Oracle cloud. Oracle now claims to be the only cloud provider to offer a full range of enterprise software services, platform services and infrastructure services under the Oracle Cloud Platform banner. By comparison, other cloud providers have chosen to focus on particular layers: Salesforce.com has specialized in software services, while Amazon has essentially focused on infrastructure services.

The provider has also launched several integrated cloud services meant to let companies move their operations to the cloud, including a service to develop and run mobile applications directly from the Oracle cloud and an integration service that lets them combine multiple enterprise applications into complete systems.

Oracle now offers online services for enterprise resource planning, customer experience management, human resources management, business performance management, and supply chain management. The aim of these new offerings is to make Oracle a single window for all cloud computing needs.

Oracle databases deployed in the cloud as part of this service are 100% compatible with those deployed on-premises, enabling a smooth migration to the cloud and a seamless transition to a hybrid cloud strategy. Oracle Big Data Cloud Service and Big Data SQL Cloud Service provide a secure and efficient platform for running various workloads on Hadoop and NoSQL databases and for helping companies collect and organize their big data.

Oracle Mobile Cloud Service offers a set of Android and iOS application development tools that operate entirely in the cloud. Developers can use Mobile Cloud to build a user interface or to configure an API for data exchange. All development is done entirely through a browser, so no software needs to be installed on each developer’s desktop machine.

Developers can use their favorite languages or go through the Oracle Mobile Application Framework. The service also includes a software development kit (SDK) that lets developers monitor their application, for example to know who uses it and how it is used.

The company also launched Oracle Integration Cloud Service, which lets companies make their different enterprise applications and cloud services work together. Finally, Oracle has updated Business Intelligence Cloud Service, in particular by equipping it with new data visualization tools.

Oracle clearly wants to position itself against Amazon, its main competitor, which also offers various cloud solutions. To attract customers, the company said it is ready to compete with Amazon on price. Oracle claims 70 million users in the cloud, served from 19 data centers spread across the planet that manage 700 PB of data.

Major IT Players Form R Consortium to Strengthen Data Analysis

The Linux Foundation announced the formation of R Consortium, with the intention of strengthening technical and user communities around the R language, the open source programming language for statistical data analysis.

The new R Consortium has become an official Linux Foundation project and is designed to support users of the R language. The consortium is expected to complement the existing R Foundation, focusing on expanding R’s user base and on improving interaction between users and developers.

Representatives of the R Foundation and industry representatives are behind the new consortium. Microsoft and RStudio have joined the consortium as platinum members, TIBCO Software is a gold member, and Alteryx, Google, HP, Mango Solutions, Ketchum Trading and Oracle have joined as silver members.

R Consortium will complement the work of the R Foundation by establishing communication with user groups and supporting projects related to creating and maintaining R mirror sites, testing, quality-control resources, and the financial support and promotion of the language. The consortium will also assist in creating support packages for R and in organizing other related software projects.

R is a programming language and development environment for scientific computing and graphics that originated at the University of Auckland (New Zealand). The language has enjoyed significant growth and now has more than two million users. A wide range of industries has adopted R, including biotech, finance, research and high tech, and the language integrates with many analysis, visualization, and reporting applications.

Having acquired Revolution Analytics (which makes heavy use of the language), Microsoft announced that it is joining the consortium together with other founding members such as Google, Oracle, HP, TIBCO, RStudio and Alteryx to finance the new organization.

Microsoft’s official said that “the R Consortium will complement the work of the R Foundation, a nonprofit organization that maintains the language, and will focus on user outreach and other projects designed to assist the R user and developer communities. This includes both technical and infrastructure projects such as building and maintaining mirrors for downloading R, testing, QA resources, financial support for the annual useR! Conference and promotion and support of worldwide user groups.”

Google also says it has thousands of users and its own developers working with R, so the language is crucial for many of its products. Google is happy to join the other companies in continuing to maintain the infrastructure of open source R.

Microsoft’s real-time analytics for Apache Hadoop in Azure HDInsight and its machine learning offerings in the Azure Marketplace use the R language to provide anomaly detection services for preventive maintenance and fraud detection.

Apache Spark 1.5.2 and new versions of Ganglia monitoring, Presto, Zeppelin, and Oozie now available in Amazon EMR

You can now deploy new applications on your Amazon EMR cluster. Amazon EMR release 4.2.0 now offers Ganglia 3.6, an upgraded version of Apache Spark (1.5.2), and upgraded sandbox releases of Apache Oozie (4.2.0), Presto (0.125), and Apache Zeppelin (0.5.5). Ganglia provides resource utilization monitoring for Hadoop and Spark. Oozie 4.2.0 includes several new features, such as adding Spark actions and HiveServer2 actions in your Oozie workflows. Spark 1.5.2, Presto 0.125, and Zeppelin 0.5.5 are maintenance releases, and contain bug fixes and other optimizations.

You can create an Amazon EMR cluster with release 4.2.0 by choosing release label “emr-4.2.0” from the AWS Management Console, AWS CLI, or SDK. You can specify Ganglia, Spark, Oozie-Sandbox, Presto-Sandbox, and Zeppelin-Sandbox to install these applications on your cluster. To view metrics in Ganglia or create a Zeppelin notebook, you can connect to the web-based UIs for these applications on the master node of your cluster. Please visit the Amazon EMR documentation for more information about Ganglia 3.6, Spark 1.5.2, Oozie 4.2.0, Presto 0.125, and Zeppelin 0.5.5.

Amazon EMR Update – Apache Spark 1.5.2, Ganglia, Presto, Zeppelin, and Oozie

  • Today we are announcing Amazon EMR release 4.2.0, which adds support for Apache Spark 1.5.2, Ganglia 3.6 for Apache Hadoop and Spark monitoring, and new sandbox releases for Presto (0.125), Apache Zeppelin (0.5.5), and Apache Oozie (4.2.0).

New Applications in Release 4.2.0
Amazon EMR provides an easy way to install and configure distributed big data applications in the Hadoop and Spark ecosystems on managed clusters of Amazon EC2 instances. You can create Amazon EMR clusters from the Amazon EMR Create Cluster Page in the AWS Management Console, AWS Command Line Interface (CLI), or using a SDK with EMR API. In the latest release, we added support for several new versions of applications:

  • Spark 1.5.2 – Spark 1.5.2 was released on November 9th, and we’re happy to give you access to it within two weeks of general availability. This version is a maintenance release, with improvements to Spark SQL, SparkR, the DataFrame API, and miscellaneous enhancements and bug fixes. Also, Spark documentation now includes information on enabling wire encryption for the block transfer service. For a complete set of changes, view the JIRA. To learn more about Spark on Amazon EMR, click here.
  • Ganglia 3.6 – Ganglia is a scalable, distributed monitoring system which can be installed on your Amazon EMR cluster to display Amazon EC2 instance level metrics which are also aggregated at the cluster level. We also configure Ganglia to ingest and display Hadoop and Spark metrics along with general resource utilization information from instances in your cluster, and metrics are displayed in a variety of time spans. You can view these metrics using the Ganglia web-UI on the master node of your Amazon EMR cluster. To learn more about Ganglia on Amazon EMR, click here.
  • Presto 0.125 – Presto is an open-source, distributed SQL query engine designed for low-latency queries on large datasets in Amazon S3 and the Hadoop Distributed Filesystem (HDFS). Presto 0.125 is a maintenance release, with optimizations to SQL operations, performance enhancements, and general bug fixes. To learn more about Presto on Amazon EMR, click here.
  • Zeppelin 0.5.5 – Zeppelin is an open-source interactive and collaborative notebook for data exploration using Spark. You can use Scala, Python, SQL, or HiveQL to manipulate data and visualize results. Zeppelin 0.5.5 is a maintenance release, and contains miscellaneous improvements and bug fixes. To learn more about Zeppelin on Amazon EMR, click here.
  • Oozie 4.2.0 – Oozie is a workflow designer and scheduler for Hadoop and Spark. This version now includes Spark and HiveServer2 actions, making it easier to incorporate Spark and Hive jobs in Oozie workflows. Also, you can create and manage your Oozie workflows using the Oozie Editor and Dashboard in Hue, an application which offers a web-UI for Hive, Pig, and Oozie. Please note that in Hue 3.7.1, you must still use Shell actions to run Spark jobs. To learn more about Oozie in Amazon EMR, click here.

Launch an Amazon EMR Cluster with Release 4.2.0 Today
To create an Amazon EMR cluster with 4.2.0, select release 4.2.0 on the Create Cluster page in the AWS Management Console, or use the release label emr-4.2.0 when creating your cluster from the AWS CLI or using a SDK with the EMR API.
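
As a sketch of the SDK route, the following Go program (region, instance types, and count are placeholder choices, and the default EMR roles are assumed to already exist in the account) requests a cluster on release emr-4.2.0 with Spark, Ganglia, and the Zeppelin sandbox installed:

    Go
    package main

    import (
    	"fmt"
    	"log"

    	"github.com/aws/aws-sdk-go/aws"
    	"github.com/aws/aws-sdk-go/aws/session"
    	"github.com/aws/aws-sdk-go/service/emr"
    )

    func main() {
    	client := emr.New(session.New(aws.NewConfig().WithRegion("us-east-1")))

    	// Request a cluster on release emr-4.2.0 with a few of the newly
    	// supported applications. Instance settings below are placeholders.
    	resp, err := client.RunJobFlow(&emr.RunJobFlowInput{
    		Name:         aws.String("spark-4-2-0-demo"),
    		ReleaseLabel: aws.String("emr-4.2.0"),
    		Applications: []*emr.Application{
    			{Name: aws.String("Spark")},
    			{Name: aws.String("Ganglia")},
    			{Name: aws.String("Zeppelin-Sandbox")},
    		},
    		Instances: &emr.JobFlowInstancesConfig{
    			MasterInstanceType:          aws.String("m3.xlarge"),
    			SlaveInstanceType:           aws.String("m3.xlarge"),
    			InstanceCount:               aws.Int64(3),
    			KeepJobFlowAliveWhenNoSteps: aws.Bool(true),
    		},
    		JobFlowRole: aws.String("EMR_EC2_DefaultRole"),
    		ServiceRole: aws.String("EMR_DefaultRole"),
    	})
    	if err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println("cluster id:", aws.StringValue(resp.JobFlowId))
    }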

Jon Fritz, Senior Product Manager

  • Now Available: Version 1.0 of the AWS SDK for Go

    by Jeff Barr | on | in Developers | Permalink
    Earlier this year, my colleague Peter Moon shared our plans to launch an AWS SDK for Go. As you will read in Peter’s guest post below, the SDK is now generally available! — Jeff


    At AWS, we work hard to promote and serve the developer community around our products. This is one of the reasons we open-source many of our libraries and tools on GitHub, where we cherish the ability to directly communicate and collaborate with our developer customers. Of all the experiences we’ve had in the open source community, the story of how the AWS SDK for Go came about is one we particularly love to share.

    Since the day we took ownership of the project 10 months ago, community feedback and contributions have made it possible for us to progress through the experimental and preview stages, and today we are excited to announce that the AWS SDK for Go is now at version 1.0 and recommended for production use. Like many of our projects, the SDK follows Semantic Versioning, which means that starting from 1.0, you can upgrade the SDK within the same major version 1.x and have confidence that your existing code will continue to work.

    Since the Developer Preview announcement in June, we have added a number of key improvements to the SDK, including:

    • Sessions – Easily share configuration and request handlers between clients.
    • JMESPath support – Query and reshape complex API responses and other structures using simple expressions.
    • Paginators – Iterate over multiple pages of list-type API responses.
    • Waiters – Wait for asynchronous state changes in AWS resources.
    • Documentation – Revamped developer guide.

    Here’s a code sample that exercises some of these new features:

    Go
    package main

    import (
    	"fmt"

    	"github.com/aws/aws-sdk-go/aws"
    	"github.com/aws/aws-sdk-go/aws/awsutil"
    	"github.com/aws/aws-sdk-go/aws/request"
    	"github.com/aws/aws-sdk-go/aws/session"
    	"github.com/aws/aws-sdk-go/service/ec2"
    )

    func main() {
    	// Create a session
    	s := session.New(aws.NewConfig().WithRegion("us-west-2"))
    	// Add a handler to print every API request for the session
    	s.Handlers.Send.PushFront(func(r *request.Request) {
    		fmt.Printf("Request: %s/%s\n", r.ClientInfo.ServiceName, r.Operation.Name)
    	})
    	// We want to start all instances in a VPC, so let's get their IDs first.
    	ec2client := ec2.New(s)
    	var instanceIDsToStart []*string
    	describeInstancesInput := &ec2.DescribeInstancesInput{
    		Filters: []*ec2.Filter{
    			{
    				Name:   aws.String("vpc-id"),
    				Values: aws.StringSlice([]string{"vpc-82977de9"}),
    			},
    		},
    	}
    	// Use a paginator to easily iterate over multiple pages of response
    	ec2client.DescribeInstancesPages(describeInstancesInput,
    		func(page *ec2.DescribeInstancesOutput, lastPage bool) bool {
    			// Use JMESPath expressions to query complex structures
    			ids, _ := awsutil.ValuesAtPath(page, "Reservations[].Instances[].InstanceId")
    			for _, id := range ids {
    				instanceIDsToStart = append(instanceIDsToStart, id.(*string))
    			}
    			return !lastPage
    		})
    	// The SDK provides several utility functions for literal <--> pointer transformation
    	fmt.Println("Starting:", aws.StringValueSlice(instanceIDsToStart))
    	// Skipped for brevity here, but *always* handle errors in the real world 🙂
    	ec2client.StartInstances(&ec2.StartInstancesInput{
    		InstanceIds: instanceIDsToStart,
    	})
    	// Finally, use a waiter function to wait until the instances are running
    	ec2client.WaitUntilInstanceRunning(describeInstancesInput)
    	fmt.Println("Instances are now running.")
    }

    We would like to again thank Coda Hale and our friends at Stripe for contributing the original code base and giving us a wonderful starting point for the AWS SDK for Go. Now that it is fully production-ready, we can’t wait to see all the innovative applications our customers will build with the SDK!

    For more information please see:

    Peter Moon, Senior Product Manager

  • AWS Device Farm Update – Test Web Apps on Mobile Devices

    by Jeff Barr | on | in AWS Device Farm | Permalink | Comments
    If you build mobile apps, you know that you have two implementation choices. You can build native or hybrid applications that compile to an executable file. You can also build applications that run within the device’s web browser. We launched the AWS Device Farm in July with support for testing native and hybrid applications on iOS and Android devices (see my post, AWS Device Farm – Test Mobile Apps on Real Devices, to learn more).

    Today we are adding support for testing browser-based applications on iOS and Android devices. Many customers have asked for this option and we are happy to be able to announce it. You can now create a single test run that spans any desired combination of supported devices and makes use of the Appium Java JUnit or Appium Java TestNG frameworks (we’ll add additional frameworks over time; please let us know what you need).

    Testing a Web App
    I tested a simple web app. It opens amazon.com and searches for the string “Kindle”. I opened the Device Farm Console and created a new project (Test Amazon Site). Then I created a new run (this was my second test, so I called it Web App Test #2):

    Then I configured the test by choosing the test type (TestNG) and uploading the tests (prepared for me by one of my colleagues):

    The file (chrome-with-screenshot.zip) contains the compiled test and the dependencies (a bunch of JAR files):

    Next, I chose the devices. I had already created a “pool” of Android devices, so I used it:

    I started the run and then checked in on it a few minutes later:

    Then I inspected the output, including screen shots, from a single test:

    Available Now
    This new functionality is available now and you can start using it today! Read the Device Farm Documentation to learn more.