Techdivinity Cloud IT-Software & Application Development Agency

CUSTOMER

Big Data (Santosh Shinde)

CUSTOMER

The Customer is a leading market research company.

CHALLENGE

Although the Customer already had a robust analytical system, it believed the system would not be able to meet the company’s future needs. Recognizing this, the Customer was looking for a future-focused, innovative solution. The new system had to cope with a continuously growing amount of data, analyze big data faster and enable comprehensive advertising channel analysis.

After deciding on the architecture of the new system, the Customer looked for a highly qualified and experienced team to implement the project. Satisfied with its long-standing cooperation with ScienceSoft, the Customer asked our consultants to carry out the entire migration from the old analytical system to the new one.

SOLUTION

Throughout the project, the Customer’s business intelligence architects worked closely with ScienceSoft’s big data team. The former designed the concept, and the latter was responsible for implementing it.

For the new analytical system, the Customer’s architects selected the following frameworks:

Apache Hadoop – for data storage;
Apache Hive – for data aggregation, query and analysis;
Apache Spark – for data processing.

Amazon Web Services and Microsoft Azure were selected as cloud computing platforms.

At the Customer’s request, the old system and the new one operated in parallel during the migration.

Overall, the solution included five main modules:

Data preparation
Staging
Data warehouse 1
Data warehouse 2
Desktop application

Data preparation

The system was supplied with raw data from multiple sources, such as TV views, mobile device browsing history, website visit data and surveys. To enable the system to process more than 1,000 different types of raw data (archives, XLS, TXT, etc.), data preparation included the following stages, coded in Python:

Data transformation
Data parsing
Data merging
Data loading into the system.
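
The stages above can be illustrated with a minimal sketch. It is not the Customer’s actual code: the two raw formats, the field names and the `prepare` helper are hypothetical, a plain list stands in for the staging table, and the order of the steps is the sketch’s own choice.

```python
import csv
import io

# Hypothetical raw inputs; the real system handled 1,000+ raw data types.
RAW_CSV = "respondent,channel,minutes\nr1,TV-One,30\nr2,TV-Two,45\n"
RAW_TXT = "r3\tnews.example\t12\n"


def parse_csv(text):
    """Parse comma-separated text with a header row into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_txt(text):
    """Parse tab-separated text without a header row into dicts."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    return [{"respondent": r[0], "site": r[1], "minutes": r[2]} for r in rows]


def transform(record, source):
    """Map a parsed record onto one common, typed schema."""
    return {
        "source": source,
        "respondent_id": record["respondent"],
        "item": record.get("channel") or record.get("site"),
        "minutes": int(record["minutes"]),
    }


def prepare(staging):
    """Parse, transform, merge and load all raw inputs into `staging`."""
    tv = [transform(r, "tv") for r in parse_csv(RAW_CSV)]
    web = [transform(r, "web") for r in parse_txt(RAW_TXT)]
    merged = tv + web        # merging: one uniform stream of records
    staging.extend(merged)   # loading: the list stands in for a staging table
    return staging


staging_table = prepare([])
```

Keeping one parser per raw format and a single shared `transform` step is one way to keep the number of format-specific code paths manageable as new source types are added.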

Staging

Apache Hive formed the core of this module. At this stage, the data structure was similar to that of the raw data, and no connections had yet been established between respondents from different sources, for example, TV and internet.

Data warehouse 1

Like the previous block, this one was also based on Apache Hive. Data mapping took place here. For example, the system processed respondents’ data for radio, TV, internet and newspaper sources and linked user IDs from different data sources according to the mapping rules. ETL for this block was written in Python.
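
The mapping step can be sketched as a lookup from source-specific IDs to one unified respondent ID. This is a simplified illustration in plain Python rather than Hive: the rule table, the ID formats and the `map_respondents` function are assumptions made for the example, since the actual mapping rules are not described here.

```python
# Hypothetical mapping rules: (source, source-specific ID) -> unified ID.
MAPPING_RULES = {
    ("tv", "tv-001"): "resp-1",
    ("internet", "web-9f3"): "resp-1",
    ("radio", "rad-77"): "resp-2",
}


def map_respondents(staged_rows, rules):
    """Attach a unified respondent ID to each staged row.

    Rows without a matching rule are returned separately instead of
    being dropped silently.
    """
    mapped, unmapped = [], []
    for row in staged_rows:
        unified = rules.get((row["source"], row["source_id"]))
        if unified is None:
            unmapped.append(row)
        else:
            mapped.append({**row, "respondent_id": unified})
    return mapped, unmapped


staged = [
    {"source": "tv", "source_id": "tv-001", "minutes": 30},
    {"source": "internet", "source_id": "web-9f3", "minutes": 12},
    {"source": "newspaper", "source_id": "np-5", "minutes": 8},
]
mapped, unmapped = map_respondents(staged, MAPPING_RULES)
```

In this sketch the TV and internet rows end up with the same unified ID, which is what makes cross-source analysis of a single respondent possible downstream.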

Data warehouse 2

With Apache Hive and Spark at its core, this block processed data on the fly according to the business logic: it calculated sums, averages, probabilities, etc. Spark’s DataFrames were used to process SQL queries from the desktop app. ETL was coded in Scala. In addition, Spark allowed query results to be filtered according to the access rights granted to the system’s users.
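
The combination of on-the-fly aggregation and access-rights filtering can be sketched as follows. The example uses plain Python as a stand-in for Spark DataFrames, and the fact rows, the `ACCESS_RIGHTS` table and both functions are hypothetical. It also assumes that access is granted per market, which is not stated above.

```python
from collections import defaultdict

# Hypothetical viewing facts and per-user access rights.
FACTS = [
    {"market": "DE", "channel": "TV-One", "minutes": 30},
    {"market": "DE", "channel": "TV-One", "minutes": 50},
    {"market": "FR", "channel": "TV-Two", "minutes": 20},
]
ACCESS_RIGHTS = {"analyst_de": {"DE"}, "admin": {"DE", "FR"}}


def aggregate(rows):
    """Compute the sum and average of minutes per (market, channel)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["market"], row["channel"])].append(row["minutes"])
    return [
        {"market": m, "channel": c, "total": sum(v), "average": sum(v) / len(v)}
        for (m, c), v in sorted(groups.items())
    ]


def query(user, rows):
    """Aggregate only the rows in markets the user is allowed to see."""
    allowed = ACCESS_RIGHTS.get(user, set())
    return aggregate(r for r in rows if r["market"] in allowed)


de_report = query("analyst_de", FACTS)
full_report = query("admin", FACTS)
```

The sketch filters rows before aggregating them, so restricted data never contributes to a total. Filtering the finished result set instead would be an equally valid reading of the description.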

Desktop application

The new system enabled cross-analysis of almost 30,000 attributes and built intersection matrices that supported multi-angle data analytics for different markets. In addition to standard reports, such as Reach Pattern, Reach Ranking, Time Spent and Share of Time, the Customer could create ad hoc reports. Once the Customer had selected several parameters of interest (for example, a particular TV channel, a group of customers or a time of day), the system quickly returned the results as easy-to-understand charts. The Customer could also benefit from forecasting: for example, based on expected reach and the planned advertising budget, the system would forecast revenue.
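
As an illustration of the intersection matrices mentioned above, the sketch below counts how many respondents share each pair of attributes. The exact definition used in the project is not given, so this reading is an assumption, as are the attribute names and the `intersection_matrix` function.

```python
from itertools import product

# Hypothetical sets of respondent IDs per attribute.
AUDIENCE = {
    "watches_tv_one": {"r1", "r2", "r3"},
    "reads_news_site": {"r2", "r3", "r4"},
    "listens_radio": {"r4"},
}


def intersection_matrix(audience):
    """Return {(a, b): number of respondents having both a and b}."""
    return {
        (a, b): len(audience[a] & audience[b])
        for a, b in product(audience, repeat=2)
    }


matrix = intersection_matrix(AUDIENCE)
```

Each diagonal cell holds the size of a single attribute’s audience, and each off-diagonal cell holds the overlap between two audiences, which is the kind of figure a cross-channel report would draw on.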

RESULTS

By the end of the project, the new system was able to process several queries up to 100 times faster than the old solution. With the insights gained from analyzing almost 30,000 attributes, the Customer was able to carry out comprehensive advertising channel analysis for different markets.

TECHNOLOGIES AND TOOLS

Apache Hadoop, Apache Hive, Apache Spark, Python (ETL), Scala (Spark, ETL), SQL (ETL), Amazon Web Services (Cloud storage), Microsoft Azure (Cloud storage), .NET (desktop application).
