How to build a single column, pipe delimited value from multiple rows’ values

I need to read multiple rows’ values for a column and group them into a single, pipe delimited value for a given key. When processing the data, there is not a fixed number of rows per key.

Example:

Raw Data:

Person          Language

A                       English

A                       Spanish

B                       English

B                       French

B                       German

Expected Result:

Person           Language

A                        English|Spanish

B                        English|French|German

 

How can I do this in Diyotta? I’m building my data flow using Splice Machine as the source.

 

Thanks!

Beginner Asked on September 20, 2019 in Diyotta Studio.
Add Comment
1 Answer(s)

Hi Breiney,

Splice Machine is currently not having any string aggregate functions where it is included as part of Rollup transformation. Like LISTAGG  function in snowflake and STRING_AGG function in MSSQL. It concatenates the values of string expressions and places separator values between them.

So in order to achieve this particular use case in Splice machine, we have following the below approach

Stage1: Create a temporary table with each row of a specific value in person column ranked.

drop table test_stage1;
create table test_stage1 (person varchar(10), language varchar(1000), rnk int);
insert into test_stage1 select person, language, row_number() over (partition by person order by language) rnk from column_delimit_test;

Stage2:  Create a table to hold the concatenated value of the column “language” for each distinct value in column “person”.

For this first insert the first row corresponding to each distinct value in column “person” and then iterate through the remaining rows for each of the value in the person column and update the target table with the value in the “language” column.

drop table test_stage2;
create table test_stage2 (person varchar(10), language varchar(1000));
insert into test_stage2 select person, language from test_stage1 where rnk = 1;

Stage3: Loop through the max count of rows for any distinct value in “person” column and keep updating the table with the value in “language” column for the value matching the “person” column.

update test_stage2 a set a.col2=a.col2||’|’||b.col2 from test_stage1 b where a.col1=b.col1 and b.rnk=<loop iteration value>;

In Diyotta I  have included the sql’s of stage1 and stage2 in the DBCommand.

RE: How to build a single column, pipe delimited value from multiple rows' values

In the looper, we used a select statement to get values starting from 2 to the max possible rows for any distinct value in “person” column. 

RE: How to build a single column, pipe delimited value from multiple rows' values

The looper parameter is replaced in the dataflow paramters present at the DesignJob level

RE: How to build a single column, pipe delimited value from multiple rows' values

The update statement is run using data flow.

Source Instance is filtered based on the value replaced by the dataflow parameter

RE: How to build a single column, pipe delimited value from multiple rows' values

Joiner present in the dataflow is performing the concat operation for the language column.

RE: How to build a single column, pipe delimited value from multiple rows' values

Target instance load type is selected as UPDATE and the unique column is checked.

RE: How to build a single column, pipe delimited value from multiple rows' values

Final Target Table data looks as shown below.

RE: How to build a single column, pipe delimited value from multiple rows' values

Expert Answered on September 23, 2019.
Add Comment

Your Answer

By posting your answer, you agree to the privacy policy and terms of service.