RE: How to build a single column, pipe delimited value from multiple rows’ values
I need to read multiple rows’ values for a column and group them into a single, pipe delimited value for a given key. When processing the data, there is not a fixed number of rows per key.
How can I do this in Diyotta? I’m building my data flow using Splice Machine as the source.
Splice Machine is currently not having any string aggregate functions where it is included as part of Rollup transformation. Like LISTAGG function in snowflake and STRING_AGG function in MSSQL. It concatenates the values of string expressions and places separator values between them.
So in order to achieve this particular use case in Splice machine, we have following the below approach
Stage1: Create a temporary table with each row of a specific value in person column ranked.
drop table test_stage1;
create table test_stage1 (person varchar(10), language varchar(1000), rnk int);
insert into test_stage1 select person, language, row_number() over (partition by person order by language) rnk from column_delimit_test;
Stage2: Create a table to hold the concatenated value of the column “language” for each distinct value in column “person”.
For this first insert the first row corresponding to each distinct value in column “person” and then iterate through the remaining rows for each of the value in the person column and update the target table with the value in the “language” column.
drop table test_stage2;
create table test_stage2 (person varchar(10), language varchar(1000));
insert into test_stage2 select person, language from test_stage1 where rnk = 1;
Stage3: Loop through the max count of rows for any distinct value in “person” column and keep updating the table with the value in “language” column for the value matching the “person” column.
update test_stage2 a set a.col2=a.col2||’|’||b.col2 from test_stage1 b where a.col1=b.col1 and b.rnk=<loop iteration value>;
In Diyotta I have included the sql’s of stage1 and stage2 in the DBCommand.
In the looper, we used a select statement to get values starting from 2 to the max possible rows for any distinct value in “person” column.
The looper parameter is replaced in the dataflow paramters present at the DesignJob level
The update statement is run using data flow.
Source Instance is filtered based on the value replaced by the dataflow parameter
Joiner present in the dataflow is performing the concat operation for the language column.
Target instance load type is selected as UPDATE and the unique column is checked.
Final Target Table data looks as shown below.