Expand my Community achievements bar.

SOLVED

depuplication based on other empty fields?

Avatar

Level 5

Hi, I've duplicated records in recipients table.

The deduplication field would be: national_id, that is the one repeated. However, I just want to keep the record that have info in the other fields: first_name and last_name. 

How to remove the recoeds that are duplicated, based one national_id,  and  have the first_name and last_name fields empty?


+-------------+------------+-----------+-------------+
| national_id | first_name | last_name |             email |
+-------------+------------+-----------+-------------+
| 123            |            jose |        perez | jp@mail.com |
+-------------+------------+-----------+-------------+
| 123             |                  |                  |                        |
+-------------+------------+-----------+-------------+



ogonzalesdiaz_0-1699025028725.png


When i select other, in the next screen, how to tell the deduplication activity to keep only the record where First Name is not empty. 

ogonzalesdiaz_1-1699025252923.png


the Deduplucation activity will just separete in a temp table the records not duplicated from the duplicated ones, right?. ... but how do I actually remove the duplicated ones from recipients? The ones that doesn't have first_name or last_name



1 Accepted Solution

Avatar

Correct answer by
Community Advisor

@ogonzalesdiaz , try this,

Temporary schema >

Identification of duplicates: keep national_id alone.

Duplication method:

Doubles to keep: 1

Method: Using an expression

expression: 

Iif(Length(@firstName)+Length(@lastName)>0,1 ,0 )

select 'keep the records with largest value' radio button.

ParthaSarathy_0-1699027776686.png

Now this will add your length of first name and last name. If both are empty, then the length will be 0, so the other national id with valid first name or last name will be in result.

View solution in original post

3 Replies

Avatar

Community Advisor

Hi @ogonzalesdiaz ,

Select temporary schema in first tab.

In 'identification of duplicates' tab, along with national_id, add firstName and LastName as well.

Avatar

Level 5

Hi @ParthaSarathy , 

I've added those in the identifcation of duplicated, but where do you tell Campaign to keep only the row where first and last name are not empty?


ogonzalesdiaz_0-1699027127950.png

 



ogonzalesdiaz_0-1699026602083.png


But this actually, would only separate the duplicated from the no duplicated records, right? 

How would I tell Campaign to remove the duplicated records, based on national_id, where first and last name are empty?

Edit 1: 

I'm getting this error from that config: 

03/11/2023 11:03:20 0 end ) FROM wkf68788481_15_1 W0 JOIN NmsRecipient R1 ON (R1.iRecipientId = W0.iId) WHERE ((R1.iRecipientId > 0 OR R1.iRecipientId < 0))' could not be executed.
03/11/2023 11:03:20 WDB-200001 SQL statement 'INSERT INTO wkf68788481_16_1 (iId,iInternalKeyField1,iInternalOrderNdx) SELECT W0.iId, W0.iId, Row_Number() OVER ( ORDER BY case when PARTITION BY R1.sCodinternocomputacional , R1.sFirstName , R1.sLastName IS NULL then 1 else
03/11/2023 11:03:20 PGS-220000 PostgreSQL error: ERROR: syntax error at or near "BY" LINE 1: ... Row_Number() OVER ( ORDER BY case when PARTITION BY R1.sCod... ^ .
03/11/2023 11:03:20 Executing query 'Result' (step 'Deduplication')

Avatar

Correct answer by
Community Advisor

@ogonzalesdiaz , try this,

Temporary schema >

Identification of duplicates: keep national_id alone.

Duplication method:

Doubles to keep: 1

Method: Using an expression

expression: 

Iif(Length(@firstName)+Length(@lastName)>0,1 ,0 )

select 'keep the records with largest value' radio button.

ParthaSarathy_0-1699027776686.png

Now this will add your length of first name and last name. If both are empty, then the length will be 0, so the other national id with valid first name or last name will be in result.