As an option, deduping should ideally compare the root domain, or the root URL minus any variables after the ? in a URL, while treating http the same as https and ignoring a leading www subdomain.
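
To illustrate the idea, here is a rough Python sketch of how such a comparison key could be built. The function names `dedupe_key` and `dedupe` and the `root_only` flag are made up for this example and are not part of any existing tool. It assumes every URL includes a scheme (http:// or https://), treats the "root domain" simply as the host minus a leading www (not a public-suffix-aware registrable domain), and also drops the port, fragment, and trailing slash, which goes slightly beyond what is requested above.

```python
from urllib.parse import urlsplit


def dedupe_key(url: str, root_only: bool = False) -> str:
    """Build a comparison key for a URL (hypothetical helper).

    Assumes the URL carries a scheme; without one, urlsplit puts the
    host into the path and the key will not be normalized.
    """
    parts = urlsplit(url.strip())
    # .hostname is already lowercased and excludes any port or credentials.
    host = parts.hostname or ""
    # Ignore a leading www subdomain.
    if host.startswith("www."):
        host = host[len("www."):]
    if root_only:
        # Root-domain mode: the host alone decides whether two URLs match.
        return host
    # Root-URL mode: host plus path. The scheme, the query string (everything
    # after "?"), and the fragment are left out, so http and https collapse.
    return host + parts.path.rstrip("/")


def dedupe(urls, root_only: bool = False):
    """Keep the first URL seen for each key, preserving input order."""
    seen = set()
    unique = []
    for url in urls:
        key = dedupe_key(url, root_only)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

With `root_only=False`, `https://www.example.com/page?x=1` and `http://example.com/page` would count as duplicates; with `root_only=True`, any two URLs on the same host would.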