Post 302669653 by pamu on Wednesday 11th of July 2012 07:35:44 AM
find numeric duplicates from 300 million lines....

These are numeric IDs:
Code:
   222932017099186177      
   222932014385467392      
   222932017371820032      
   222932017409556480

I have a text file with 300 million lines like the ones shown above, and I want to find the duplicate IDs in it. Please suggest a quick way to do this.
I expect sort | uniq -d to take a long time, and I am worried it may run out of memory.

Thanks...
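
One possible approach, sketched under the assumption that GNU coreutils is installed: GNU sort performs an external merge sort and spills sorted runs to temporary files once its buffer is full, so its memory use is bounded by the -S value rather than by the size of the file. The function name find_dupes and the SORT_MEM and SORT_TMP variables are made up for this illustration. SORT_TMP should point at a directory with plenty of free space, likely on the order of the input size. The awk step trims the surrounding whitespace seen in the sample so that identical IDs compare equal, and LC_ALL=C makes the comparison plain byte order, which is usually faster than locale-aware collation.

```shell
# find_dupes FILE: print each ID that occurs more than once in FILE, one per line.
# Sketch only. Assumes GNU sort (-S and -T are GNU options).
# SORT_MEM and SORT_TMP are hypothetical tuning variables, not standard ones.
find_dupes() {
    # Keep only the first whitespace-separated field and skip blank lines.
    # awk prints the field as a string, so 18-digit IDs lose no precision.
    awk 'NF { print $1 }' "$1" |
        # Byte-order sort with a capped buffer; excess data goes to temp files.
        LC_ALL=C sort -S "${SORT_MEM:-1G}" -T "${SORT_TMP:-/tmp}" |
        # After sorting, equal IDs are adjacent; -d prints one copy of each repeat.
        LC_ALL=C uniq -d
}
```

It could be called as find_dupes ids.txt > dupes.txt, where ids.txt stands for the input file. Because the sort is by bytes rather than by numeric value, the output is only in numeric order when every ID has the same number of digits; that does not affect which IDs are reported as duplicates.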
 
