Bash: Cut a CSV File from specific Column Names


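Question

I have a CSV file that contains a lot of information I don't need, and I want to copy only the fields I need from it into another CSV file.

Current state: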

First Name,Middle Name, Last Name, Title, Suffix, Nickname, Given Yomi, Surname Yomi....
Angel,,Romero,,,Romi,, ....

The new file should look like this:

First Name, Last Name, Nickname
Angel, Romero, Romi

I want to do this with something like cut, but selecting by column name rather than by field number. Something like this:

cut -d',' -f"First Name" file

I know this doesn't work, but is there another way?

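Since you don't need to reorder the columns, cut -d ',' -f '1,3,6' is enough.
You have already asked this exact question. Please edit your original question instead of opening a new one.
"I want to do this with something like cut, but selecting by column name rather than by field number." Why? If that is really what you want, try SQL. The file should be easy to import into any small database.
The csvcut command from csvkit does exactly this: csvcut -Sc 'First Name','Last Name' file.csv
@glennjackman I tried that, but it keeps telling me "csvcut: command not found", even though I installed csvkit.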
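
As a rough sketch of the idea in the first comment, the column names can be resolved to field numbers from the header line and then handed to cut. The helper name cut_by_name is hypothetical, and the sketch assumes a simple CSV: no quoted fields, no spaces after the commas, and header names that match exactly. Because cut always prints fields in file order, the columns cannot be reordered this way.

```shell
# cut_by_name FILE NAME...  (hypothetical helper, not a standard command)
# Prints the named columns of a simple CSV file by translating each
# header name into its field number and passing the list to cut.
cut_by_name() {
    file=$1; shift
    fields=
    for name in "$@"; do
        # Put each header on its own line, then find the line number of
        # the one that matches $name exactly (-x whole line, -F literal).
        n=$(head -n 1 "$file" | tr ',' '\n' | grep -nxF -- "$name" | cut -d: -f1 | head -n 1)
        [ -n "$n" ] || { echo "no such column: $name" >&2; return 1; }
        # Append the field number to a comma-separated list.
        fields=${fields:+$fields,}$n
    done
    cut -d ',' -f "$fields" "$file"
}
```

For example, with the eight-column header used in Answer 4, cut_by_name file 'First Name' 'Last Name' 'Nickname' ends up running cut -d ',' -f 1,3,6 file.
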
Answer 1

The tool for this is Miller:

mlr --csv cut -o -f "field A","field B" input.csv >output.csv

See the Miller documentation for the cut verb.

For some reason I can't install that tool.
@MahmoudAbdulkarim What is your operating system? What error did you get during installation?
Answer 2
awk -v tags='First Name,Last Name,Nickname' '
    BEGIN {
        FS=", *"; OFS=", "
        numOutFlds = split(tags,outFldNr2tag)
    }
    NR==1 {
        for (inFldNr=1; inFldNr<=NF; inFldNr++) {
            tag = $inFldNr
            tag2inFldNr[tag] = inFldNr
        }
    }
    {
        for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
            tag = outFldNr2tag[outFldNr]
            inFldNr = tag2inFldNr[tag]
            val = $inFldNr
            printf "%s%s", val, (outFldNr<numOutFlds ? OFS : ORS)
        }
    }
' file
First Name, Last Name, Nickname
Angel, Romero, Romi
Answer 3
$ cat csvcut.awk
# csvcut.awk

function csvsplit(str, arr,     i,j,n,s,fs,qt) {
    # split comma-separated fields into arr; return number of fields in arr
    # fields surrounded by double-quotes may contain commas;
    #     doubled double-quotes represent a single embedded quote
    delete arr; s = "START"; n = 0; fs = ","; qt = "\""
    for (i = 1; i <= length(str); i++) {
        if (s == "START") {
            if (substr(str,i,1) == fs) { arr[++n] = "" }
            else if (substr(str,i,1) == qt) { j = i+1; s = "INQUOTES" }
            else { j = i; s = "INFIELD" } }
        else if (s == "INFIELD") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j); j = 0; s = "START" } }
        else if (s == "INQUOTES") {
            if (substr(str,i,1) == qt) { s = "MAYBEDOUBLE" } }
        else if (s == "MAYBEDOUBLE") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j-1)
                gsub(qt qt, qt, arr[n]); j = 0; s = "START" } } }
    if (s == "INFIELD" || s == "INQUOTES") { arr[++n] = substr(str,j) }
    else if (s == "MAYBEDOUBLE") {
        arr[++n] = substr(str,j,length(str)-j); gsub(qt qt, qt, arr[n]) }
    else if (s == "START") { arr[++n] = "" }
    return n }

BEGIN { # read and store output field names
    for (i=1; i<ARGC; i++) { fields[++nfields] = ARGV[i]; ARGV[i] = "" } }

NR == 1 { # read and store input field names, write output header
    for (i=1; i<=csvsplit($0,arr); i++) { names[arr[i]] = i }
    for (i=1; i<=nfields; i++) { printf "%s%s", sep, fields[i]; sep = "," }
    printf "\n" }

NR > 1 { # read input record, split fields, write output record
    delete csv; sep = ""; n = csvsplit($0, csv)
    for (i=1; i<=nfields; i++) {
        printf "%s%s", sep, csv[names[fields[i]]]; sep = "," }
    printf "\n" }
$ cat mahmoud.input
FirstName,MiddleName,LastName,Title,Suffix,Nickname,GivenYomi,SurnameYomi
Angel,,Romero,,,Romi,,
$ awk -f csvcut.awk FirstName LastName Nickname <mahmoud.input
FirstName,LastName,Nickname
Angel,Romero,Romi
Answer 4

Given that you have a plain CSV without variable spacing, you can use Ruby's CSV parser directly (no need to clean up the CSV file first).

Given:

cat file
First Name,Middle Name,Last Name,Title,Suffix,Nickname,Given Yomi,Surname Yomi
Angel,,Romero,,,Romi,,

You can filter each CSV row directly:

ruby -r csv -e 'BEGIN{wanted=["First Name", "Last Name", "Nickname"]
                      puts wanted.to_csv
                      }
CSV.parse($<.read, headers:true).each{
    |h| puts h.to_hash.select{
    |k,v| wanted.include?(k) }.values.to_csv}' file

Prints:

First Name,Last Name,Nickname
Angel,Romero,Romi

The advantage here is that full CSV is supported, including quoted fields with embedded delimiters.
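
To illustrate why this matters, here is a small sketch of what a plain comma split does to a quoted field. The sample data is made up for the illustration and is not from the question.

```shell
# Hypothetical sample: the first field contains a comma inside double quotes.
sample() {
    printf '%s\n' 'Name,Nickname' '"Romero, Angel",Romi'
}

# cut splits on every comma, so the quoted field is broken in two and
# field 2 of the data row is ' Angel"' instead of 'Romi'.
sample | cut -d ',' -f 2
```

A real CSV parser, such as the Ruby one above, keeps "Romero, Angel" together as a single field.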